Reduce phase only on one node?
John D. Rowell
me at jdrowell.com
Fri Jul 23 17:34:38 EDT 2010
+1 to this, my understanding is that you can use the same reduce funcion to
re-reduce a stream of data and still get the same results. Is this what
actually happens in Riak internally (i.e. the coordinating node only
re-reduces each node's reduce) or does the reduce function only run on the
coordinating node, which receives the raw map data?
2010/7/23 John Butler <php_john at hotmail.com>
> I went through the Riak Fast Track and overall I'm excited about the
> possibilities of Riak. However, I did see one thing that concerns me in
> regard to MapReduce jobs. From the Riak docs and this page here:
> http://seancribbs.com/tech/2010/02/06/why-riak-should-power-your-next-rails-app/it seems to suggest that while Map jobs are performed in parallel Reduce
> jobs are not.
> Wouldn't be better if each node performed the reduce function on it's own
> set of data before sending it to the main coordinator? Imagine if you are
> mapping over a high number of objects (millions) and are trying to aggregate
> the data grouped by some demographic (such as state). If all the mappers
> send the data to coordinating node to be reduced then you are going to send
> millions of items from each data node all to one coordinating node. If
> instead each node called reduce first, then sent it to the coordinating
> node, then each node would be at most sending over 50 objects (for 50
> states). Not only would this greatly reduce network traffic, it would be
> able to execute in parallel greatly improving performance. Since all reduce
> jobs are communicative, associative and idempotent I see no reason why this
> couldn't happen (of course I'm not familiar with the Riak internals to
> strongly assert this).
> Maybe I misunderstood and this is already happening. If it's not happening,
> why and is that on the roadmap to be added later?
> Hotmail is redefining busy with tools for the New Busy. Get more from your
> riak-users mailing list
> riak-users at lists.basho.com
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the riak-users