A function as an input for map/reduce

Justin Sheehy justin at basho.com
Thu May 5 10:26:19 EDT 2011


Hi, Mikhail.

On Tue, May 3, 2011 at 5:55 PM, Mikhail Sobolev <mss at mawhrin.net> wrote:

>   Is there more information about "it can through a few keys at a time,
>   and the map/reduce chain would go ahead and start doing the
>   processing on whatever keys it gets as soon as it gets them, it does
>   not have to wait for the whole list of that function" (@ ~9:54 in the
>   video)?  What I'm concerned here is about a chain of
>   map/map/map/reduce/reduce phases.   How the processing is actually
>   performed?  What are the synchronization points?

The "map" part of the MapReduce programming paradigm is not only
inherently parallel, it also does not impose a point of order on the
overall dataflow and thus does not introduce a concurrency barrier.
In practical terms this means that individual data items can be
processed as soon as they arrive, and the results can be immediately
pushed on to the next phase of the overall job without waiting for all
other data to make it through the map.

The "reduce" part does not have this pleasant property, as that phase
is present in order to perform exactly the kinds of operations (such
as counting) that do require waiting.

-Justin




More information about the riak-users mailing list