A function as an input for map/reduce

Mikhail Sobolev mss at mawhrin.net
Thu May 5 17:15:58 EDT 2011


Hi Justin,

On Thu, May 05, 2011 at 10:26:19AM -0400, Justin Sheehy wrote:
> The "map" part of the MapReduce programming paradigm is not only
> inherently parallel, it also does not impose a point of order on the
> overall dataflow and thus does not introduce a concurrency barrier.
> In practical terms this means that individual data items can be
> processed as soon as they arrive, and the results can be immediately
> pushed on to the next phase of the overall job without waiting for all
> other data to make it through the map.
> 
> The "reduce" part does not have this pleasant property, as that phase
> is present in order to perform exactly the kinds of operations (such
> as counting) that do require waiting.

Thank you for the description.  I now wonder whether a map function,
instead of returning the whole list of results, could do something that
Riak would treat as "ah! another map result, let's pass it to the next
phase".  In other words, instead of something like

    my_map_function(Object, _, _) ->
        object_to_list_of_values(Object).

do something like

    my_map_function(Object, _, _) ->
        produce_one(Object).

    produce_one(Object) ->
        ...
        emit(....), % following CouchDB syntax
        ...
        produce_one(Remaining). % Remaining: what is left of Object after the emit
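
To make the idea a bit more concrete, here is a minimal sketch in plain
Erlang, outside Riak.  The module name, map_with_emit/2, the Emit
callback and the "double each value" step are all hypothetical and made
up for illustration; none of them is part of Riak's map/reduce API.  It
only shows each result being handed to the next stage as soon as it is
produced, instead of being collected into a list first.

```erlang
-module(emit_sketch).
-export([demo/0]).

%% List-returning style: the caller sees nothing until the whole
%% list has been built.
map_as_list(Values) ->
    [V * 2 || V <- Values].

%% Emit style: each result is passed to Emit as soon as it is computed.
%% Emit is a hypothetical callback, NOT something Riak provides.
map_with_emit([], _Emit) ->
    ok;
map_with_emit([V | Rest], Emit) ->
    Emit(V * 2),
    map_with_emit(Rest, Emit).

%% Stand-in for "the next phase": here simply the calling process,
%% which receives one message per emitted result.
demo() ->
    Next = self(),
    Emit = fun(Result) -> Next ! {map_result, Result} end,
    ok = map_with_emit([1, 2, 3], Emit),
    Streamed = collect(3, []),
    %% Both styles yield the same values; only the delivery differs.
    Streamed = map_as_list([1, 2, 3]),
    Streamed.

%% Gather N emitted results in arrival order.
collect(0, Acc) ->
    lists:reverse(Acc);
collect(N, Acc) ->
    receive
        {map_result, R} -> collect(N - 1, [R | Acc])
    end.
```

Calling emit_sketch:demo() returns [2,4,6].  Whether Riak can accept
results from a map function in this one-at-a-time fashion is exactly
what I am asking.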

--
Misha