mapreduce with non-existent keys

Bryan Fink bryan at basho.com
Thu Aug 23 09:32:06 EDT 2012


Wow, this question slipped by while I wasn't looking. Sorry about that.

On Mon, Jul 16, 2012 at 4:47 PM, Mark Boyd ソフトウェア 建築家
<BoydMR at ldschurch.org> wrote:
> Can anyone familiar with the innards of riak describe how distribution of a
> map/reduce is handled when there are multiple reduce phases included as in
> this solution copied from below. I’m assuming that the first map phase would
> spread to nodes containing data for incoming bucket/key combinations and
> their output pulled back to the coordinating node for the first reduce
> phase. Then the second map phase would spread to (potentially different)
> nodes containing data for that phase’s incoming bucket/key combinations and
> their output pulled back to the coordinating node for the final reduce
> phase.

Exactly correct, Mark. Map is always spread to vnodes holding the
objects to be read/transformed. Reduce is always brought back to a
single node for aggregation. So you would have a
scatter-gather-scatter-gather pattern, just as you described.

Javascript map is quite limited in its handling of errors, as you
found. Erlang map phases, however, get an opportunity to handle the
notfound themselves. An example of what this looks like can be found
in the riak_kv_mapreduce:map_object_value/3 function:

https://github.com/basho/riak_kv/blob/master/src/riak_kv_mapreduce.erl#L81-99

So, if you find the intermediate aggregation of that filtering reduce
to be a problem, you could consider migrating to Erlang for your map
phase.

-Bryan




More information about the riak-users mailing list