multi-get (yet again)

Bryan Fink bryan at basho.com
Fri Aug 10 08:52:32 EDT 2012


On Thu, Aug 9, 2012 at 5:11 AM, Kresten Krab Thorup <krab at trifork.com> wrote:
> The only issue with this approach is AFAIK that M/R effectively runs with R=1, i.e. it doesn't ensure that a value is consistent across replicas.
>
> IMHO riak_kv_mapreduce should have a map_get_object_value, which does a proper RiakClient:get, i.e. something like this: [will be slower, but will honour the bucket's default R value].

I recently realized that this would be a fairly small and easy thing
to do since MR has been ported to Riak Pipe. I'm frying other fish at
the moment, but if any of you are interested, read on.

In Riak Pipe, an MR "map" phase is broken into two steps: "get" and
"transform". The "get" step is what reads the value from Riak. It is
currently implemented in riak_kv_pipe_get, in the riak_kv application.

If you read riak_kv_pipe_get.erl, you'll see that all of the fetching
logic is in the process/3 function. Modifying this code to do a
regular riak_client:get instead of talking directly to a single vnode
should be easy.
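
To make the difference concrete, here is a toy model in Python (not Riak
code) contrasting a read from a single replica with a read that consults
R replicas and keeps the newest reply. Everything in it is an assumption
made for illustration: the names single_replica_get and quorum_get are
hypothetical, replicas are plain dicts, and an integer version stands in
for the vector clocks Riak actually compares.

```python
from typing import Dict, List, Optional, Tuple

# Toy model only: a replica maps key -> (version, value).
# Real Riak compares vector clocks; an integer version stands in here.
Replica = Dict[str, Tuple[int, str]]


def single_replica_get(replicas: List[Replica], key: str) -> Optional[Tuple[int, str]]:
    """Read from the first replica only, roughly what an R=1 read sees."""
    return replicas[0].get(key)


def quorum_get(replicas: List[Replica], key: str, r: int) -> Optional[Tuple[int, str]]:
    """Consult the first r replicas and return the newest reply, or None.

    Simplification: a real quorum read waits for r replies out of N,
    in whatever order they arrive, rather than picking the first r.
    """
    replies = [rep[key] for rep in replicas[:r] if key in rep]
    if not replies:
        return None
    return max(replies, key=lambda reply: reply[0])


if __name__ == "__main__":
    # The first replica has not yet received the write.
    replicas = [{}, {"k": (2, "new")}, {"k": (1, "old")}]
    print(single_replica_get(replicas, "k"))
    print(quorum_get(replicas, "k", 2))
```

With the sample data, the single-replica read reports nothing for "k"
(a notfound), while the read with r=2 returns the newer value from the
second replica.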

We would like to keep the existing implementation as the default, at
least for now. So, my suggestion would be to add the new behavior as
an option, with flags to control it. This could be accomplished either
by modifying riak_kv_pipe_get to look for a flag in its argument, or
by modifying riak_kv_mrc_pipe to use a new fitting instead of
riak_kv_pipe_get.
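
The "option with the existing behavior as default" idea can be sketched
as a small dispatcher. This is a Python illustration of the shape only,
not Riak code: the flag name use_client_get and the two fetcher
callables are hypothetical.

```python
from typing import Callable, Dict, Optional

Fetcher = Callable[[str], Optional[str]]


def choose_fetcher(options: Dict[str, object],
                   single_vnode_fetch: Fetcher,
                   client_fetch: Fetcher) -> Fetcher:
    """Pick the fetch strategy for a "get" step.

    The existing single-vnode fetch stays the default; the slower,
    quorum-respecting fetch is used only when the (hypothetical)
    use_client_get flag is explicitly set.
    """
    if options.get("use_client_get", False):
        return client_fetch
    return single_vnode_fetch


if __name__ == "__main__":
    fast = lambda key: "fast:" + key
    safe = lambda key: "safe:" + key
    print(choose_fetcher({}, fast, safe)("k"))
    print(choose_fetcher({"use_client_get": True}, fast, safe)("k"))
```

Keeping the choice in one place means callers that pass no options see
no change in behavior.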

With either modification, you'll want to also change riak_kv_mrc_pipe
to pass the map arguments through to the "get" fitting. These
arguments are the only place available to external clients to specify
any of the R-value tuning parameters. Yes, that means a map function
implementation will have to ignore them, but hopefully that's not
insurmountable. See the reduce_phase_batch_size and reduce_phase_only_1
optional "reduce" phase arguments for examples of how to do this.
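
A minimal sketch of that pass-through, again in Python rather than
Erlang and with made-up option names (get_r and get_use_client are
assumptions, not existing Riak options): the fetch options are picked
out of the map phase argument for the "get" step, while the map
function still receives the whole argument and simply ignores the keys
it does not care about.

```python
from typing import Any, Dict, Tuple

# Hypothetical option names; the thread does not define any.
GET_OPTION_KEYS = {"get_r", "get_use_client"}


def split_phase_arg(arg: Any) -> Tuple[Dict[str, Any], Any]:
    """Return (options for the "get" step, argument for the map function).

    The map function's argument is passed through unchanged, so existing
    map functions keep working as long as they ignore unknown keys.
    """
    if not isinstance(arg, dict):
        return {}, arg
    get_opts = {k: v for k, v in arg.items() if k in GET_OPTION_KEYS}
    return get_opts, arg


def example_map(value: str, arg: Any) -> str:
    """A map function that reads only its own "prefix" key from arg."""
    prefix = arg.get("prefix", "") if isinstance(arg, dict) else ""
    return prefix + value


if __name__ == "__main__":
    get_opts, map_arg = split_phase_arg({"get_r": 2, "prefix": "v="})
    print(get_opts)
    print(example_map("42", map_arg))
```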

There are probably other ways to fit this kind of fetching behavior in
as well. While Kresten's map-function implementation is good, I think
this behavior is useful in more cases than resolving a
notfound. Hopefully what I've written above is enough to get one or
more of you started down a path.

Cheers,
Bryan



