"consistent" map/reduce

Bryan Fink bryan at basho.com
Wed Nov 30 09:58:35 EST 2011


On Wed, Nov 30, 2011 at 8:36 AM, Kresten Krab Thorup
<krab at trifork.com> wrote:> 1. How do I create the initial inputs? i.e.
the list of all {Index, Node} pairs that go into the
riak_kv_pipe_listkeys fitting.  Does this fitting need a special
chashfun to send it to the right vnode?
The easiest way to get the "go" message to all vnodes is to use the
riak_pipe_qcover_* modules, as is done here:
https://github.com/basho/riak_kv/blob/master/src/riak_kv_pipe_listkeys.erl#L135
Most of your arguments will be the same as is seen there:    [{raw,
 %% used send (!) for done, not gen_*/luke/etc. calls      ReqId,   %%
a unique integer for the request      self()}, %% send done and errors
to this process     [LKP,     %% the pipe with listkeys as its head
fitting      Bucket,  %% the name of the bucket to list      NVal]]
%% the N to use in coverage calculation, 1 in your case
It would be difficult to do the same with a special chashfun, since
the input that listkeys expects does not contain anything naming the
vnode for your chashfun to work with (every vnode gets the same
input).
> 2. Given such a pipeline-based thingie installed as .beam files with Riak, is there a way to "invoke" it via the HTTP M/R API?   It would be great if I don't have to poke a new whole through tcp/ip to exploit pipe.
Sadly, no, there is not a way to use this special setup over the HTTP
MR API at this time.  I have been working on some designs for external
pipe control, but they are not yet ready for implementation.  So, yes,
you'll have to poke a new whole through for now.
> 3. Does riak_pipe run a fitting instance per node, or per vnode?
Pipe runs a fitting worker per vnode.  If you have ideas for how to
make that clearer in riak_pipe's README.org, please pass them along!
> FYI, ... I'm considering doing a disk-based version of riak_pipe_w_reduce, which keeps the intermediate results in a local K/V store, in order to support large keysets.    This could just be a bitcask w/merge disabled.   Re-implementing the reducer would also allow us to evaluate the N>R condition in the reducer, and emit results as early as possible.
Sounds awesome and right on track, I must say.
-Bryan




More information about the riak-users mailing list