Open ticket for configurable R-value in MapReduce?

Mark Phillips mark at basho.com
Wed Dec 14 15:58:41 EST 2011


Hi Elias,

On Mon, Dec 12, 2011 at 9:52 AM, Elias Levy <fearsome.lucidity at gmail.com> wrote:
> I went through the bug database and could not find any open ticket for
> having a configurable r-value in mapreduce.  Is there one that someone knows
> of?
>

There is nothing currently open for this afaik. That said, ways of
doing/implementing it were discussed recently on the list. [1]

> It would seem like this is a major limitation of the system.  Currently MR
> works in a way that essentially results in an R-value of 1.  That makes MR
> unreliable if you lose a node or add new nodes to your cluster.  This is
> particularly painful, as MR is often used in lieu of a bulk fetch API, or
> when combined with Search or 2i to remove the additional round trip time
> that would be required without it.
>

Hmmm. Perhaps I'm not following here, but I don't see how R=1 on M/R
would make it unreliable in the face of node failure or node addition.
Assuming you have at least three nodes (the Basho Recommended
Minimum™) and standard defaults, you should have three copies of any
key you're trying to use in a M/R job. If you lose one node (or one
copy), you can still satisfy the R=1 requirement.
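
To make the arithmetic concrete, here is a toy sketch in Python. It is not
Riak code, and read_succeeds is a made-up name for illustration. It models
only whether enough replicas are reachable, and assumes every reachable
replica actually holds the key:

```python
# Toy model, not Riak code: a key has N replicas and a read needs R replies.
# Assumption: any replica that is up also holds the data.

def read_succeeds(replica_up, r):
    """Return True if at least r of the replicas holding the key are reachable."""
    return sum(1 for up in replica_up if up) >= r

# Three copies of the key (the default n_val of 3), with one node down.
replicas = [True, True, False]

print(read_succeeds(replicas, r=1))               # True
print(read_succeeds(replicas, r=2))               # True
print(read_succeeds([False, False, False], r=1))  # False
```

With three copies, losing one still leaves two reachable replicas, so R=1 is
satisfied with room to spare.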

> We'd like to double the size of our cluster, but without dumping all of the
> data and reloading it after we'd added the new nodes, which would take
> far too long even with the new nodes (bulk load API anyone?), this does not
> seem feasible.  It would result in 50% not found errors.  Even adding a
> single node seems unacceptable.
>

We did some major work in Riak 1.x to ensure that node additions
didn't result in 404's, and riak_pipe does some work to ensure that
jobs are processed, too. More specifically:

* riak_core will compute a new claim when a node is added, and then
periodically ask a vnode to start handing off data as appropriate.
A vnode can choose not to hand off, given its current state. In any case,
whenever all vnode modules associated with an index (kv, pipe, search,
etc) finish handoff, that's when the ownership is actually
transferred. Otherwise, the prior owner remains in effect. Thus, a new
node is not reflected as the owner of a partition until after it has
all data associated with that partition.

* riak_pipe has built in protection for this, too. In short, pipe has
the smarts to ship jobs around to available vnodes when there is
handoff happening in the cluster. Pipe vnodes migrate the workers
they're running as soon as each one asks for its next input; current
state and all outstanding queued items are transferred at that time.
Pipe m/r fetches might still start happening on the new kv node before
that node has all its data, but if that occurs you would get local
notfounds, and the fetch is retried on other vnodes in the preflist.  That said, if
all preflists have been shuffled, and all are mid-transfer for a given
key, then yes, you might ultimately end up not finding the value. And
if you have data on this happening, we would love to see it. :)
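
The two points above can be sketched as a toy model in Python. This is not
riak_core or riak_pipe code; partition_owner, fetch_with_fallback, and the
node names are all made up for illustration. It assumes ownership is decided
per partition and that a fetch walks the preflist in order:

```python
# Toy model, not Riak code. All names below are hypothetical.

def partition_owner(old_owner, new_owner, handoff_done):
    """Ownership moves only once every vnode module has finished handoff."""
    return new_owner if all(handoff_done.values()) else old_owner

def fetch_with_fallback(key, preflist, stores):
    """Try each vnode in preflist order; a local notfound moves on to the next."""
    for vnode in preflist:
        value = stores.get(vnode, {}).get(key)
        if value is not None:
            return value
    return None  # every vnode in the preflist was missing the key

# Search has not finished handoff, so the prior owner stays in effect.
handoff = {"kv": True, "pipe": True, "search": False}
print(partition_owner("node_a", "node_d", handoff))  # node_a

# node_d is new and still empty; the fetch falls through to node_b.
stores = {"node_d": {}, "node_b": {"user1": "alice"}, "node_c": {"user1": "alice"}}
print(fetch_with_fallback("user1", ["node_d", "node_b", "node_c"], stores))  # alice
```

The None return is the corner case described above: if every vnode in the
preflist is mid-transfer for that key, the value is not found.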

(h/t to Joseph Blomstedt and Bryan Fink for the explanations here.)

So, doubling the size of your cluster shouldn't require much more than
adding nodes via the normal methods. Spinning up a whole new cluster
would be overkill.

I'm curious - have you tested node addition? Where did that 50% error
number come from? Also, what version of Riak are you running?

Best.

Mark

[1] http://riak.markmail.org/search/?q=consistent%20map%2Freduce#query:consistent%20map%2Freduce+page:1+mid:sybkfhmlfidq2vj3+state:results


> Elias Levy
>


