Open ticket for configurable R-value in MapReduce?
fearsome.lucidity at gmail.com
Wed Dec 14 17:32:50 EST 2011
On Wed, Dec 14, 2011 at 12:58 PM, Mark Phillips <mark at basho.com> wrote:
> Hmmm. Perhaps I'm not following here, but I don't see how R=1 on M/R
> would make it unreliable in the face of node failure or node addition.
> Assuming you have at least three nodes (the Basho Recommended
> Minimum™) and standard defaults, you should have three copies of any
> key you're trying to use in a M/R job. If you lose one node (or one
> copy), you can still satisfy the R=1 requirement.
If you add a node, that node starts out empty. If M/R reads from the new
node, R=1 will make it conclude there is no data to process. Over time
the node will gain new data and be populated by read-repair, but it will
still not have a complete data set until all previous data has been
read, updated, or deleted.
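To illustrate the concern, here is a toy sketch (assumed names, not Riak's actual read path) of why R=1 can report "no data" when one replica is a freshly added, empty node and happens to respond first:

```python
# Hypothetical sketch: three replicas of a key, one on a new empty node.
replicas = {
    "node_a": {"key1": "value1"},
    "node_b": {"key1": "value1"},
    "node_c": {},  # freshly added node, not yet populated
}

def read(key, r):
    """Return as soon as `r` vnode responses arrive, in arrival order."""
    # Assume the empty node happens to answer first.
    arrival_order = ["node_c", "node_a", "node_b"]
    responses = [replicas[n].get(key) for n in arrival_order[:r]]
    # With only r responses collected, the coordinator reports what it got.
    return responses

print(read("key1", r=1))  # [None] -- looks like the key is missing
print(read("key1", r=2))  # [None, 'value1'] -- a live copy is seen
```

With R=2 a populated replica is consulted, so the missing copy can be detected (and read-repaired); with R=1 the first answer wins.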
In the case of node failure, the same applies if you have to bring up a
new, empty node to replace the downed one.
I forgot that if a node goes down temporarily, other nodes will accept
updates on its behalf and hand them over when it comes back up, so that
type of failure is not a concern.
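That behavior (hinted handoff) can be sketched roughly like this; the names here are illustrative, not Riak internals:

```python
# Toy model of hinted handoff: while a primary node is down, a fallback
# node holds its writes as "hints" and hands them back on recovery.
primary = {}
fallback_hints = []   # writes held on behalf of the down primary
primary_up = False

def write(key, value):
    if primary_up:
        primary[key] = value
    else:
        fallback_hints.append((key, value))  # accepted on its behalf

def primary_recovers():
    global primary_up
    primary_up = True
    while fallback_hints:
        key, value = fallback_hints.pop(0)
        primary[key] = value  # handoff back to the primary

write("k", "v")       # primary is down: write held as a hint
primary_recovers()    # hints handed off
print(primary)        # {'k': 'v'}
```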
> We did some major work in Riak 1.x to ensure that node additions
> didn't result in 404's, and riak_pipe does some work to ensure that
> jobs are processed, too. More specifically:
> * riak_core will compute a new claim when a node is added, and then
> periodically ask a vnode to start handing off data as appropriate.
> Vnodes can choose not to handoff given its current state. In any case,
> whenever all vnode modules associated with an index (kv, pipe, search,
> etc) finish handoff, that's when the ownership is actually
> transferred. Otherwise, the prior owner remains in effect. Thus, a new
> node is not reflected as the owner of a partition until after it has
> all data associated with that partition.
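If I'm reading that right, the ownership rule could be summarized with this toy sketch (assumed structure, not Riak's actual code): a new node becomes the recorded owner only once every vnode module has finished handoff for that index.

```python
# Handoff status per vnode module for one partition index.
handoff_done = {"kv": True, "pipe": True, "search": False}

def owner(prior, candidate):
    """Candidate becomes owner only after ALL modules finish handoff."""
    if all(handoff_done.values()):
        return candidate
    return prior  # otherwise the prior owner remains in effect

print(owner("old_node", "new_node"))  # 'old_node' until search finishes
```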
Just to confirm: you are saying that existing KV and Search data will be
redistributed within the cluster when you add a new node?
If so, then that is great. I was under the impression that was not the
case - that only vnode ownership was transferred, not the data from the
prior owner.
> I'm curious - have you tested node addition? Where did that 50% error
> number come from? Also, what version of Riak are you running?
I have not. We are just planning for it at the moment. The 50% came
purely from my (apparently half-assed) understanding of the state of
rebalancing within a cluster.