Open ticket for configurable R-value in MapReduce?

Elias Levy fearsome.lucidity at gmail.com
Mon Dec 12 12:52:55 EST 2011


I went through the bug database and could not find any open ticket for
having a configurable r-value in mapreduce.  Is there one that someone
knows of?

This seems like a major limitation of the system.  Currently MR works in
a way that essentially results in an R-value of 1.  That makes MR
unreliable if you lose a node or add new nodes to your cluster.  This is
particularly painful, as MR is often used in lieu of a bulk fetch API, or
combined with Search or 2i to avoid the additional round trip that would
otherwise be required.

We'd like to double the size of our cluster, but this does not seem
feasible without dumping all of the data and reloading it after we've added
the new nodes, which would take far too long even with the new nodes (bulk
load API, anyone?).  Doubling in place would result in 50% not found
errors.  Even adding a single node seems unacceptable.
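
As a rough illustration of where the 50% figure comes from, here is a minimal Python sketch.  It rests on assumptions that are not part of Riak itself: partitions are assigned to nodes round-robin (Riak's real claim algorithm differs), an MR read consults only the current owner of a partition (the effective R-value of 1), and a newly added node has not yet received any data.  The function names and the 64-partition, 4-node starting point are hypothetical.

```python
def owners(num_partitions, num_nodes):
    """Simplified round-robin ownership: partition i belongs to node i % num_nodes."""
    return [i % num_nodes for i in range(num_partitions)]


def moved_fraction(num_partitions, old_nodes, new_nodes):
    """Fraction of partitions whose owner changes when the cluster is resized.

    Under the assumptions above, a read with an effective R of 1 that lands
    on a changed owner finds nothing until the data has been handed off.
    """
    before = owners(num_partitions, old_nodes)
    after = owners(num_partitions, new_nodes)
    moved = sum(1 for b, a in zip(before, after) if b != a)
    return moved / num_partitions


# Doubling a hypothetical 4-node cluster with 64 partitions.
print(moved_fraction(64, 4, 8))  # 0.5
```

In this toy model half of the partitions change owner when the node count doubles, so half of the single-replica reads would miss until handoff completes.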

How are people handling this?  Can one use Riak EDS to mirror the data to
the new nodes, set up as a mirrored cluster, and once they are up to
date, add them to the production cluster?  Or is there a way to add a node
to a cluster in such a way that it accepts data for storage but not for
querying, then have Riak EDS populate it, and then have it start accepting
reads?

Elias Levy