distribution of data among riak cluster physical nodes
jdaily at basho.com
Wed Oct 9 10:01:55 EDT 2013
What you saw is actually quite typical when using R=1.
When you turn off a node, Riak's sloppy quorum behavior will kick in and hand off some of your read requests to an empty data partition to supplement the two remaining partitions that still have a copy. Since the partition is empty, it can respond very quickly with a message indicating that the object cannot be found.
By default, Riak will treat the absence of an object as a definitive statement of "we don't have a copy of that."
With R=2 or R=3, Riak would wait for another partition to respond before replying, but with R=1 the first reply wins. You should find that a second request succeeds, because read repair will kick in and distribute a copy of the object to that empty partition.
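To make the "first reply wins" behavior concrete, here is a small simulation sketch. This is not Riak's actual implementation, just a toy model of a read coordinator that returns as soon as R replies arrive and, by default, counts a notfound reply toward the quorum:

```python
def simulate_get(replies, r):
    """Toy model of a Riak read coordinator (an assumption, not real code).

    replies: list of ("found", value) or ("notfound", None) tuples, in
    the order the partitions respond. The coordinator answers as soon
    as r replies have arrived; any "found" reply among them wins.
    """
    collected = []
    for reply in replies:
        collected.append(reply)
        if len(collected) >= r:
            for status, value in collected:
                if status == "found":
                    return ("found", value)
            return ("notfound", None)
    return ("notfound", None)

# After a node failure, the empty fallback partition responds first,
# followed by the two surviving partitions that still hold the object.
replies = [("notfound", None),
           ("found", "image-bytes"),
           ("found", "image-bytes")]

print(simulate_get(replies, r=1))  # ('notfound', None) -- the fast empty reply wins
print(simulate_get(replies, r=2))  # ('found', 'image-bytes')
```

With r=1 the coordinator stops at the fallback partition's quick notfound; with r=2 it waits for a second reply, which carries the object.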
Rather than increasing the R value, you can manipulate this behavior with the notfound_ok configuration setting.
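For illustration, here is a hedged sketch of setting notfound_ok through Riak's HTTP interface (the hostname, port, bucket, and key are assumptions for the example; check the docs linked below for your version):

```shell
# Set notfound_ok=false on the "images" bucket (example names), so a
# notfound reply from an empty fallback partition is not treated as a
# definitive "object does not exist":
curl -XPUT http://localhost:8098/buckets/images/props \
  -H "Content-Type: application/json" \
  -d '{"props": {"notfound_ok": false}}'

# It can also be overridden per request as a query parameter:
curl "http://localhost:8098/buckets/images/keys/logo.png?r=1&notfound_ok=false"
```

With notfound_ok=false, the coordinator will not count the fallback partition's notfound toward the quorum, so an R=1 read can still succeed during the outage.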
See http://basho.com/riaks-config-behaviors-part-3/ for a more detailed explanation, specifically the "notfound tuning" section, and http://basho.com/riaks-config-behaviors-epilogue/ for a list of links to the full series.
On Oct 9, 2013, at 9:50 AM, kzhang <kzhang at wayfair.com> wrote:
> We have a 5 node riak cluster to store site images, with N=3, R=1. When we
> turned off one node, a lot of GET requests failed, which made me think those
> requested images (3 copies of them) all landed on the failed physical node.
> Is there a way to tell how the replicas are distributed among the physical
> nodes, and if there is a way to re-distribute them if all copies of the same
> key-value are on the same machine?
> View this message in context: http://riak-users.197444.n3.nabble.com/distribution-of-data-among-riak-cluster-physical-nodes-tp4029398.html
> Sent from the Riak Users mailing list archive at Nabble.com.