build 393: adding server results in missing data

Justin Sheehy justin at basho.com
Mon Nov 16 16:02:07 EST 2009


I just wanted to update the list on this topic, since most of the
discussion has been in person at the Basho offices.

We know exactly what the problem is now -- it involves the
overly-dynamic nature of the hash ring -- and are closing in on the
best solution.  That solution will include not only a bug fix but also
a means of "fixing" clusters that are already in this state.  The
latter is essential to current production users, and it is the reason
we're not just rolling out an immediate fix to the core problem but
instead brainstorming on the best way to minimize pain in the process
of accepting this upgrade.  We'll certainly post here as soon as we can.

A short note while we get there: this symptom only occurs visibly on
small clusters where R is large relative to the number of nodes.  That
explains why we hadn't seen it before in either test or production: a
two-node system queried with N=3, R=3 (for instance) isn't something
you'd normally want to use for real work.  In fact, while we work on
the best solution, reads with R <= (N - (number of recently added
nodes)) should work fine.
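
To make that rule of thumb concrete, here is a minimal Python sketch of
the arithmetic.  It is not Riak code: the function names max_safe_r and
read_should_work are hypothetical and exist only for illustration, and
the sketch encodes nothing beyond the inequality stated above.

```python
def max_safe_r(n_val: int, recently_added: int) -> int:
    """Largest R allowed by the rule R <= N - (recently added nodes).

    Hypothetical helper for illustration; not part of any Riak API.
    """
    if n_val < 1:
        raise ValueError("N must be at least 1")
    if recently_added < 0:
        raise ValueError("recently_added cannot be negative")
    # Clamp at zero: if more nodes were added than N, no R satisfies the rule.
    return max(n_val - recently_added, 0)


def read_should_work(n_val: int, r: int, recently_added: int) -> bool:
    """True if a read with this R falls within the rule of thumb."""
    return 1 <= r <= max_safe_r(n_val, recently_added)


if __name__ == "__main__":
    # The case from this thread: N=3 with one server recently added.
    for r in (1, 2, 3):
        print(f"N=3 R={r} added=1 ok={read_should_work(3, r, 1)}")
```

With N=3 and one recently added node, the rule allows R=1 and R=2 but
not R=3, which matches the N=3, R=3 case described above.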

(Also, even with a fix, an N value greater than the number of actual
nodes will never make much sense for anybody.)

More soon.

Best Regards,

-Justin



