Understanding Riak's rebalancing and handoff behaviour

Scott Lystig Fritchie slfritchie at snookles.com
Thu Nov 11 17:17:30 EST 2010


Nico Meyer <nico.meyer at adition.com> wrote:

nm> I discovered another problem while debugging this. If you restart a
nm> node that you removed from the cluster while it still has data (or
nm> if it crashes), it won't start handing off its data afterwards. The
nm> reason is that the node watcher also does not get notified that the
nm> other nodes are up, so all of them are considered down. This also
nm> can only be worked around manually via the Erlang console.

Nico, I've opened ticket 878 after scripting your scenario and
duplicating it on an Ubuntu9 32-bit box using the Riak package
riak_0.13.0-2_i386.deb.

    https://issues.basho.com/show_bug.cgi?id=878
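To make the failure mode in Nico's report concrete, here is a toy Python model. It is hypothetical and is not Riak's code: `NodeWatcher`, `handoff_targets` and the node names are made up for illustration. It assumes, as the report describes, that a node only hands a partition to a peer that its node watcher currently considers up. If the restarted node never receives "up" notifications, every peer looks down and nothing moves.

```python
class NodeWatcher:
    """Hypothetical liveness tracker (not Riak's API): a peer counts as up
    only after an explicit notification has been received."""

    def __init__(self):
        self._up = set()

    def notify_up(self, node):
        # In a healthy cluster, peers would be reported up after a restart.
        self._up.add(node)

    def is_up(self, node):
        return node in self._up


def handoff_targets(partitions, new_owner, watcher):
    """Split partitions into (ready, stuck): ready ones have a new owner the
    watcher believes is up; stuck ones stay on the local node."""
    ready, stuck = [], []
    for p in partitions:
        (ready if watcher.is_up(new_owner[p]) else stuck).append(p)
    return ready, stuck


partitions = [0, 1, 2, 3]
new_owner = {0: "riak@b", 1: "riak@c", 2: "riak@b", 3: "riak@c"}

# Restarted removed node: no up notifications arrive, so every peer looks down.
watcher = NodeWatcher()
print(handoff_targets(partitions, new_owner, watcher))  # ([], [0, 1, 2, 3])

# Once peers are reported up, every partition becomes eligible for handoff.
for node in ("riak@b", "riak@c"):
    watcher.notify_up(node)
print(handoff_targets(partitions, new_owner, watcher))  # ([0, 1, 2, 3], [])
```

The model only captures the gating condition; the real fix for ticket 878 lies in how the restarted node learns about its peers, which this sketch does not attempt to reproduce.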

On to Sven's problem that started this thread ... I've a larger script
that attempts to reproduce his problem, using 12 nodes installed on a
single Ubuntu9 32-bit machine (though reading carefully, Sven doesn't
get around to using EC2 instance number D, so only 9 nodes are used).

I have the script and output available at
http://www.snookles.com/scotttmp/riedel-scenario.tar.gz.  Sorry, I don't
have the rest of the basho_expect infrastructure available to outside
users right now(*), so it isn't possible for outsiders to re-run the
test, but it should show what's being done at a high level (the Python
script) and the detailed output (the other file; search it for the regexp
"\*\*" to find the major section headings).

Sven, if I've made a major mistake on the script, please let me know
outside of the mailing list.  I'll try to fix the script and, if
necessary, open another Bugzilla ticket.

-Scott

(*) Releasing basho_expect with a reasonable open source license is on
the Basho todo list.
