Understanding Riak's rebalancing and handoff behaviour

Alexander Sicular siculars at gmail.com
Tue Nov 9 10:30:44 EST 2010

Mainly, I'm of the impression that you should join/leave a cluster one
node at a time.
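
The intuition can be sketched with a toy partition ring (this is a simplified model, not Riak's actual claim algorithm, and the node names are illustrative): each single-node join triggers one bounded round of handoff that can finish before the next membership change, and ownership stays balanced throughout.

```python
# Toy model of single-node joins into a partition ring. Not Riak's real
# claim algorithm -- just an illustration of why one membership change at
# a time keeps each handoff round small and the ring balanced.
from collections import Counter

RING_SIZE = 64  # illustrative; the cluster in this thread uses 512

def join(owners, new_node):
    """Give the joining node its fair share (RING_SIZE / node count) of
    partitions by repeatedly taking one from the most-loaded owner."""
    owners = owners[:]
    num_nodes = len(set(owners)) + 1
    target = RING_SIZE // num_nodes
    moved = 0
    while moved < target:
        donor = Counter(owners).most_common(1)[0][0]
        owners[owners.index(donor)] = new_node  # hand off one partition
        moved += 1
    return owners, moved

owners = ["riak01"] * RING_SIZE  # start as a single-node cluster
for node in ["riak02", "riak03", "riak04"]:
    owners, moved = join(owners, node)
    print(node, "received", moved, "partitions")
# riak02 received 32 partitions
# riak03 received 21 partitions
# riak04 received 16 partitions
```

Each join moves only the joining node's share, and because the donor is always the most-loaded node, ownership never drifts more than one partition out of balance between steps.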


On 2010-11-09, Sven Riedel <sven.riedel at scoreloop.com> wrote:
> Hi,
> I'm currently assessing how well riak fits our needs as a large scale data
> store.
> In the course of testing riak, I've set up a cluster on Amazon EC2 with 6
> nodes across two instances (m2.xlarge). After seeing surprisingly bad
> write performance (which I'll write more on in a separate post once I've
> finished my tests), I wanted to migrate the cluster to instances with
> better IO performance.
> Let's call the original EC2 instances A and B. The plan was to migrate the
> cluster to new EC2 instances called C and D. During the following actions no
> other processes were reading/writing from/to the cluster. All instances are
> in the same availability zone.
> What I did so far was to tell all riak nodes on B to leave the ring and let
> the ring re-stabilize. One surprising behaviour here was that the riak nodes
> on A suddenly all went into deep sleep mode (process state D) for about 30
> minutes, and all riak-admin status/transfers calls claimed all nodes were
> down when in fact they weren't and were quite busy. Left to themselves,
> they sorted everything out in the end.
> Then I set up 3 new riak nodes on C and told them to join the cluster.
> So far everything went well. riak-admin transfers showed me that both the
> nodes on A and the nodes on C were waiting on handoffs. However, the
> handoffs didn't start. I gave the cluster an hour, but no data transfer got
> initiated to the new nodes.
> Since I didn't find any way to manually trigger the handoff, I told all the
> nodes on A (riak01, riak02 and riak03) to leave the cluster and after the
> last node on A left the ring, the handoffs started.
> After all the data in riak01 got moved to the nodes on C, the master process
> shut down and the handoff for the remaining data from riak02 and riak03
> stopped. I tried restarting riak01 manually, however riak-admin ringready
> claims that riak01 and riak04 (on C) disagree on the partition owners.
> riak-admin transfers still lists the same amount of partitions awaiting
> handoff as when the handoff to the nodes on C started.
> My current data distribution is as follows (via du -c):
> On A:
> 1780 riak01/data
> 188948 riak02/data
> 3766736 riak03/data
> On C:
> 13215908 riak04/data
> 1855584 riak05/data
> 5745076 riak06/data
> riak04 and riak05 are awaiting the handoff of 341 partitions, riak06 of 342
> partitions.
> The ring_creation_size is 512, n_val for the bucket is 3, w is set to 1.
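
For reference, a quick back-of-the-envelope on how those settings translate into per-node ownership (just the arithmetic, not Riak's claim algorithm; note n_val affects replica placement, not how many partitions a node owns):

```python
# With ring_creation_size = 512, a balanced cluster gives each node
# roughly ring_size / num_nodes partitions.
ring_size = 512

for num_nodes in (6, 3):  # the original cluster, and after B's nodes left
    share, remainder = divmod(ring_size, num_nodes)
    upper = f" or {share + 1}" if remainder else ""
    print(f"{num_nodes} nodes -> {share}{upper} partitions per node")
# 6 nodes -> 85 or 86 partitions per node
# 3 nodes -> 170 or 171 partitions per node
```

So after the move completes, each of the three nodes on C should own about 170 partitions.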
> My questions at this point are:
> 1. What would normally trigger a rebalancing of the nodes?
> 2. Is there a way to manually trigger a rebalancing?
> 3. Did I do anything wrong with the procedure described above to be left in
> the current odd state by riak?
> 4. How would I rectify this situation in a production environment?
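
On inspecting where things stand: the riak-admin subcommands already used in this thread can be scripted on each node's host. A sketch (guarded so it is a harmless no-op on a machine without riak-admin installed; interpret the output against the Riak docs for your version):

```shell
# Hedged sketch: inspect cluster state with the riak-admin subcommands
# mentioned in this thread. Run on a host where a Riak node is installed.
set -eu
if ! command -v riak-admin >/dev/null 2>&1; then
    echo "riak-admin not found; skipping"
    exit 0
fi
riak-admin status      # node liveness and basic statistics
riak-admin transfers   # partitions still awaiting handoff
riak-admin ringready   # do all nodes agree on partition ownership?
```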
> Regards,
> Sven
> ------------------------------------------
> Scoreloop AG, Brecherspitzstrasse 8, 81541 Munich, Germany,
> www.scoreloop.com
> sven.riedel at scoreloop.com
> Registered office: Munich, Commercial register: Amtsgericht München, HRB
> 174805
> Executive board: Dr. Marc Gumpinger (Chairman), Dominik Westner, Christian
> van der Leeden; Chairman of the supervisory board: Olaf Jacobi
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Sent from my mobile device
