Understanding Riak's rebalancing and handoff behaviour

Sven Riedel sven.riedel at scoreloop.com
Tue Nov 9 10:08:53 EST 2010

I'm currently assessing how well riak fits our needs as a large-scale data store.

In the course of testing riak, I've set up a cluster on Amazon with 6 nodes across two EC2 instances (m2.xlarge). After seeing surprisingly bad write performance (which I'll write more on in a separate post once I've finished my tests), I wanted to migrate the cluster to instances with better IO performance.

Let's call the original EC2 instances A and B. The plan was to migrate the cluster to new EC2 instances called C and D. During the following actions no other processes were reading/writing from/to the cluster. All instances are in the same availability zone.

What I did so far was to tell all riak nodes on B to leave the ring and let the ring re-stabilize. One surprising behaviour here was that the riak nodes on A suddenly all went into deep sleep mode (process state D) for about 30 minutes, and all riak-admin status/transfer calls claimed all nodes were down when in fact they weren't and were quite busy. But left to themselves they sorted everything out in the end.

Then I set up 3 new riak nodes on C and told them to join the cluster.

So far everything went well. riak-admin transfers showed me that both the nodes on A and the nodes on C were waiting on/for handoffs. However, the handoffs didn't start. I gave the cluster an hour, but no data transfer got initiated to the new nodes. 

Since I didn't find any way to manually trigger the handoff, I told all the nodes on A (riak01, riak02 and riak03) to leave the cluster and after the last node on A left the ring, the handoffs started.
After all the data in riak01 got moved to the nodes on C, the master process shut down and the handoff for the remaining data from riak02 and riak03 stopped. I tried restarting riak01 manually, however riak-admin ringready claims that riak01 and riak04 (on C) disagree on the partition owners. riak-admin transfers still lists the same number of partitions awaiting handoff as when the handoff to the nodes on C started.
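
One way to script the ring check mentioned above, as a rough sketch only: poll riak-admin ringready until it reports agreement before taking the next step. The helper names (ring_is_ready, wait_for_ring) are hypothetical, and the check assumes the command prints a line beginning with "TRUE" once all nodes agree on the ring; that output format is an assumption and should be verified against the Riak version in use.

```python
import subprocess
import time


def ring_is_ready(run=None):
    """Return True if `riak-admin ringready` reports agreement.

    ASSUMPTION: the command prints output starting with "TRUE" when all
    nodes agree on the ring. Adjust the check for your Riak version.
    """
    if run is None:
        def run():
            # Calls the real CLI; requires riak-admin on the PATH.
            return subprocess.run(
                ["riak-admin", "ringready"],
                capture_output=True,
                text=True,
            ).stdout
    return run().strip().upper().startswith("TRUE")


def wait_for_ring(run=None, attempts=60, delay=30.0, sleep=time.sleep):
    """Poll until the ring is ready; give up after `attempts` checks."""
    for _ in range(attempts):
        if ring_is_ready(run):
            return True
        sleep(delay)
    return False
```

The `run` and `sleep` parameters exist only so the logic can be exercised without a live cluster; in normal use they would be left at their defaults.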

My current data distribution is as follows (via du -c):
On A:
1780 riak01/data
188948 riak02/data
3766736 riak03/data

On C:
13215908 riak04/data
1855584 riak05/data
5745076 riak06/data
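
To make the skew in these figures easier to see, here is a small sketch that totals them per group of nodes. It assumes du reported 1 KiB blocks (the GNU du default); if the block size differed, the GiB figures change but the proportions do not.

```python
# Directory sizes copied from the du -c output above.
# ASSUMPTION: values are in 1 KiB blocks (GNU du default).
sizes_kib = {
    "riak01": 1780,
    "riak02": 188948,
    "riak03": 3766736,
    "riak04": 13215908,
    "riak05": 1855584,
    "riak06": 5745076,
}


def group_total(nodes):
    """Sum the data directory sizes of the given nodes."""
    return sum(sizes_kib[n] for n in nodes)


old_nodes = ["riak01", "riak02", "riak03"]
new_nodes = ["riak04", "riak05", "riak06"]

total = group_total(sizes_kib)
for label, nodes in (("riak01-03", old_nodes), ("riak04-06", new_nodes)):
    subtotal = group_total(nodes)
    share = 100 * subtotal / total
    print(f"{label}: {subtotal / 1024**2:.1f} GiB ({share:.0f}% of total)")
```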

riak04 and riak05 are awaiting the handoff of 341 partitions, riak06 of 342 partitions.

The ring_creation_size is 512, n_val for the bucket is 3, w is set to 1.
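
For reference, the numbers above can be compared against an ideal even split of the ring. This is plain arithmetic, not a model of how riak actually assigns partitions; even_split is a hypothetical helper.

```python
ring_size = 512  # ring_creation_size from the post
# Pending handoffs as reported by riak-admin transfers in the post.
pending = {"riak04": 341, "riak05": 341, "riak06": 342}


def even_split(partitions, node_count):
    """Ideal partition count per node if ownership were spread evenly."""
    base, extra = divmod(partitions, node_count)
    return [base + 1 if i < extra else base for i in range(node_count)]


print(even_split(ring_size, 3))  # ideal ownership with three nodes
print(even_split(ring_size, 6))  # ideal ownership with six nodes
print(sum(pending.values()), "partitions pending in total")
```

With three nodes an even split gives 170 or 171 partitions each, while the pending handoffs sum to 1024, which is twice the ring size.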

My questions at this point are:
1. What would normally trigger a rebalancing of the nodes? 
2. Is there a way to manually trigger a rebalancing?
3. Did I do anything wrong with the procedure described above to be left in the current odd state by riak?
4. How would I rectify this situation in a production environment?


Scoreloop AG, Brecherspitzstrasse 8, 81541 Munich, Germany, www.scoreloop.com
sven.riedel at scoreloop.com

Registered office: Munich, Court of registration: Amtsgericht München (Munich District Court), HRB 174805
Executive board: Dr. Marc Gumpinger (Chairman), Dominik Westner, Christian van der Leeden, Chairman of the supervisory board: Olaf Jacobi
