riak handoffs stalled

Леонид Рябоштан leonid.riaboshtan at twiket.com
Mon Jul 14 07:41:59 EDT 2014

Hello, guys,

It seems we have run into an emergency, and I wonder whether anyone can help.

Everything described below happened because we were trying to rebalance the
space used by nodes that were running out of disk.

The cluster now has 7 machines, and member_status looks like this:
Attempting to restart script through sudo -u riak
================================= Membership
Status     Ring    Pending    Node
valid      15.6%     20.3%    'riak at'
valid       0.0%      0.0%    'riak at'
valid       0.0%      0.0%    'riak at'
valid      26.6%     23.4%    'riak at'
valid      27.3%     21.1%    'riak at'
valid       8.6%     15.6%    'riak at'
valid      21.9%     19.5%    'riak at'
Valid:7 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
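To read the table above at a glance, the Pending column can be diffed against the Ring column: a positive difference means the pending plan gives that node more partitions, a negative one means it should hand some off. A node showing 0.0% in both columns owns no partitions in either the current or the pending ring. Below is a minimal Python sketch that assumes the member_status rows are pasted verbatim. The node names ('riak@node1' and so on) are hypothetical placeholders, because the list archive obfuscated the real ones.

```python
from typing import Dict, List, Tuple

# member_status rows as posted; node names are HYPOTHETICAL placeholders
# (the archive replaced the real names with 'riak at').
MEMBER_STATUS = """\
valid      15.6%     20.3%    'riak@node1'
valid       0.0%      0.0%    'riak@node2'
valid       0.0%      0.0%    'riak@node3'
valid      26.6%     23.4%    'riak@node4'
valid      27.3%     21.1%    'riak@node5'
valid       8.6%     15.6%    'riak@node6'
valid      21.9%     19.5%    'riak@node7'
"""

Row = Tuple[str, float, float]


def parse_member_status(text: str) -> List[Row]:
    """Return (node, ring_pct, pending_pct) for each membership row."""
    rows: List[Row] = []
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: status, ring%, pending%, node; skip headers/summary.
        if len(parts) != 4 or not parts[1].endswith("%"):
            continue
        ring = float(parts[1].rstrip("%"))
        pending = float(parts[2].rstrip("%"))
        node = parts[3].strip("'")
        rows.append((node, ring, pending))
    return rows


def ownership_delta(rows: List[Row]) -> Dict[str, float]:
    """Pending minus current ring share; positive means the node gains partitions."""
    return {node: round(pending - ring, 1) for node, ring, pending in rows}


if __name__ == "__main__":
    rows = parse_member_status(MEMBER_STATUS)
    for node, delta in ownership_delta(rows).items():
        print(f"{node:12} {delta:+.1f}%")
    # Sanity check: each column should sum to roughly 100% (display rounding).
    print("ring total:", round(sum(r for _, r, _ in rows), 1))
    print("pending total:", round(sum(p for _, _, p in rows), 1))
```

With the numbers above, the two 0.0% nodes show no change at all, and the remaining five nodes trade roughly 2 to 7 percentage points of ownership among themselves.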

The 2 nodes with 0% ring ownership were force-left from the cluster. They
still hold plenty of data, which now seems to be inaccessible, and handoffs
appear to be stuck. Node 'riak at' (which is in the same situation as
'riak at') tries to hand off partitions to 'riak at', but fails for an
unknown reason after very long timeouts (from 5 to 40 minutes). The
partition it is trying to move is about 10 GB in size. It grows slowly on
the target node, but that is probably just ordinary writes from normal
operation. It does not get any smaller on the source node.

Is there any way to tell the cluster that we want those partitions to stay
on the source node, so that there is no actual need to transfer them? How
can we redo the cluster ownership balance, or revert this force-leave?

Thank you,

More information about the riak-users mailing list