riak handoffs stalled

Ciprian Manea ciprian at basho.com
Mon Jul 14 08:11:48 EDT 2014


Hi Leonid,

Which Riak version are you running?

Have you committed* the cluster plan after issuing the cluster force-remove
<node> commands?

What is the output of $ riak-admin transfer-limit, run from one of your
riak nodes?


*If you have not committed the plan yet, do not do so for now. Instead,
please run riak-admin cluster plan and attach its output here.
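For reference, a sketch of the diagnostic commands mentioned above (assuming a standard install where riak-admin is on the PATH; run on any cluster node):

```shell
# Show the per-node concurrent handoff limit
# (a limit of 0 would stall all transfers).
riak-admin transfer-limit

# Show pending and active partition handoffs and the nodes involved.
riak-admin transfers

# Show currently staged cluster changes without committing them.
riak-admin cluster plan
```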


Thanks,
Ciprian


On Mon, Jul 14, 2014 at 2:41 PM, Леонид Рябоштан <
leonid.riaboshtan at twiket.com> wrote:

> Hello, guys,
>
> It seems we have run into an emergency, and I wonder if there is any help
> available for it.
>
> Everything described below happened because we were trying to rebalance
> space across nodes that were running out of disk space.
>
> Cluster is 7 machines now, member_status looks like:
> Attempting to restart script through sudo -u riak
> ================================= Membership
> ==================================
> Status     Ring    Pending    Node
>
> -------------------------------------------------------------------------------
> valid      15.6%     20.3%    'riak at 192.168.135.180'
> valid       0.0%      0.0%    'riak at 192.168.152.90'
> valid       0.0%      0.0%    'riak at 192.168.153.182'
> valid      26.6%     23.4%    'riak at 192.168.164.133'
> valid      27.3%     21.1%    'riak at 192.168.177.36'
> valid       8.6%     15.6%    'riak at 192.168.194.138'
> valid      21.9%     19.5%    'riak at 192.168.194.149'
>
> -------------------------------------------------------------------------------
> Valid:7 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
>
> The two nodes showing 0.0% Ring were made to force-leave the cluster. They
> still hold plenty of data, which now seems to be inaccessible. The handoffs
> appear to be stuck. Node 'riak at 192.168.152.90' (which is in the same
> situation as 'riak at 192.168.153.182') tries to hand off partitions to
> 'riak at 192.168.164.133' but fails for an unknown reason after huge
> timeouts (from 5 to 40 minutes). The partition it is trying to move is
> about 10 GB in size. It grows slowly on the target node, but that is
> probably just writes from normal operation. It does not get any smaller on
> the source node.
>
> I wonder: is there any way to let the cluster know that those nodes should
> remain actual members, so there is no need to transfer their data? How do
> we redo the cluster ownership balance and revert this force-leave?
>
> Thank you,
> Leonid
>
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
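On the question of reverting the force-leave quoted above, a sketch (assuming Riak's staged-clustering CLI; this only helps if the force-remove was staged but not yet committed):

```shell
# Discard all staged cluster changes before they are committed.
# Once a plan has been committed and handoffs have begun, there is
# no single command to undo it.
riak-admin cluster clear

# Stage and review any replacement changes before committing.
riak-admin cluster plan
```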