Dead node not leaving with force-remove

Paul Armstrong riak at otoh.org
Mon Nov 28 12:19:34 EST 2011


We have a Riak 1.0.2 cluster with a node that's gone but is still listed
in member_status (with status "legacy"):

riak-admin member_status
Attempting to restart script through sudo -u riak
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
(legacy)    7.8%      --      'riak@10.115.13.51'
valid       7.8%      --      'riak@10.119.82.164'
valid       7.8%      --      'riak@10.13.22.183'
valid       7.8%      --      'riak@10.13.51.171'
valid       7.8%      --      'riak@10.37.46.7'
valid       7.6%      --      'riak@10.76.45.111'
valid       7.6%      --      'riak@10.76.62.122'
valid       7.6%      --      'riak@10.78.197.82'
valid       7.6%      --      'riak@10.79.69.234'
valid       7.6%      --      'riak@10.80.157.112'
valid       7.6%      --      'riak@10.82.155.202'
valid       7.6%      --      'riak@10.82.25.84'
valid       7.6%      --      'riak@10.84.5.68'
-------------------------------------------------------------------------------
Valid:13 / Leaving:0 / Exiting:0 / Joining:0 / Down:0


riak-admin force-remove does not remove the node (we've tried a few
times over the last 4 days).

This node was already down before we did the upgrade. It would not
remove then either, so we upgraded anyway. As you can see here,
force-remove reports success, but the node is still listed:

riak-admin force-remove 'riak@10.115.13.51'
Attempting to restart script through sudo -u riak
Success: "riak@10.115.13.51" removed from the cluster

riak-admin member_status
Attempting to restart script through sudo -u riak
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
(legacy)    7.8%      --      'riak@10.115.13.51'
valid       7.8%      --      'riak@10.119.82.164'
valid       7.8%      --      'riak@10.13.22.183'
valid       7.8%      --      'riak@10.13.51.171'
valid       7.8%      --      'riak@10.37.46.7'
valid       7.6%      --      'riak@10.76.45.111'
valid       7.6%      --      'riak@10.76.62.122'
valid       7.6%      --      'riak@10.78.197.82'
valid       7.6%      --      'riak@10.79.69.234'
valid       7.6%      --      'riak@10.80.157.112'
valid       7.6%      --      'riak@10.82.155.202'
valid       7.6%      --      'riak@10.82.25.84'
valid       7.6%      --      'riak@10.84.5.68'
-------------------------------------------------------------------------------
Valid:13 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
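
For anyone who wants to script this check rather than eyeball the table, here is a minimal Python sketch that parses member_status output and reports whether a node is still listed. It assumes the row layout shown above and node names written with "@", as Riak prints them (the list archive renders "@" as " at "). The names parse_member_status and still_listed are hypothetical helpers, not part of riak-admin, and the sample is a shortened copy of the output above.

```python
import re

# One membership row, e.g.:
# (legacy)    7.8%      --      'riak@10.115.13.51'
ROW = re.compile(r"^(\(?\w+\)?)\s+([\d.]+)%\s+(\S+)\s+'([^']+)'\s*$")


def parse_member_status(text):
    """Return {node: (status, ring_percent)} for each membership row.

    Header, separator, and summary lines do not match ROW and are skipped.
    """
    members = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            status, ring, _pending, node = m.groups()
            members[node] = (status.strip("()"), float(ring))
    return members


def still_listed(text, node):
    """True if `node` still appears in the membership table."""
    return node in parse_member_status(text)


# Shortened sample copied from the output above (assumed layout).
MEMBER_STATUS_SAMPLE = """\
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
(legacy)    7.8%      --      'riak@10.115.13.51'
valid       7.8%      --      'riak@10.119.82.164'
valid       7.6%      --      'riak@10.84.5.68'
-------------------------------------------------------------------------------
Valid:13 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
"""

if __name__ == "__main__":
    dead = "riak@10.115.13.51"
    if still_listed(MEMBER_STATUS_SAMPLE, dead):
        status, ring = parse_member_status(MEMBER_STATUS_SAMPLE)[dead]
        print(f"{dead} still listed: status={status}, ring={ring}%")
```

In practice the text would come from capturing the output of riak-admin member_status before and after the force-remove. The sketch only reads that output and does nothing to the ring itself.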

After this, a small number of handoffs are seen in the logs:

17:13:29.514 [info] Handoff of partition riak_kv_vnode 1033327329519114953886199041881434478741108555776 from 'riak@10.13.51.171' to 'riak@10.82.25.84' completed: sent 2 objects in 0.04 seconds
17:13:39.470 [info] Starting handoff of partition riak_kv_vnode 176978713895539025251227460211737396911460581376 from 'riak@10.13.51.171' to 'riak@10.82.155.202'
17:13:39.480 [info] Starting handoff of partition riak_kv_vnode 1238850997268773176758592221482161778380224069632 from 'riak@10.13.51.171' to 'riak@10.76.45.111'
17:13:39.512 [info] Handoff of partition riak_kv_vnode 176978713895539025251227460211737396911460581376 from 'riak@10.13.51.171' to 'riak@10.82.155.202' completed: sent 1 objects in 0.04 seconds
17:13:39.525 [info] Handoff of partition riak_kv_vnode 1238850997268773176758592221482161778380224069632 from 'riak@10.13.51.171' to 'riak@10.76.45.111' completed: sent 3 objects in 0.04 seconds

Here's the pending transfer list:

riak-admin transfers
Nodes ['riak@10.115.13.51'] are currently down.
'riak@10.84.5.68' waiting to handoff 28 partitions
'riak@10.82.25.84' waiting to handoff 40 partitions
'riak@10.82.155.202' waiting to handoff 40 partitions
'riak@10.80.157.112' waiting to handoff 40 partitions
'riak@10.79.69.234' waiting to handoff 40 partitions
'riak@10.78.197.82' waiting to handoff 39 partitions
'riak@10.76.62.122' waiting to handoff 40 partitions
'riak@10.76.45.111' waiting to handoff 40 partitions
'riak@10.37.46.7' waiting to handoff 40 partitions
'riak@10.13.51.171' waiting to handoff 40 partitions
'riak@10.13.22.183' waiting to handoff 40 partitions
'riak@10.119.82.164' waiting to handoff 40 partitions
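
To put a number on the backlog, the list above can be totalled with a short script. This is a rough Python sketch that assumes the "waiting to handoff N partitions" line format shown above and node names written with "@" (the list archive renders "@" as " at "). The name pending_handoffs is a hypothetical helper, not a Riak API.

```python
import re

# One line per node, e.g.:
# 'riak@10.84.5.68' waiting to handoff 28 partitions
LINE = re.compile(r"^'([^']+)' waiting to handoff (\d+) partitions?$")


def pending_handoffs(text):
    """Return {node: pending_partition_count} from `riak-admin transfers` output.

    Lines that do not match (such as the "currently down" notice) are ignored.
    """
    pending = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            pending[m.group(1)] = int(m.group(2))
    return pending


# Sample copied from the transfer list above.
TRANSFERS_SAMPLE = """\
Nodes ['riak@10.115.13.51'] are currently down.
'riak@10.84.5.68' waiting to handoff 28 partitions
'riak@10.82.25.84' waiting to handoff 40 partitions
'riak@10.82.155.202' waiting to handoff 40 partitions
'riak@10.80.157.112' waiting to handoff 40 partitions
'riak@10.79.69.234' waiting to handoff 40 partitions
'riak@10.78.197.82' waiting to handoff 39 partitions
'riak@10.76.62.122' waiting to handoff 40 partitions
'riak@10.76.45.111' waiting to handoff 40 partitions
'riak@10.37.46.7' waiting to handoff 40 partitions
'riak@10.13.51.171' waiting to handoff 40 partitions
'riak@10.13.22.183' waiting to handoff 40 partitions
'riak@10.119.82.164' waiting to handoff 40 partitions
"""

if __name__ == "__main__":
    pending = pending_handoffs(TRANSFERS_SAMPLE)
    print(f"{len(pending)} nodes, {sum(pending.values())} partitions pending")
```

For the list above this gives 12 nodes with 467 partitions pending in total. Running it periodically against fresh output would show whether that number is shrinking at all.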

Any ideas on how to get the cluster to remove this node?

Thanks,
Paul



