Dead node not leaving with force-remove

Paul Armstrong riak at otoh.org
Mon Dec 5 14:33:13 EST 2011


At 2011-11-28T17:19+0000, Paul Armstrong wrote:
> We have a 1.0.2 cluster with a node that's gone but still listed in
> member_status (as legacy):
> 
> riak-admin member_status
> Attempting to restart script through sudo -u riak
> ================================= Membership ==================================
> Status     Ring    Pending    Node
> -------------------------------------------------------------------------------
> (legacy)    7.8%      --      'riak@10.115.13.51'
> valid       7.8%      --      'riak@10.119.82.164'
> valid       7.8%      --      'riak@10.13.22.183'
> valid       7.8%      --      'riak@10.13.51.171'
> valid       7.8%      --      'riak@10.37.46.7'
> valid       7.6%      --      'riak@10.76.45.111'
> valid       7.6%      --      'riak@10.76.62.122'
> valid       7.6%      --      'riak@10.78.197.82'
> valid       7.6%      --      'riak@10.79.69.234'
> valid       7.6%      --      'riak@10.80.157.112'
> valid       7.6%      --      'riak@10.82.155.202'
> valid       7.6%      --      'riak@10.82.25.84'
> valid       7.6%      --      'riak@10.84.5.68'
> -------------------------------------------------------------------------------
> Valid:13 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
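A leftover member like the (legacy) node above is easy to miss in a long listing. One way to spot it is to scan the member_status text for rows whose status is not "valid". Below is a minimal sketch that assumes the Status / Ring / Pending / Node column layout shown in the 1.0.2 output above. The helper names parse_member_status and suspicious_members are hypothetical and are not part of riak-admin or Riak itself. Node names are written with "@", as Riak prints them.

```python
import re

# Assumed row layout (from the riak-admin member_status output quoted above):
#   <status or (status)>  <ring %>  <pending>  '<node>'
ROW = re.compile(r"^\s*(\(?\w+\)?)\s+([\d.]+)%\s+(\S+)\s+'([^']+)'\s*$")


def parse_member_status(text):
    """Hypothetical helper: return (status, ring_pct, node) for each member row.

    Header, separator and summary lines do not match ROW and are skipped.
    """
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            status, ring, _pending, node = m.groups()
            rows.append((status.strip("()"), float(ring), node))
    return rows


def suspicious_members(rows):
    """Return nodes whose status is anything other than 'valid' (e.g. legacy)."""
    return [node for status, _ring, node in rows if status != "valid"]


sample = """\
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
(legacy)    7.8%      --      'riak@10.115.13.51'
valid       7.8%      --      'riak@10.119.82.164'
-------------------------------------------------------------------------------
Valid:13 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
"""

print(suspicious_members(parse_member_status(sample)))  # ['riak@10.115.13.51']
```

This only flags a suspect node. It does not repair the ring; in this thread that took hands-on work in the Erlang console.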

Many thanks to the Basho team (Dan Reverri and Mark Phillips in
particular) for helping to solve this. The ring was corrupted, and there
was an interesting bug involving legacy gossip and forced removal (see
Bug 1298, "Legacy gossip / force-remove troubles":
https://issues.basho.com/show_bug.cgi?id=1298).

After some work in the Erlang console, our ring no longer contained the
ghost hosts and was able to settle into the new gossip mode, and we were
then able to shrink the cluster.

Paul


