two nodes stuck leaving / transferring data to each other, jamming up cluster

Swinney, Austin Austin at vimeo.com
Mon Jun 4 14:16:19 EDT 2012


Hi All,

This concerns leveldb and Riak (1.1.1 2012-03-07) on RedHat x86_64, and one Riak newbie known as me!

Over the weekend I ran into a problem where two nodes are leaving the cluster and both are stuck trying to send transfers from one to the other.

I had backed them up with tar, and after they became stuck, I tried launching new instances from those leveldb tar file backups. But those new hosts, although listed in connected_nodes, are not in ring_members.

Are there any workarounds for resolving the stuck ownership handoff between the two leaving nodes? I tried different scenarios of marking them as down, but that didn't seem to help: ring_status indicated it wanted them down, then it wanted them back online, and so on.

I don't really need either one.  I'd like to eject them both from the cluster and have it rebalance onto the new nodes.


Both of these nodes were asked to leave:
Owner:      riak@10.0.0.235
Next Owner: riak@10.0.0.234

ring_status output:

[root@ip-10-0-0-171 riak]# riak-admin ring_status
Attempting to restart script through sudo -u riak
================================== Claimant ===================================
Claimant:  'riak@10.0.0.232'
Status:     up
Ring Ready: false

============================== Ownership Handoff ==============================
Owner:      riak@10.0.0.235
Next Owner: riak@10.0.0.234

Index: 727896323280039539339725844419242519555200778240
  Waiting on: [riak_kv_vnode]
  Complete:   [riak_pipe_vnode,riak_search_vnode]

Index: 876330083321459366969787585241990013739006427136
  Waiting on: [riak_kv_vnode]
  Complete:   [riak_pipe_vnode,riak_search_vnode]

-------------------------------------------------------------------------------

============================== Unreachable Nodes ==============================
All nodes are up and reachable


And member_status output:

[root@ip-10-0-0-168 ~]# riak-admin member_status
Attempting to restart script through sudo -u riak
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
down       16.6%     16.6%    'riak@10.0.0.83'
leaving     0.2%      0.0%    'riak@10.0.0.235'
leaving    15.6%     15.8%    'riak@10.0.0.234'
valid       0.0%      0.0%    'riak@10.0.0.168'
valid       0.0%      0.0%    'riak@10.0.0.169'
valid      18.0%     18.0%    'riak@10.0.0.231'
valid      17.4%     17.4%    'riak@10.0.0.232'
valid      15.4%     15.4%    'riak@10.0.0.233'
valid      16.8%     16.8%    'riak@10.0.0.84'
-------------------------------------------------------------------------------
Valid:6 / Leaving:1 / Exiting:0 / Joining:1 / Down:1
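
For anyone who wants to cross-check the rows of the table against the summary line, here is a minimal Python sketch that tallies the Status column of pasted member_status output. It assumes the four-column layout shown above; the function name tally_member_status is made up for illustration and is not part of Riak or riak-admin.

```python
from collections import Counter

# Status words that appear in the member_status summary line.
KNOWN_STATUSES = {"valid", "leaving", "exiting", "joining", "down"}

def tally_member_status(text):
    """Count nodes per status in pasted `riak-admin member_status` output.

    Hypothetical helper: it only looks at rows shaped like
    <status> <ring%> <pending%> <node> and ignores everything else.
    """
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] in KNOWN_STATUSES and fields[1].endswith("%"):
            counts[fields[0]] += 1
    return counts

sample = """\
Status     Ring    Pending    Node
down       16.6%     16.6%    'riak@10.0.0.83'
leaving     0.2%      0.0%    'riak@10.0.0.235'
leaving    15.6%     15.8%    'riak@10.0.0.234'
valid       0.0%      0.0%    'riak@10.0.0.168'
valid       0.0%      0.0%    'riak@10.0.0.169'
valid      18.0%     18.0%    'riak@10.0.0.231'
valid      17.4%     17.4%    'riak@10.0.0.232'
valid      15.4%     15.4%    'riak@10.0.0.233'
valid      16.8%     16.8%    'riak@10.0.0.84'
Valid:6 / Leaving:1 / Exiting:0 / Joining:1 / Down:1
"""

print(dict(tally_member_status(sample)))
```

Run against the table above, the rows tally to 6 valid, 2 leaving and 1 down, which does not match the Leaving:1 / Joining:1 reported in the summary line.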


Thanks for your input!

Austin
