Handoff stalled on 1.0.2 riak cluster

John Axel Eriksson john at insane.se
Sun Jun 3 05:06:43 EDT 2012


We had an issue where one of the Riak servers died (it had to be force-removed
from the cluster). After we did that, things got really bad and most data was
unreachable for hours. At one point I also added a new node to replace the old
one, but it never received any data, and even now, about a day later, it still
has none.
The issue now seems to be that a few nodes are waiting on
handoff of 1 partition. When I look at ring_status I see this:

Attempting to restart script through sudo -u riak
================================== Claimant
Claimant:  'riak@r-001.x.x.x'
Status:     up
Ring Ready: true

============================== Ownership Handoff
Owner:      riak@r-004.x.x.x
Next Owner: riak@r-003.x.x.x

Index: 930565495644285842450002452081070828921550798848
  Waiting on: []
  Complete:   [riak_kv_vnode,riak_pipe_vnode,riak_search_vnode]


============================== Unreachable Nodes
All nodes are up and reachable
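
To make the symptom concrete: the entry above has an empty "Waiting on" list and every vnode module under "Complete", yet it is still listed as a pending ownership handoff. Below is a minimal sketch of how one might flag such entries in saved ring_status output. It is not a Riak tool; the function name stalled_handoffs is hypothetical, and the parsing assumes the output layout is exactly as pasted above.

```python
def stalled_handoffs(ring_status_text):
    """Return pending handoff entries whose 'Waiting on' list is empty.

    Assumes the layout shown in the ring_status output above:
    'Owner:' / 'Next Owner:' lines, then 'Index:' followed by 'Waiting on:'.
    """
    stalled = []
    owner = next_owner = index = None
    for raw in ring_status_text.splitlines():
        line = raw.strip()
        if line.startswith("Owner:"):
            owner = line.split(":", 1)[1].strip()
        elif line.startswith("Next Owner:"):
            next_owner = line.split(":", 1)[1].strip()
        elif line.startswith("Index:"):
            index = int(line.split(":", 1)[1].strip())
        elif line.startswith("Waiting on:"):
            waiting = line.split(":", 1)[1].strip()
            # Nothing left to wait on, but the entry is still pending.
            if waiting == "[]" and index is not None:
                stalled.append(
                    {"index": index, "owner": owner, "next_owner": next_owner}
                )
    return stalled


# Sample input copied from the output above.
sample = """\
============================== Ownership Handoff
Owner:      riak@r-004.x.x.x
Next Owner: riak@r-003.x.x.x

Index: 930565495644285842450002452081070828921550798848
  Waiting on: []
  Complete:   [riak_kv_vnode,riak_pipe_vnode,riak_search_vnode]
"""

if __name__ == "__main__":
    for entry in stalled_handoffs(sample):
        print(entry["index"], entry["owner"], "->", entry["next_owner"])
```

An entry reported by this sketch only means the listing looks finished but has not been cleared; it says nothing about why the ownership change did not complete.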

OK, so this looks like the problem described in the Release Notes for 1.0.2:
https://github.com/basho/riak/blob/1.0.2-release/RELEASE-NOTES.org.
Unfortunately, I've run the code from those notes (through riak attach) with
no result.

It's been in this state for about 12 hours now, I think. What can we do to fix
our cluster?

I upgraded to 1.0.3 hoping it would fix our problems, but that didn't help.
I cannot upgrade to 1.1.x because we mainly use Luwak for large objects,
and that's discontinued in 1.1.x as far as I know.

Thanks for your help,