Handoff stalled on 1.0.2 riak cluster

John Axel Eriksson john at insane.se
Sun Jun 3 05:06:43 EDT 2012


Hi.

We had an issue where one of the riak servers died (it had to be force removed
from the cluster). After we did that things got really bad and most data was
unreachable for hours. At one point I also added a new node to replace the old
one, but about a day later it still hasn't received any data.
What seems to be the issue now is that a few nodes are waiting on
handoff of 1 partition. When I look at ring_status I see this:

Attempting to restart script through sudo -u riak
================================== Claimant ===================================
Claimant:  'riak at r-001.x.x.x
Status:     up
Ring Ready: true

============================== Ownership Handoff ==============================
Owner:      riak at r-004.x.x.x
Next Owner: riak at r-003.x.x.x

Index: 930565495644285842450002452081070828921550798848
  Waiting on: []
  Complete:   [riak_kv_vnode,riak_pipe_vnode,riak_search_vnode]

-------------------------------------------------------------------------------

============================== Unreachable Nodes ==============================
All nodes are up and reachable
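
For anyone who wants to check their own cluster for the same pattern, here is a rough Python sketch that scans pasted ring_status text for pending transfers whose "Waiting on" list is empty. It assumes the output layout matches what is shown above and that an empty "Waiting on" list on a transfer that is still listed is the sign of a stall. The function name find_stalled_handoffs and the SAMPLE variable are made up for illustration and are not part of Riak.

```python
# Sketch only: parses ring_status text shaped like the output above.
# find_stalled_handoffs and SAMPLE are hypothetical names, not Riak APIs.

SAMPLE = """\
Owner:      riak at r-004.x.x.x
Next Owner: riak at r-003.x.x.x

Index: 930565495644285842450002452081070828921550798848
  Waiting on: []
  Complete:   [riak_kv_vnode,riak_pipe_vnode,riak_search_vnode]
"""


def find_stalled_handoffs(ring_status_text):
    """Return pending transfers whose 'Waiting on' list is empty.

    Assumption: a transfer that is still listed although nothing is
    left to wait on has finished its handoff but was never committed.
    """
    stalled = []
    owner = None
    next_owner = None
    current = None
    for raw in ring_status_text.splitlines():
        line = raw.strip()
        if line.startswith("Owner:"):
            owner = line.split(":", 1)[1].strip()
        elif line.startswith("Next Owner:"):
            next_owner = line.split(":", 1)[1].strip()
        elif line.startswith("Index:"):
            # Start a new transfer record under the most recent owner pair.
            current = {
                "index": int(line.split(":", 1)[1]),
                "owner": owner,
                "next_owner": next_owner,
            }
        elif line.startswith("Waiting on:") and current is not None:
            if line.split(":", 1)[1].strip() == "[]":
                stalled.append(current)
            current = None
    return stalled


if __name__ == "__main__":
    for transfer in find_stalled_handoffs(SAMPLE):
        print(transfer["index"], transfer["owner"], "->", transfer["next_owner"])
```

Feeding it the full text printed by ring_status should work the same way, since lines it does not recognize are skipped.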


Ok, so it looks like the problem described in the Release Notes for 1.0.2:
https://github.com/basho/riak/blob/1.0.2-release/RELEASE-NOTES.org
Unfortunately I've run the code from those notes (through riak attach) with
no result.

I think it has been in this state for about 12 hours now. What can we do to
fix our cluster?

I upgraded to 1.0.3 hoping it would fix our problems but that didn't help.
I cannot upgrade to 1.1.x because we mainly use Luwak for large object
support, and that's discontinued in 1.1.x as far as I know.

Thanks for your help,
John