Simultaneous handoff and merge

Yuri Lukyanov snaky at
Thu Apr 18 05:07:52 EDT 2013


I have a cluster of 17 Riak (1.2.1) nodes with Bitcask as the backend.

Recently one of the nodes was down for a while. After the node was
restarted, the cluster began doing handoffs, as expected. But then a merge
process also began on the same node. I know this from log messages like

2013-04-18 08:14:09.061 [info] <0.22952.79> Merged

Then something went wrong (logs from the same node):

2013-04-18 08:39:22.217 [error] <0.31842.70> Supervisor riak_core_vnode_sup
had child undefined started with {riak_core_vnode,start_link,undefined} at
<0.4000.80> exit with reason
{timeout,{gen_server,call,[riak_core_handoff_manager,{add_outbound,riak_kv_vnode,208378163135070142634509751539626289911881007104,riak@nsto2r5,<0.4000.80>}]}}
in context child_terminated

2013-04-18 08:42:46.067 [error] <0.5154.80> gen_server <0.5154.80>
terminated with reason:
2013-04-18 08:42:52.790 [error] <0.5154.80> CRASH REPORT Process
riak_core_handoff_listener with 1 neighbours exited with reason:
{timeout,{gen_server,call,[riak_core_handoff_manager,{add_inbound,[]}]}} in
gen_server:terminate/6 line 747
2013-04-18 08:42:53.450 [error] <0.31847.70> Supervisor
riak_core_handoff_listener_sup had child riak_core_handoff_listener started
with riak_core_handoff_listener:start_link() at <0.5154.80> exit with
{timeout,{gen_server,call,[riak_core_handoff_manager,{add_inbound,[]}]}} in
context child_terminated

The node itself was disappearing from time to time:

# riak-admin ring-status
Node is not running!

The beam process was still running though.

Maybe it's not related to handoffs and merges; that was just a guess.

Any information or advice on this would be greatly appreciated. It's still
happening right now, and I can gather more details if anyone wants them.
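
In case it helps anyone looking into this, here is a rough sketch (Python,
not something shipped with Riak) of how the merge messages and the
riak_core_handoff_manager timeouts could be pulled out of a log and put on
one timeline. The regular expression is inferred only from the log lines
quoted above, and the names LINE_RE, classify and timeline are made up for
illustration.

```python
import re
from datetime import datetime

# Entry header as it appears in the quoted log lines, e.g.
# "2013-04-18 08:14:09.061 [info] <0.22952.79> Merged"
# (assumption: every entry starts with this timestamp/level/pid prefix).
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[(\w+)\] (<[\d.]+>) ?(.*)$"
)


def classify(text):
    """Return 'merge', 'handoff_timeout', or None for one log entry body."""
    if re.search(r"\bMerged\b", text):
        return "merge"
    if "riak_core_handoff_manager" in text and "timeout" in text:
        return "handoff_timeout"
    return None


def timeline(lines):
    """Glue wrapped continuation lines onto their entry, then keep only
    the entries that classify() recognises, in file order."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = LINE_RE.match(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            entries.append([ts, match.group(2), match.group(4)])
        elif entries and line.strip():
            # No timestamp prefix: treat it as a continuation of the last entry.
            entries[-1][2] += " " + line.strip()
    result = []
    for ts, level, text in entries:
        kind = classify(text)
        if kind is not None:
            result.append((ts, level, kind))
    return result
```

Passing an open log file to timeline() and printing the tuples should show
whether the handoff timeouts only start after the merges begin. Which file
to read depends on the installation.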

Thanks in advance.

More information about the riak-users mailing list