Pending transfers when joining 1.0.3 node to 1.0.0 cluster

Fredrik Lindström Fredrik.Lindstrom at qbranch.se
Wed Jan 18 17:40:58 EST 2012


I just ran the two commands on all 4 nodes.

When run on one of the original nodes, the first command (riak_core_ring_manager:force_update()) results in output like the following in the console of the new node:
<snip>
23:20:06.928 [info] loading merge_index './data/merge_index/331121464707782692405522344912282871640797216768'
23:20:06.929 [info] opened buffer './data/merge_index/331121464707782692405522344912282871640797216768/buffer.1'
23:20:06.929 [info] finished loading merge_index './data/merge_index/331121464707782692405522344912282871640797216768' with rollover size 912261.12
23:20:07.006 [info] loading merge_index './data/merge_index/730750818665451459101842416358141509827966271488'
23:20:07.036 [info] opened buffer './data/merge_index/730750818665451459101842416358141509827966271488/buffer.1'
23:20:07.036 [info] finished loading merge_index './data/merge_index/730750818665451459101842416358141509827966271488' with rollover size 1132462.08
23:20:47.050 [info] loading merge_index './data/merge_index/513809169374145557180982949001818249097788784640'
23:20:47.054 [info] opened buffer './data/merge_index/513809169374145557180982949001818249097788784640/buffer.1'
23:20:47.055 [info] finished loading merge_index './data/merge_index/513809169374145557180982949001818249097788784640' with rollover size 975175.6799999999
</snip>

riak_core_vnode_manager:force_handoffs() produces no output on any node's console besides "ok". No tasty handoff log messages to be found.

Furthermore I'm not sure what to make of the output from riak-admin transfers:
'test at qbkpxadmin01.ad.qnet.local' waiting to handoff 62 partitions
'qbkpx03 at qbkpx03.ad.qnet.local' waiting to handoff 42 partitions
'qbkpx01 at qbkpx01.ad.qnet.local' waiting to handoff 42 partitions

Our second node (qbkpx02) is missing from that list. The output also states that the new node (test) is waiting to hand off 62 partitions, although it owns 0 partitions.

riak-admin ring_status lists various pending ownership handoffs; all of them are between our 3 original nodes. The new node is not mentioned anywhere.
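In case the raw ring state is useful, the pending ownership list can also be read straight off the ring from a riak attach console. A sketch (this is my understanding of the riak_core 1.0-era API; each tuple should be {Index, Owner, NextOwner, Mods, Status}):

```erlang
%% Run from `riak attach` on any node. get_my_ring/0 and
%% pending_changes/1 are riak_core calls; the tuple layout
%% above is an assumption, so treat it as a sketch.
{ok, Ring} = riak_core_ring_manager:get_my_ring(),
riak_core_ring:pending_changes(Ring).
```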

I'm really curious about the current state of our cluster. It does look rather exciting :)

/F
________________________________
From: Aphyr [aphyr at aphyr.com]
Sent: Wednesday, January 18, 2012 11:15 PM
To: Fredrik Lindström
Cc: riak-users at lists.basho.com
Subject: Re: Pending transfers when joining 1.0.3 node to 1.0.0 cluster

Did you try riak_core_ring_manager:force_update() and force_handoffs() on the old partition owner as well as the new one? Can't recall off the top of my head which one needs to execute that handoff.

--Kyle

On Jan 18, 2012, at 2:08 PM, Fredrik Lindström wrote:

Thanks for the response Aphyr.

I'm seeing Waiting on: [riak_search_vnode,riak_kv_vnode,riak_pipe_vnode] instead of [], so I'm thinking it's a different scenario.
It might be worth mentioning that the data directory on the new node does contain relevant subdirectories, but the disk footprint is so small I doubt any data has been transferred.

/F
________________________________
From: Aphyr [aphyr at aphyr.com]
Sent: Wednesday, January 18, 2012 10:46 PM
To: Fredrik Lindström
Cc: riak-users at lists.basho.com
Subject: Re: Pending transfers when joining 1.0.3 node to 1.0.0 cluster

https://github.com/basho/riak/blob/riak-1.0.2/RELEASE-NOTES.org

If partition transfer is blocked awaiting [] (as opposed to [kv_vnode] or whatever), there's a snippet in there that might be helpful.

--Kyle

On Jan 18, 2012, at 1:43 PM, Fredrik Lindström wrote:

After some digging I found a suggestion from Joseph Blomstedt in an earlier mail thread
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-January/007116.html

in the riak console:
riak_core_ring_manager:force_update().
riak_core_vnode_manager:force_handoffs().

The symptoms would appear to be the same, although the cluster referenced in the mail thread does not appear to have search enabled, as far as I can tell from the log snippets. The mail thread doesn't really specify which node to run the commands on, so I tried both the new node and the current claimant of the cluster.
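For completeness, rather than attaching to each node in turn, the same two calls can be pushed to every member from a single riak attach console with rpc:multicall/4. A sketch, using our node names:

```erlang
%% Node names below are our cluster's; adjust as needed.
%% rpc:multicall/4 runs the given MFA on every listed node.
Nodes = ['qbkpx01@qbkpx01.ad.qnet.local',
         'qbkpx02@qbkpx02.ad.qnet.local',
         'qbkpx03@qbkpx03.ad.qnet.local',
         'test@qbkpxadmin01.ad.qnet.local'],
rpc:multicall(Nodes, riak_core_ring_manager, force_update, []),
rpc:multicall(Nodes, riak_core_vnode_manager, force_handoffs, []).
```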

Sadly the suggested steps did not produce any kind of ownership handoff.

Any helpful ideas would be much appreciated :)

/F


________________________________
From: riak-users-bounces at lists.basho.com [riak-users-bounces at lists.basho.com] on behalf of Fredrik Lindström [Fredrik.Lindstrom at qbranch.se]
Sent: Wednesday, January 18, 2012 4:00 PM
To: riak-users at lists.basho.com
Subject: Pending transfers when joining 1.0.3 node to 1.0.0 cluster

Hi everyone,
when we try to join a 1.0.3 node to an existing 1.0.0 (3 node) cluster the ownership transfer doesn't appear to take place. I'm guessing that we're making some stupid little mistake but we can't figure it out at the moment. Anyone run into something similar?

Riak Search is enabled on the original nodes in the cluster as well as on the new node.
Ring size is set to 128.

The various logfiles do not appear to contain any errors or warnings.

Output from riak-admin member_status
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid      33.6%     25.0%    'qbkpx01 at qbkpx01.ad.qnet.local'
valid      33.6%     25.0%    'qbkpx02 at qbkpx02.ad.qnet.local'
valid      32.8%     25.0%    'qbkpx03 at qbkpx03.ad.qnet.local'
valid       0.0%     25.0%    'test at qbkpxadmin01.ad.qnet.local'
-------------------------------------------------------------------------------

Output from riak-admin ring_status
See attached file

Output from riak-admin transfers
'test at qbkpxadmin01.ad.qnet.local' waiting to handoff 10 partitions
'qbkpx03 at qbkpx03.ad.qnet.local' waiting to handoff 62 partitions
'qbkpx01 at qbkpx01.ad.qnet.local' waiting to handoff 63 partitions


/F


_______________________________________________
riak-users mailing list
riak-users at lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


