Adding a new machine to a three-node cluster causes partition handoff problems

Ivaylo Panitchkov ipanitchkov at hibernum.com
Tue Jan 10 11:13:12 EST 2012


Hello All,

We have a cluster of three machines (Debian 6.0, 4GB RAM,
riak_1.0.2-1_amd64.deb, n_val: 3) that has been serving an application
for a while. As we are going to production soon, we added a fourth
machine to the cluster (identical to the first three) yesterday. The
partition handoff began in the late afternoon, and I was under the
impression that the transition would not take long, as there are only a
few hundred IMPORTANT records in the storage at the moment. This morning
I checked the situation again and found that the partition handoff is
still running (or is stuck). The Ownership Handoff has not changed since
yesterday (at least 19 hours so far). Any suggestions to fix the problem
are welcome :-)

REMARK: I replaced the IP addresses for security's sake


# riak-admin ringready
Attempting to restart script through sudo -u riak
TRUE All nodes agree on the ring 
['riak@YYY.YYY.YYY.YYY','riak@XXX.XXX.XXX.XXX','riak@AAA.AAA.AAA.AAA','riak@BBB.BBB.BBB.BBB']


# riak-admin transfers
Attempting to restart script through sudo -u riak
'riak@BBB.BBB.BBB.BBB' waiting to handoff 2 partitions
'riak@AAA.AAA.AAA.AAA' waiting to handoff 2 partitions
'riak@YYY.YYY.YYY.YYY' waiting to handoff 2 partitions
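
For anyone who wants to watch these counts over time, here is a minimal Python sketch that totals the pending handoffs. It assumes the line format is exactly what is shown above; the function name `pending_handoffs` and the `sample` string are made up for illustration, and node names are written with `@` as Riak prints them.

```python
import re

# Line format copied from the `riak-admin transfers` output above (an
# assumption: other Riak versions may word this line differently).
PENDING = re.compile(r"^'([^']+)' waiting to handoff (\d+) partitions?$")


def pending_handoffs(output):
    """Map each node name to the number of partitions it is waiting to hand off."""
    pending = {}
    for line in output.splitlines():
        match = PENDING.match(line.strip())
        if match:
            pending[match.group(1)] = int(match.group(2))
    return pending


# Hypothetical sample mirroring the output in this message.
sample = """\
Attempting to restart script through sudo -u riak
'riak@BBB.BBB.BBB.BBB' waiting to handoff 2 partitions
'riak@AAA.AAA.AAA.AAA' waiting to handoff 2 partitions
'riak@YYY.YYY.YYY.YYY' waiting to handoff 2 partitions
"""

if __name__ == "__main__":
    counts = pending_handoffs(sample)
    print(counts)
    print("total pending:", sum(counts.values()))
```

Running it periodically against the real command output and comparing totals would show whether anything is moving at all.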


# riak-admin ring_status
Attempting to restart script through sudo -u riak
================================== Claimant ===================================
Claimant: 'riak@XXX.XXX.XXX.XXX'
Status: up
Ring Ready: true

============================== Ownership Handoff ==============================
Owner: riak@XXX.XXX.XXX.XXX
Next Owner: riak@YYY.YYY.YYY.YYY

Index: 548063113999088594326381812268606132370974703616
Waiting on: [riak_kv_vnode]
Complete: [riak_pipe_vnode]

Index: 1370157784997721485815954530671515330927436759040
Waiting on: [riak_kv_vnode]
Complete: [riak_pipe_vnode]

-------------------------------------------------------------------------------

============================== Unreachable Nodes ==============================
All nodes are up and reachable


# riak-admin member_status
Attempting to restart script through sudo -u riak
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid      21.9%     25.0%    'riak@YYY.YYY.YYY.YYY'
valid      28.1%     25.0%    'riak@XXX.XXX.XXX.XXX'
valid      25.0%     25.0%    'riak@AAA.AAA.AAA.AAA'
valid      25.0%     25.0%    'riak@BBB.BBB.BBB.BBB'
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Valid:4 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
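
As a sanity check on the numbers above, the following Python sketch relates the two partition indices and the Ring percentages. It assumes the ring has 64 partitions (Riak's default ring_creation_size, which is not stated in this message) and that the ring spans the 160-bit SHA-1 keyspace; the helper names are made up for illustration.

```python
RING_SIZE = 64            # assumption: default ring_creation_size
KEYSPACE = 2 ** 160       # the Riak ring covers the 160-bit SHA-1 keyspace
STEP = KEYSPACE // RING_SIZE


def partition_number(index):
    """Return the 0-based partition number for a ring index."""
    number, remainder = divmod(index, STEP)
    if remainder:
        raise ValueError("index does not fit a ring of %d partitions" % RING_SIZE)
    return number


def ring_share(partitions):
    """Format the share of the ring owned by a node holding `partitions` partitions."""
    return "{:.1f}%".format(100 * partitions / RING_SIZE)


if __name__ == "__main__":
    # The two indices listed under Ownership Handoff.
    print(partition_number(548063113999088594326381812268606132370974703616))   # 24
    print(partition_number(1370157784997721485815954530671515330927436759040))  # 60
    # 14 and 18 partitions match the 21.9% and 28.1% rows; 16 is the 25.0% target.
    print(ring_share(14), ring_share(18), ring_share(16))
```

Under that assumption, the current owner holds 18 partitions and the next owner holds 14, so the two partitions still waiting on riak_kv_vnode are exactly what separates the Ring column from the Pending column.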


-- 
Ivaylo Panitchkov
Software developer
Hibernum Creations Inc.
