Very (very) slow handoff, how to investigate?

Gal Barnea gal at eyeviewdigital.com
Thu Jan 26 15:12:15 EST 2012


Hi all

I have a 6-server cluster running on EC2 (m1.large). This is an evaluation
environment, so there is practically no load besides the existing data
(~200 million records, ~1k each).

After running "riak-admin leave" on one of the nodes, I noticed that for
more than 3 hours:
1 - member_status showed one "leaving" node and pending data to hand off
on the rest, but the numbers never changed
2 - riak-admin transfers showed handoffs waiting, but nothing changed

At this point, I restarted the "leaving" node, so now the status is:
1 - member_status - still stuck with the same numbers
2 - transfers - slowly changing
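
A minimal sketch of one way to tell "stuck" from "slowly changing" without
watching by hand: poll the two commands named above and record whether their
output differs between rounds. It assumes riak-admin is on the PATH of the
node where it runs; the function names (run_command, watch), the interval,
and the round count are illustrative choices, not part of Riak.

```python
import subprocess
import time

# The two commands mentioned in this post. Assumes riak-admin is on the PATH.
COMMANDS = (
    ("riak-admin", "member_status"),
    ("riak-admin", "transfers"),
)


def run_command(argv):
    """Run one command and return its stdout as text."""
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout


def watch(commands=COMMANDS, interval=60, rounds=5,
          run=run_command, sleep=time.sleep):
    """Poll each command `rounds` times and report which outputs changed.

    Returns one dict per round after the first, mapping the command string
    to True if its output differs from the previous round.
    """
    previous = {}
    history = []
    for i in range(rounds):
        current = {" ".join(c): run(list(c)) for c in commands}
        if previous:
            history.append(
                {name: current[name] != previous[name] for name in current}
            )
        previous = current
        if i < rounds - 1:
            sleep(interval)
    return history
```

Calling watch(interval=300, rounds=12) would cover roughly an hour; a command
whose entry is False in every round produced identical output throughout.
This is a plain text comparison, so it only says that something changed, not
what changed.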

The leaving server's logs show that a single handoff started after
the restart, but nothing since (roughly an hour ago).

Interestingly, the leaving server is pretty idle, while the remaining
servers are working hard at 50%-60% CPU.

So the question now is where I should dig around to understand
what's going on. Any thoughts?

Thanks
Gal