Very (very) slow handoff, how to investigate?
gal at eyeviewdigital.com
Thu Jan 26 15:12:15 EST 2012
I have a 6-server cluster running on EC2 (m1.large). This is an evaluation
environment, so there is practically no load besides the existing data
(~200 million records, ~1 KB each).
After running "riak-admin leave" on one of the nodes, I noticed that for
more than 3 hours:
1 - member_status showed one "leaving" node and pending data to hand off
on the rest, but the numbers never changed
2 - riak-admin transfers showed handoffs waiting, but nothing changed
At this point I restarted the "leaving" node, so now the status is:
1 - member_status - still stuck with the same numbers
2 - transfers - slowly changing
The leaving server's logs show that a single handoff started after
the restart, but nothing since (roughly an hour ago).
Interestingly, the leaving server is pretty idle, while the remaining
servers are working hard at 50%-60% CPU.
So the question now is: where should I dig around to try to understand
what's going on? Any thoughts?
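
One way to confirm that the numbers are really stuck, rather than just
moving slowly, is to sample the two commands mentioned above at a fixed
interval and compare the outputs. A minimal Python sketch follows. The
helper names (run_riak_admin, count_unchanged_tail, watch) and the
5-minute interval are illustrative assumptions, and it assumes that
riak-admin is on the PATH of the node where it runs and that its output
is plain text that stays identical when nothing is progressing.

```python
import subprocess
import time


def run_riak_admin(subcommand):
    """Run `riak-admin <subcommand>` and return its stdout as text.

    Assumes riak-admin is on the PATH (an assumption, not verified here).
    """
    result = subprocess.run(
        ["riak-admin", subcommand],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def count_unchanged_tail(samples):
    """Return how many of the most recent samples equal the last one."""
    if not samples:
        return 0
    last = samples[-1]
    count = 0
    for sample in reversed(samples):
        if sample != last:
            break
        count += 1
    return count


def watch(subcommand, interval_s=300, rounds=12,
          fetch=run_riak_admin, sleep=time.sleep):
    """Sample a riak-admin subcommand and report how long it has been static.

    `fetch` and `sleep` are injectable so the loop can be tried without
    a live cluster.
    """
    samples = []
    for i in range(rounds):
        samples.append(fetch(subcommand))
        unchanged = count_unchanged_tail(samples)
        print(f"{subcommand}: sample {i + 1}, "
              f"identical for the last {unchanged} sample(s)")
        if i < rounds - 1:
            sleep(interval_s)
    return samples


if __name__ == "__main__":
    # The two subcommands named in the message above.
    watch("member_status", rounds=1)
    watch("transfers")
```

If the "transfers" output stays identical across many samples while the
remaining nodes are still busy, that at least narrows the question to
what those nodes are spending CPU on.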