TCP recv timeout and handoffs almost all the time

Simon Effenberg seffenberg at team.mobile.de
Thu Jul 18 14:31:01 EDT 2013


Sometimes it is more than 30 handoffs:

Attempting to restart script through sudo -H -u riak
'riak@10.47.109.209' waiting to handoff 6 partitions
'riak@10.47.109.208' waiting to handoff 2 partitions
'riak@10.47.109.207' waiting to handoff 1 partitions
'riak@10.47.109.206' waiting to handoff 14 partitions
'riak@10.47.109.205' waiting to handoff 12 partitions
'riak@10.47.109.204' waiting to handoff 14 partitions
'riak@10.47.109.203' waiting to handoff 16 partitions
'riak@10.47.109.202' waiting to handoff 3 partitions
'riak@10.47.109.201' waiting to handoff 3 partitions
'riak@10.46.109.209' waiting to handoff 4 partitions
'riak@10.46.109.208' waiting to handoff 1 partitions
'riak@10.46.109.207' waiting to handoff 4 partitions
'riak@10.46.109.206' waiting to handoff 12 partitions
'riak@10.46.109.205' waiting to handoff 12 partitions
'riak@10.46.109.204' waiting to handoff 13 partitions
'riak@10.46.109.203' waiting to handoff 12 partitions
'riak@10.46.109.202' waiting to handoff 17 partitions
'riak@10.46.109.201' waiting to handoff 12 partitions
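
To keep an eye on the totals over time, the listing above can be summed with a few lines of Python. This is only a rough sketch: it assumes the text comes from "riak-admin transfers" and keeps the exact "waiting to handoff N partitions" wording shown above. The function name pending_handoffs is made up for illustration and is not part of Riak.

```python
import re

# Matches lines such as: 'riak@10.47.109.209' waiting to handoff 6 partitions
# The archived spelling 'riak at 10.47.109.209' matches as well, because the
# node name is simply whatever sits between the single quotes.
PENDING = re.compile(
    r"'(?P<node>[^']+)' waiting to handoff (?P<count>\d+) partitions?"
)


def pending_handoffs(transfers_output):
    """Return {node: pending partition count} parsed from the given text.

    Lines that do not match (for example the "Attempting to restart script"
    notice) are ignored.
    """
    pending = {}
    for line in transfers_output.splitlines():
        match = PENDING.search(line)
        if match:
            pending[match.group("node")] = int(match.group("count"))
    return pending
```

Feeding it the listing above and summing the values gives 158 pending partitions across 18 nodes. Calling it from cron with the captured command output and logging sum(pending_handoffs(text).values()) would be one way to see whether the backlog ever drains.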


On Thu, 18 Jul 2013 20:21:57 +0200
Simon Effenberg <seffenberg at team.mobile.de> wrote:

> Hi @list,
> 
> I sometimes see log entries saying "hinted_handoff transfer of .. failed because of TCP recv timeout".
> Also, riak-admin transfers shows me many handoffs (is it possible to see how many handoffs have happened via "riak-admin status"?).
> 
> - Is it normal to have up to 30 handoffs from/to different nodes?
> - How can I get to the bottom of the TCP recv timeout? I'm not sure whether this is a network problem or whether the other node is too slow. The load on the machines is OK (some I/O wait, but not 100%). Could AAE be interfering?
> 
> Here is the log output for the TCP recv timeout. The timeout itself does not occur that often, but handoffs happen very often:
> 
> 2013-07-18 16:22:05.654 UTC [error] <0.28933.14>@riak_core_handoff_sender:start_fold:216 hinted_handoff transfer of riak_kv_vnode from 'riak@10.46.109.207' 1118962191081472546749696200048404186924073353216 to 'riak@10.46.109.205' 1118962191081472546749696200048404186924073353216 failed because of TCP recv timeout
> 2013-07-18 16:22:05.673 UTC [error] <0.202.0>@riak_core_handoff_manager:handle_info:282 An outbound handoff of partition riak_kv_vnode 1118962191081472546749696200048404186924073353216 was terminated for reason: {shutdown,timeout}
> 
> 
> Thanks in advance
> Simon

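To check whether the TCP recv timeouts cluster on particular sender/receiver pairs (which might hint at one slow node or link rather than something cluster-wide), the error lines can be tallied. This is a sketch that assumes the log format is exactly the one quoted above. The name timeout_pairs is hypothetical, and it accepts any iterable of log lines, so which log file to read is left open.

```python
import re
from collections import Counter

# Matches the riak_core_handoff_sender error line quoted above, e.g.
# "... hinted_handoff transfer of riak_kv_vnode from 'riak@A' <index>
#  to 'riak@B' <index> failed because of TCP recv timeout"
TIMEOUT = re.compile(
    r"transfer of (?P<module>\w+) from '(?P<src>[^']+)' (?P<partition>\d+) "
    r"to '(?P<dst>[^']+)' \d+ failed because of TCP recv timeout"
)


def timeout_pairs(log_lines):
    """Count TCP recv timeout failures per (sending node, receiving node)."""
    pairs = Counter()
    for line in log_lines:
        match = TIMEOUT.search(line)
        if match:
            pairs[(match.group("src"), match.group("dst"))] += 1
    return pairs
```

Passing an open log file to timeout_pairs() and printing pairs.most_common() lists the worst pairs first. If the same receiving node shows up in most of them, that node would be the first place to look.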

-- 
Simon Effenberg | Site Ops Engineer | mobile.international GmbH
Fon:     + 49-(0)30-8109 - 7173
Fax:     + 49-(0)30-8109 - 7131

Mail:     seffenberg at team.mobile.de
Web:    www.mobile.de

Marktplatz 1 | 14532 Europarc Dreilinden | Germany


Geschäftsführer: Malte Krüger
HRB Nr.: 18517 P, Amtsgericht Potsdam
Sitz der Gesellschaft: Kleinmachnow 



