Hinted handoff failed because of tcp errors

Alexander Sicular siculars at gmail.com
Tue Nov 1 05:06:50 EDT 2016


Hi Ryan,

Yes, you can change a number of settings. Have you had a look at the
transfer-limit documentation and this earlier thread on the list?

http://docs.basho.com/riak/kv/2.1.4/using/admin/riak-admin/#transfer-limit
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2014-July/015529.html
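
For a rough sense of scale, here is a small Python sketch that estimates how
long the stuck partition needs at the current rate. The numbers are copied
from the riak-admin transfers output quoted below. The variable and function
names are my own, and I am assuming that "MB" means 10^6 bytes and that the
rate stays constant.

```python
# Figures copied from the `riak-admin transfers` output quoted below.
TOTAL_BYTES = 78_393_086_512   # "total size"
PERCENT_DONE = 15              # progress bar
RATE_MB_PER_S = 1.53           # reported throughput

# Assumption: "MB" means 10**6 bytes. If it means 2**20 bytes instead,
# the estimates shrink by roughly 5%.
BYTES_PER_MB = 10**6


def hours_to_transfer(num_bytes: float, rate_mb_per_s: float) -> float:
    """Hours needed to move num_bytes at a steady rate_mb_per_s."""
    seconds = num_bytes / (rate_mb_per_s * BYTES_PER_MB)
    return seconds / 3600


full_hours = hours_to_transfer(TOTAL_BYTES, RATE_MB_PER_S)
remaining_hours = hours_to_transfer(
    TOTAL_BYTES * (100 - PERCENT_DONE) / 100, RATE_MB_PER_S
)

print(f"whole partition: {full_hours:.1f} h")
print(f"remaining {100 - PERCENT_DONE}%:   {remaining_hours:.1f} h")
```

Under those assumptions the whole partition needs roughly 14 hours. If each
retry starts again from the beginning, which the "started ... 2.10 hr ago"
line next to 15% seems to suggest, a single TCP recv timeout anywhere in that
window is enough to reset it. That is the case for raising the transfer rate
or the timeout rather than waiting it out.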

-Alexander

On Tue, Nov 1, 2016 at 2:43 AM, Ryan Maclear <ryanm at miranetworks.net> wrote:
> Good Day,
>
> We have a 4-node Riak cluster running inside AWS. It runs Riak KV 2.1.2
> with AAE enabled on Ubuntu 14.04.4 LTS.
>
> We are in the process of replacing one node with another using the process
> described here:
>
> http://docs.basho.com/riak/kv/2.1.4/using/cluster-operations/replacing-node/
>
> We have successfully replaced two of the nodes so far but we are having a
> problem with the third. If we look at /var/log/riak/console.log we see the
> start of the hinted handoff, and some time later (sometimes minutes and
> sometimes hours) we see:
>
> 2016-10-31 06:30:40.090 [error]
> <0.19834.2101>@riak_core_handoff_sender:start_fold:272 hinted transfer of
> riak_kv_vnode from 'riak@aew54.miranetworks.net'
> 274031556999544297163190906134303066185487351808 to
> 'riak@aew75.miranetworks.net'
> 274031556999544297163190906134303066185487351808 failed because of TCP recv
> timeout
> 2016-10-31 06:30:40.090 [error]
> <0.187.0>@riak_core_handoff_manager:handle_info:303 An outbound handoff of
> partition riak_kv_vnode 274031556999544297163190906134303066185487351808 was
> terminated for reason: {shutdown,timeout}
>
> So the handoff was terminated due to a TCP timeout. The handoff then starts
> again.
>
> This has been going on for some time (about two weeks now).
>
> The current member status is as follows:
>
> riak-admin member-status
> ================================= Membership ==================================
> Status     Ring    Pending    Node
> -------------------------------------------------------------------------------
> leaving     0.0%      --      'riak@aew54.miranetworks.net'
> valid      25.0%      --      'riak@aew59.miranetworks.net'
> valid      25.0%      --      'riak@aew73.miranetworks.net'
> valid      25.0%      --      'riak@aew74.miranetworks.net'
> valid      25.0%      --      'riak@aew75.miranetworks.net'
> -------------------------------------------------------------------------------
> Valid:4 / Leaving:1 / Exiting:0 / Joining:0 / Down:0
>
>
> Here are some questions:
>
> 1. What is the default TCP timeout?
> 2. Is there any way to increase this timeout?
> 3. Is there any way to increase the rate of handoff?
> 4. Are there any other parameters we can tune to try and avoid this?
>
> The output from riak-admin transfers is as follows:
>
> 'riak@aew54.miranetworks.net' waiting to handoff 1 partitions
>
> Active Transfers:
>
> transfer type: hinted
> vnode type: riak_kv_vnode
> partition: 274031556999544297163190906134303066185487351808
> started: 2016-11-01 05:30:47 [2.10 hr ago]
> last update: 2016-11-01 07:36:51 [3.03 s ago]
> total size: 78393086512 bytes
> objects transferred: 11440967
>
>                          1513 Objs/s
> riak@aew54.miranetworks.n  =======>  riak@aew75.miranetworks.n
> et                                   et
>         |======                                     |  15%
>                           1.53 MB/s
>
>
> Thanks,
> Ryan Maclear