Handoffs are too slow after netsplit

Andrey Ershov andrershov at gmail.com
Fri Feb 24 12:27:25 EST 2017


Thanks, guys, for your replies.

Charlie, I've seen a ticket regarding this issue,
https://github.com/basho/riak/issues/754, but I'm not calling the
vnode-status command.
Douglas, I suppose vnode_inactivity_timeout
<https://github.com/basho/riak_core/search?utf8=%E2%9C%93&q=vnode_inactivity_timeout>
should not influence riak_core_vnode_manager:force_handoff() execution time. As
far as I understand, when vnode_inactivity_timeout fires, the vnode
initiates handoff, but here I'm forcing handoff and it does not start
right away.
Anyway, I've set vnode_inactivity_timeout to 2 seconds and no longer call
force_handoff. Handoffs now seem faster, but they can still take up
to 30 seconds.
Guys, is there perhaps some other timer that must fire before handoff can
proceed?
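
For reference, here is a rough sketch of how that setting might look in
advanced.config. It assumes the key lives under the riak_core application
(as Doug described) and that the value is given in milliseconds, so
2 seconds becomes 2000; please check both against your own installation.

```erlang
%% advanced.config (sketch, not a complete file)
[
 {riak_core, [
   %% Assumption: the value is in milliseconds.
   %% Doug mentioned a 60 second wait by default; 2000 would be 2 seconds.
   {vnode_inactivity_timeout, 2000}
 ]}
].
```

This is only meant for a demo setup. Any other settings from an existing
advanced.config would need to be merged into the same list.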


2017-02-24 20:20 GMT+03:00 Charlie Voiselle <cvoiselle at basho.com>:

> Andrey:
>
> Another thing that can stall handoff is running commands that reset the
> vnode activity timer.  The `riak-admin vnode-status` command resets the
> activity timer and should never be run more frequently than the vnode
> inactivity timeout; if it is, handoff can stall permanently.  We have
> seen this before at a customer site where they were collecting the
> statistics from the vnode-status command into their metrics system.
>
> Regards,
> Charlie Voiselle
>
>
> On Feb 23, 2017, at 7:14 AM, Douglas Rohrer <drohrer at basho.com> wrote:
>
> Andrey:
>
> It's waiting for 60 seconds, literally...
>
> See
> https://github.com/basho/riak_core/search?utf8=%E2%9C%93&q=vnode_inactivity_timeout
> - handoff is not initiated until a vnode has been inactive for the
> specified inactivity period.
>
> For demonstration purposes, if you want to reduce this time, you could
> lower riak_core.vnode_inactivity_timeout, which can be set in
> advanced.config. Also note that, depending on the backend you use, if
> other settings are set lower than the vnode inactivity timeout, you can
> actually prevent handoff completely - see
> http://docs.basho.com/riak/kv/2.2.0/setup/planning/backend/bitcask/#sync-strategy,
> for example.
>
> Hope this helps.
>
> Doug
>
> On Thu, Feb 23, 2017 at 6:40 AM Andrey Ershov <andrershov at gmail.com>
> wrote:
>
> Hi, guys!
>
> I'd like to follow up on handoff behaviour after a netsplit. The problem is
> that right after the network partition is healed, the "riak-admin transfers"
> command says that there are X partitions waiting to transfer from one node
> to another, and Y partitions waiting to transfer in the opposite direction.
> What are they waiting for? The active transfers section is always empty. It
> takes about 1 minute for the transfers to occur. I've increased
> transfer_limit to 100 and it does not help.
> I've also tried attaching to the Erlang VM and executing
> riak_core_vnode_manager:force_handoff() on each node. This command
> returns 'ok', but it does not seem to work right after the network is
> healed. After some time (30-60 s), force_handoff() works as expected, but
> that is effectively the same latency as in the automatic handoff case.
>
> So what is it waiting for? Any ideas?
>
> I'm preparing a real-time coding demo to be shown at a conference, so
> waiting 1 minute for handoff to occur for just a couple of keys is too
> long...
> --
> Thanks,
> Andrey
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>


-- 
Best regards,
Andrey Ershov