Testing netsplit in Riak

Alexander Sicular siculars at gmail.com
Wed Feb 22 14:41:06 EST 2017

There's a reason the default tick time is higher. The larger the network, the higher the probability that nodes momentarily can't reach each other. Set it too low and you get too much gossip and too much flapping. YMMV.
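
To put rough numbers on that trade-off, here is a minimal sketch. It assumes the behaviour described in the Erlang kernel documentation for net_ticktime: connected nodes are ticked every quarter of the tick time, and a silent peer is declared down somewhere between 0.75x and 1.25x the configured value. The function name detection_window is made up for illustration; it is not a Riak or Erlang API.

```python
def detection_window(net_ticktime: float) -> tuple[float, float]:
    """Return (min_t, max_t) in seconds for declaring a silent peer down.

    Assumption (from the Erlang kernel docs, not from Riak itself):
    min_t = T - T/4 and max_t = T + T/4, where T is net_ticktime.
    """
    quarter = net_ticktime / 4
    return (net_ticktime - quarter, net_ticktime + quarter)


# Compare Erlang's default of 60 seconds with the 1-second setting from this thread.
for ticktime in (60, 1):
    lo, hi = detection_window(ticktime)
    print(f"net_ticktime={ticktime}s -> peer declared down after {lo:.2f}s to {hi:.2f}s")
```

With a 60-second tick time the window is 45 to 75 seconds, which would be consistent with a first write stalling for about a minute after a split. At 1 second the window shrinks to 0.75 to 1.25 seconds, so any network hiccup of about a second is enough to mark a healthy node as down.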



> On Feb 22, 2017, at 13:17, Andrey Ershov <andrershov at gmail.com> wrote:
> Alexander, thanks for your reply!
> 1) I've set erlang.distribution.net_ticktime to 1 second and writes after a netsplit are very fast now, so this point is resolved. Do you know how this parameter affects the false-positive rate? The Riak docs say that every net_ticktime seconds the net_kernel will initiate liveness checks of remote processes. However, they don't say anything about the mechanism. Do you know how this failure detector works?
> 2) As for hinted handoff, I still cannot find a solution. Variables that I've tried changing:
>    - vnode_management_timer from 10s to 1s
>    - transfer_limit from 2 to 100
> But transfers still take about a minute. Are there any other variables that I should take a look at?
> 2017-02-22 21:12 GMT+03:00 Alexander Sicular <siculars at basho.com>:
>> 1. Check the erlang vm variable "nettick", I believe. 
>> 2. Hinted handoff resource allocation is configurable via the config file or at runtime.
>>> On Wed, Feb 22, 2017 at 12:07 Andrey Ershov <andrershov at gmail.com> wrote:
>>> Hi, guys!
>>> I'm testing netsplit in Riak and cannot get satisfactory behaviour.
>>> I have just a two-node cluster and a bucket with the following settings: n=3, w=2, r=2. And I have just a couple of entries.
>>> Basically I have two problems:
>>> 1) After the split, writes on one side of the partition start lagging hard. It takes more than 1 minute for the first write to succeed. I understand that this is related to the process of setting up backup vnodes in Riak, but is there any way to speed up the process? Which configuration parameters influence that?
>>> 2) A weirder problem comes after the netsplit. The "riak-admin transfers" command immediately reports that there should be 5 partition transfers from one node to the other and 5 partition transfers in the opposite direction. But the active transfers output is empty!
>>> I've put a watch on this command and active transfers are always empty. 
>>> Finally, it takes several minutes for Riak to finish hinted handoff. Several minutes for just a few keys!
>>> What is Riak doing all this time? Is there any way to speed up the process?
>>> 3) The reason I'm concerned about hinted handoff speed is that, until this process finishes, I read stale data on both sides of the former netsplit.
>>> -- 
>>> Thanks,
>>> Andrey
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> -- 
>> Alexander Sicular
>> Solutions Architect
>> Basho Technologies
>> 9175130679
>> @siculars
> -- 
> Best regards,
> Ершов Андрей