anti_entropy_expire

Edgar Veiga edgarmveiga at gmail.com
Thu Jan 2 05:05:20 EST 2014


This is the only thing related to AAE that exists in my app.config. I
haven't changed any default values...

            %% Enable active anti-entropy subsystem + optional debug messages:
            %%   {anti_entropy, {on|off, []}},
            %%   {anti_entropy, {on|off, [debug]}},
            {anti_entropy, {on, []}},

            %% Restrict how fast AAE can build hash trees. Building the tree
            %% for a given partition requires a full scan over that partition's
            %% data. Once built, trees stay built until they are expired.
            %% Config is of the form:
            %%   {num-builds, per-timespan-in-milliseconds}
            %% Default is 1 build per hour.
            {anti_entropy_build_limit, {1, 3600000}},

            %% Determine how often hash trees are expired after being built.
            %% Periodically expiring a hash tree ensures the on-disk hash tree
            %% data stays consistent with the actual k/v backend data. It also
            %% helps Riak identify silent disk failures and bit rot. However,
            %% expiration is not needed for normal AAE operation and should be
            %% infrequent for performance reasons. The time is specified in
            %% milliseconds. The default is 1 week.
            {anti_entropy_expire, 604800000},

            %% Limit how many AAE exchanges/builds can happen concurrently.
            {anti_entropy_concurrency, 2},

            %% The tick determines how often the AAE manager looks for work
            %% to do (building/expiring trees, triggering exchanges, etc).
            %% The default is every 15 seconds. Lowering this value will
            %% speed up the rate that all replicas are synced across the cluster.
            %% Increasing the value is not recommended.
            {anti_entropy_tick, 15000},

            %% The directory where AAE hash trees are stored.
            {anti_entropy_data_dir, "/var/lib/riak/anti_entropy"},

            %% The LevelDB options used by AAE to generate the LevelDB-backed
            %% on-disk hashtrees.
            {anti_entropy_leveldb_opts, [{write_buffer_size, 4194304},
                                         {max_open_files, 20}]},
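
As a rough cross-check of the timing values above, here is a minimal Python sketch (not part of Riak) that converts the millisecond settings into readable durations and estimates how long the build limit alone spreads a full round of tree builds on one node. `VNODES_PER_NODE` is a hypothetical figure, since the ring size is not stated in this thread, and the estimate ignores the scan time and the concurrency setting.

```python
from datetime import timedelta
from typing import Tuple

# Values copied from the app.config above; all durations are in milliseconds.
BUILD_LIMIT: Tuple[int, int] = (1, 3_600_000)  # {anti_entropy_build_limit, {1, 3600000}}
EXPIRE_MS = 604_800_000                        # {anti_entropy_expire, 604800000}
TICK_MS = 15_000                               # {anti_entropy_tick, 15000}

# ASSUMPTION: hypothetical vnode count per node; not stated in this thread.
VNODES_PER_NODE = 11


def ms_to_timedelta(ms: int) -> timedelta:
    """Convert a millisecond setting into a timedelta."""
    return timedelta(milliseconds=ms)


def throttled_build_span(vnodes: int, build_limit: Tuple[int, int]) -> timedelta:
    """Rough time the build limit alone spreads `vnodes` tree builds over.

    Counts how many throttle windows are needed if at most `builds` trees
    may be built per window. Ignores the scan time itself and the
    concurrency setting, so treat the result as an estimate only.
    """
    builds, window_ms = build_limit
    windows = -(-vnodes // builds)  # ceiling division
    return timedelta(milliseconds=windows * window_ms)


if __name__ == "__main__":
    print("expire interval:", ms_to_timedelta(EXPIRE_MS))
    print("build window:   ", ms_to_timedelta(BUILD_LIMIT[1]))
    print("tick:           ", ms_to_timedelta(TICK_MS))
    print("throttled span: ", throttled_build_span(VNODES_PER_NODE, BUILD_LIMIT))
```

With the default limit of one build per hour, the throttle by itself stretches a node's rebuilds over roughly as many hours as it has vnodes, before counting the time each partition scan takes.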

I'll add the bloom filter option and see what happens...

It's Thursday again, and the regeneration process has started once more.
Since I updated to 1.4.6, something else is different: the get/put values
for each cluster node now show "random" behaviour. Take a look at the
following screenshot:

https://cloudup.com/cgbu9VNhSo1

Best regards


On 31 December 2013 21:16, Charlie Voiselle <cvoiselle at basho.com> wrote:

> Edgar:
>
> Could you attach the AAE section of your app.config?  I’d like to look
> into this issue further for you.  Something I think you might be running
> into is https://github.com/basho/riak_core/pull/483.
>
> The issue of concern is that the LevelDB bloom filter is not enabled
> properly for the instance into which the AAE data is stored.  You can
> mitigate this particular issue by adding *{use_bloomfilter, true}* as
> shown below:
>
>             %% The LevelDB options used by AAE to generate the LevelDB-backed
>             %% on-disk hashtrees.
>             {anti_entropy_leveldb_opts, [{write_buffer_size, 4194304},
>                                          {max_open_files, 20}]},
>
>
> Becomes:
>
>
>             %% The LevelDB options used by AAE to generate the LevelDB-backed
>             %% on-disk hashtrees.
>
>             {anti_entropy_leveldb_opts, [{write_buffer_size, 4194304},
>                                          {use_bloomfilter, true},
>                                          {max_open_files, 20}]},
>
>
> This might not solve your specific problem, but it will certainly improve
> your AAE performance.
>
> Thanks,
> Charlie Voiselle
>
> On Dec 31, 2013, at 12:04 PM, Edgar Veiga <edgarmveiga at gmail.com> wrote:
>
> Hey guys!
>
> Nothing on this one?
>
> Btw: Happy new year :)
>
>
> On 27 December 2013 22:35, Edgar Veiga <edgarmveiga at gmail.com> wrote:
>
>> This is a du -hs * of the riak folder:
>>
>> 44G anti_entropy
>> 1.1M kv_vnode
>> 252G leveldb
>> 124K ring
>>
>> It's a 6-machine cluster, so ~1512G of LevelDB.
>>
>> Thanks for the tip, I'll upgrade in the near future!
>>
>> Best regards
>>
>>
>> On 27 December 2013 21:41, Matthew Von-Maszewski <matthewv at basho.com> wrote:
>>
>>> I have a query out to the developer that can better respond to your
>>> follow-up questions.  It might be Monday before we get a reply due to the
>>> holidays.
>>>
>>> Do you happen to know how much data is in the leveldb dataset and/or one
>>> vnode?  Not sure it will change the response, but might be nice to have
>>> that info available.
>>>
>>> Matthew
>>>
>>> P.S.  Unrelated to your question:  Riak 1.4.4 is available for download.
>>>  It has a couple of nice bug fixes for leveldb.
>>>
>>>
>>> On Dec 27, 2013, at 2:08 PM, Edgar Veiga <edgarmveiga at gmail.com> wrote:
>>>
>>> Ok, thanks for confirming!
>>>
>>> Is it normal that this action affects the overall state of the cluster?
>>> On the 26th it started the regeneration, and the response times of the
>>> cluster rose to values never seen before. It was a day of heavy traffic,
>>> but everything was going quite OK until the regeneration process started.
>>>
>>> Do you have any advice about changing those app.config values? My
>>> cluster has been running smoothly for the past 6 months and I don't want
>>> to start all over again :)
>>>
>>> Best Regards
>>>
>>>
>>> On 27 December 2013 18:56, Matthew Von-Maszewski <matthewv at basho.com> wrote:
>>>
>>>> Yes.  Confirmed.
>>>>
>>>> There are options available in app.config to control how often this
>>>> occurs and how many vnodes rehash at once:  defaults are every 7 days and
>>>> two vnodes per server at a time.
>>>>
>>>> Matthew Von-Maszewski
>>>>
>>>>
>>>> On Dec 27, 2013, at 13:50, Edgar Veiga <edgarmveiga at gmail.com> wrote:
>>>>
>>>> Hi!
>>>>
>>>> I've been trying to find what may be the cause of this.
>>>>
>>>> Once a week, all the nodes in my Riak cluster start some kind of
>>>> operation that lasts for at least two days.
>>>>
>>>> You can see a sample of my munin logs for the last week here:
>>>>
>>>> https://cloudup.com/imWiBwaC6fm
>>>> Take a look at days 19 and 20; it has now started again on the
>>>> 26th...
>>>>
>>>> I suspect that this may be caused by the AAE hash trees being
>>>> regenerated, as your documentation says:
>>>> "For added protection, Riak periodically (default: once a week) clears
>>>> and regenerates all hash trees from the on-disk K/V data."
>>>> Can you confirm that this may be the root of the "problem", and whether
>>>> it's normal for the action to last for two days?
>>>>
>>>> I'm using Riak 1.4.2 on 6 machines, with CentOS. The backend is LevelDB.
>>>>
>>>> Best Regards,
>>>> Edgar Veiga
>>>>
>>>> _______________________________________________
>>>> riak-users mailing list
>>>> riak-users at lists.basho.com
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>
>>>>
>>>
>>>
>>