Upgraded riak 1.4.9 is pegging the CPU

Engel Sanchez engel at basho.com
Thu Jun 5 13:05:41 EDT 2014


Hi Alain. I don't think you are seeing the AAE issue. The problem with
upgrading from 1.4.4 through 1.4.7 to 1.4.8 was a broken hash function in
those versions, which made the AAE trees incompatible. You should not have
the same problem coming from 1.4.0. It looks like Erlang processes are
repeatedly crashing and restarting. It would be good to grab all your logs
before they rotate so we can see exactly what crashes first and sets off
this snowball effect.


On Thu, Jun 5, 2014 at 11:58 AM, Alain Rodriguez <alain at uber.com> wrote:

> Actually, I just noticed it is likely the AAE issue:
>
> 2014-06-05 14:53:47.587 [error] <0.16054.31> CRASH REPORT Process
> <0.16054.31> with 0 neighbours exited with reason: no match of right hand
> value {error,{db_open,"IO error: lock
> /var/lib/riak/anti_entropy/1061872283373234151507364761270424381468763488256/LOCK:
> already held by process"}} in hashtree:new_segment_store/2 line 505 in
> gen_server:init_it/6 line 328
> 2014-06-05 14:53:47.588 [error] <0.16056.31> CRASH REPORT Process
> <0.16056.31> with 0 neighbours exited with reason: no match of right hand
> value {error,{db_open,"IO error: lock
> /var/lib/riak/anti_entropy/1335903840372778448670555667404727447654250840064/LOCK:
> already held by process"}} in hashtree:new_segment_store/2 line 505 in
> gen_server:init_it/6 line 328
> 2014-06-05 14:53:47.588 [error] <0.16055.31> CRASH REPORT Process
> <0.16055.31> with 0 neighbours exited with reason: no match of right hand
> value {error,{db_open,"IO error: lock
> /var/lib/riak/anti_entropy/1267395951122892374379757940871151681107879002112/LOCK:
> already held by process"}} in hashtree:new_segment_store/2 line 505 in
> gen_server:init_it/6 line 328
>
> Bollocks!
>
>
> On Thu, Jun 5, 2014 at 8:49 AM, Alain Rodriguez <alain at uber.com> wrote:
>
>> Thanks for the quick reply, and no, I did not. Is this something I can
>> still do now (stop, remove files, start again), or is it too late? How
>> could I verify that this is the issue?
>>
>>
>> On Thu, Jun 5, 2014 at 8:42 AM, Shane McEwan <shane at mcewan.id.au> wrote:
>>
>>> On 05/06/14 16:20, Alain Rodriguez wrote:
>>> > Hi all,
>>> >
>>> > I upgraded 1 of 9 riak nodes in a cluster last night from 1.4.0 to
>>> > 1.4.9. The rest are running 1.4.0.
>>> >
>>> > Ever since, I have been seeing the upgraded node, riak01, consume a
>>> > significantly larger percentage of CPU, and PUT times on it have
>>> > gotten worse. htop indicates one particular process pegging the CPU,
>>> > and many more processes running than I was used to seeing before.
>>>
>>> G'day!
>>>
>>> Did you turn off and remove the Active Anti Entropy files before
>>> upgrading?
>>>
>>> From the 1.4.8 release notes:
>>>
>>> IMPORTANT We recommend removing current AAE trees before upgrading. That
>>> is, all files under the anti_entropy sub-directory. This will avoid
>>> potentially large amounts of repair activity once correct hashes start
>>> being added. The data in the current trees can only be fixed by a full
>>> rebuild, so this repair activity is wasteful. Trees will start to build
>>> once AAE is re-enabled. To minimize the impact of this, we recommend
>>> upgrading during a period of low activity.
>>>
>>> Shane.
>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>>
>>
>
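The release-note advice quoted above applies to a node that really was upgraded from one of the affected releases (1.4.4 through 1.4.7). In that case it comes down to three steps: stop the node, delete everything under the anti_entropy sub-directory, and start the node again. Below is a minimal Python sketch of the deletion step. It assumes the `/var/lib/riak/anti_entropy` location seen in the crash reports, and the function name `clear_aae_trees` is hypothetical. By default it does a dry run that only lists what it would remove. It must only be run while the node is stopped (for example after `riak stop`), since each running hashtree holds a LOCK file in its partition directory.

```python
import shutil
from pathlib import Path

# Assumption: default data directory, matching the path in the crash reports.
DEFAULT_AAE_DIR = Path("/var/lib/riak/anti_entropy")


def clear_aae_trees(aae_dir=DEFAULT_AAE_DIR, dry_run=True):
    """Remove everything under anti_entropy/, keeping the directory itself.

    Returns the entries found. With dry_run=True (the default) nothing is
    deleted. Run only while the Riak node is stopped.
    """
    aae_dir = Path(aae_dir)
    if not aae_dir.is_dir():
        raise FileNotFoundError(aae_dir)
    entries = sorted(aae_dir.iterdir())
    if not dry_run:
        for entry in entries:
            # Per-partition tree directories are removed recursively;
            # plain files and symlinks are simply unlinked.
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
    return entries
```

Per the release notes, the trees start rebuilding once AAE is re-enabled. Doing this during a period of low activity limits the impact.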

