RiakCS node crash +90% disk i/o

Alex Millar alex at gobonfire.com
Wed Dec 3 09:15:35 EST 2014


Good morning Riak-Users

Last night one of the nodes in my 5-node RiakCS cluster went haywire and shot up to over 90% disk I/O utilization, seemingly out of the blue.

Looking at the Riak error.log, I saw the following being written continuously:

2014-12-02 21:57:13.220 [error] <0.29210.3089> CRASH REPORT Process <0.29210.3089> with 0 neighbours exited with reason: no match of right hand value {error,{db_open,"IO error: /var/lib/riak/anti_entropy/570899077082383952423314387779798054553098649600/CURRENT: Too many open files"}} in hashtree:new_segment_store/2 line 505 in gen_server:init_it/6 line 328
2014-12-02 21:57:13.226 [error] <0.29211.3089> CRASH REPORT Process <0.29211.3089> with 0 neighbours exited with reason: no match of right hand value {error,{db_open,"IO error: /var/lib/riak/anti_entropy/776422744832042175295707567380525354192214163456/LOCK: Too many open files"}} in hashtree:new_segment_store/2 line 505 in gen_server:init_it/6 line 328
2014-12-02 21:57:13.226 [error] <0.29212.3089> CRASH REPORT Process <0.29212.3089> with 0 neighbours exited with reason: no match of right hand value {error,{db_open,"IO error: /var/lib/riak/anti_entropy/570899077082383952423314387779798054553098649600/CURRENT: Too many open files"}} in hashtree:new_segment_store/2 line 505 in gen_server:init_it/6 line 328
2014-12-02 21:57:13.226 [error] <0.29213.3089> CRASH REPORT Process <0.29213.3089> with 0 neighbours exited with reason: no match of right hand value {error,{db_open,"IO error: /var/lib/riak/anti_entropy/776422744832042175295707567380525354192214163456/CURRENT: Too many open files"}} in hashtree:new_segment_store/2 line 505 in gen_server:init_it/6 line 328
2014-12-02 21:57:13.286 [error] <0.29215.3089> CRASH REPORT Process <0.29215.3089> with 0 neighbours exited with reason: no match of right hand value {error,{db_open,"IO error: /var/lib/riak/anti_entropy/776422744832042175295707567380525354192214163456/LOCK: Too many open files"}} in hashtree:new_segment_store/2 line 505 in gen_server:init_it/6 line 328
2014-12-02 21:57:13.286 [error] <0.29214.3089> CRASH REPORT Process <0.29214.3089> with 0 neighbours exited with reason: no match of right hand value {error,{db_open,"IO error: /var/lib/riak/anti_entropy/570899077082383952423314387779798054553098649600/LOCK: Too many open files"}} in hashtree:new_segment_store/2 line 505 in gen_server:init_it/6 line 328
2014-12-02 21:57:13.286 [error] <0.29217.3089> CRASH REPORT Process <0.29217.3089> with 0 neighbours exited with reason: no match of right hand value {error,{db_open,"IO error: /var/lib/riak/anti_entropy/570899077082383952423314387779798054553098649600/LOCK: Too many open files"}} in hashtree:new_segment_store/2 line 505 in gen_server:init_it/6 line 328
2014-12-02 21:57:13.287 [error] <0.29216.3089> CRASH REPORT Process <0.29216.3089> with 0 neighbours exited with reason: no match of right hand value {error,{db_open,"IO error: /var/lib/riak/anti_entropy/776422744832042175295707567380525354192214163456/LOCK: Too many open files"}} in hashtree:new_segment_store/2 line 505 in gen_server:init_it/6 line 328
2014-12-02 21:57:13.312 [error] <0.29219.3089> CRASH REPORT Process <0.29219.3089> with 0 neighbours exited with reason: no match of right hand value {error,{db_open,"IO error: /var/lib/riak/anti_entropy/570899077082383952423314387779798054553098649600/LOCK: Too many open files"}} in hashtree:new_segment_store/2 line 505 in gen_server:init_it/6 line 328
2014-12-02 21:57:15.634 [error] <0.29218.3089> CRASH REPORT Process <0.29218.3089> with 0 neighbours exited with reason: no match of right hand value {error,{db_open,"IO error: /var/lib/riak/anti_entropy/776422744832042175295707567380525354192214163456/CURRENT: Too many open files"}} in hashtree:new_segment_store/2 line 505 in gen_server:init_it/6 line 328
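
To see which anti_entropy partitions and files these crash reports concentrate on, the lines can be tallied. The following is a minimal sketch, assuming Python 3 is available on the node; the function name tally_open_file_errors and the SAMPLE list are illustrative only, and the two sample lines are copied from the excerpt above.

```python
import re
from collections import Counter

# Captures the anti_entropy partition id and the file name that failed to
# open, as they appear in the db_open error of each CRASH REPORT line.
PATTERN = re.compile(r'/anti_entropy/(\d+)/(\w+): Too many open files')

def tally_open_file_errors(lines):
    """Count 'Too many open files' errors per (partition, file) pair."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

# Two lines copied verbatim from the error.log excerpt above.
SAMPLE = [
    '2014-12-02 21:57:13.220 [error] <0.29210.3089> CRASH REPORT Process <0.29210.3089> with 0 neighbours exited with reason: no match of right hand value {error,{db_open,"IO error: /var/lib/riak/anti_entropy/570899077082383952423314387779798054553098649600/CURRENT: Too many open files"}} in hashtree:new_segment_store/2 line 505 in gen_server:init_it/6 line 328',
    '2014-12-02 21:57:13.226 [error] <0.29211.3089> CRASH REPORT Process <0.29211.3089> with 0 neighbours exited with reason: no match of right hand value {error,{db_open,"IO error: /var/lib/riak/anti_entropy/776422744832042175295707567380525354192214163456/LOCK: Too many open files"}} in hashtree:new_segment_store/2 line 505 in gen_server:init_it/6 line 328',
]

for (partition, name), count in tally_open_file_errors(SAMPLE).most_common():
    print(count, partition, name)
```

Passing an open file handle for the full error.log instead of SAMPLE works the same way, since the function only iterates over lines.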

Leading up to this, there didn’t appear to be any significant load on our cluster.

I simply restarted the node and the issue went away, but I wanted to reach out for help understanding why and how this arose in the first place.
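
Since "Too many open files" is the error the operating system returns when a process reaches its file descriptor limit, one thing worth checking if this recurs is how many descriptors the Riak process holds relative to that limit. The following is a minimal sketch, assuming a Linux host with /proc mounted; the function names are hypothetical, and the pid of the Riak Erlang VM has to be supplied by the caller.

```python
import os

def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    Values are returned as strings because either may be 'unlimited'.
    Returns None if the row is not present.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            return soft, hard
    return None

def open_fd_report(pid):
    """Return (open_fd_count, (soft, hard)) for a process. Linux only."""
    # Each entry in /proc/<pid>/fd is one descriptor the process has open.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as handle:
        limits = parse_max_open_files(handle.read())
    return open_fds, limits
```

Calling open_fd_report with the pid of the Riak node usually requires running as root or as the user that owns the process. A count close to the soft limit would be consistent with the errors above, though it does not by itself explain what opened the descriptors.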

Regards,

Alex Millar, CTO
Office: 1-800-354-8010 ext. 704  
Mobile: 519-729-2539  
GoBonfire.com