Reg:Continuous Periodic crashes after long operation

Steven Joseph steven at streethawk.com
Tue Jan 31 15:22:19 EST 2017


Hi Shaun,

I'm having this issue again. This time I captured the system limits
while Riak was still crashing.

Please note the lsof and prlimit output after the error log below.


steven at hawk5:log/riak:» tail error.log    [0]  07:17:05

2017-01-31 19:21:37.391 [error] emulator Error in process <0.7964.15> on node 'riak at hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}

2017-01-31 19:21:40.868 [error] <0.25635.14> gen_server yz_cover terminated with reason: no match of right hand value error in mochiglobal:compile/2 line 51
2017-01-31 19:21:40.868 [error] <0.25635.14> CRASH REPORT Process yz_cover with 0 neighbours exited with reason: no match of right hand value error in mochiglobal:compile/2 line 51 in gen_server:terminate/6 line 744
2017-01-31 19:21:40.868 [error] <0.1215.0> Supervisor yz_general_sup had child yz_cover started with yz_cover:start_link() at <0.25635.14> exit with reason no match of right hand value error in mochiglobal:compile/2 line 51 in context child_terminated
2017-01-31 19:21:41.811 [error] emulator Error in process <0.18111.15> on node 'riak at hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}

2017-01-31 19:21:47.363 [error] emulator Error in process <0.2866.15> on node 'riak at hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
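
To see at a glance which call sites are failing, the badmatch entries in error.log can be tallied. The Python sketch below is only an illustration: the regular expression is an assumption about the entry layout, based on the lines above, and `tally_badmatch_errors` is a hypothetical helper name, not part of Riak or its tooling.

```python
import re
from collections import Counter

# Assumed layout, taken from the entries above:
#   {{badmatch,{error,REASON}},[{MODULE,FUNCTION,ARITY,[...
BADMATCH = re.compile(r"\{badmatch,\{error,(\w+)\}\},\[\{(\w+),(\w+),(\d+)")


def tally_badmatch_errors(lines):
    """Count badmatch entries per (reason, 'module:function/arity')."""
    counts = Counter()
    for line in lines:
        match = BADMATCH.search(line)
        if match:
            reason, module, function, arity = match.groups()
            counts[(reason, f"{module}:{function}/{arity}")] += 1
    return counts


def tally_file(path):
    """Convenience wrapper: tally the entries of a log file on disk."""
    with open(path, errors="replace") as handle:
        return tally_badmatch_errors(handle)
```

Calling `tally_file("error.log")` returns a `Counter` keyed by reason and call site. Entries in a different format, such as the yz_cover crash reports above, do not match the pattern and are not counted.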

steven at hawk5:log/riak:» sudo lsof -a -p `riak getpid` | wc -l    [0]  07:17:10
48446
steven at hawk5:log/riak:» sudo prlimit -n --noheadings -o soft -p `riak getpid`    [0]  07:17:27
20000500
steven at hawk5:log/riak:» sudo prlimit -n --noheadings -o hard -p `riak getpid`    [0]  07:17:32
20000500
steven at hawk5:log/riak:»
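
The same comparison can be scripted. The sketch below assumes a Linux host with /proc mounted and enough privileges to read another process's entries. It reads the soft and hard "Max open files" limits from /proc/<pid>/limits and counts the entries in /proc/<pid>/fd. The helper names are hypothetical. Note that lsof also lists entries such as memory-mapped files and the working directory, so its line count is normally higher than the descriptor count.

```python
import os

LIMIT_NAME = "Max open files"


def parse_nofile_limits(limits_text):
    """Return (soft, hard) from the text of /proc/<pid>/limits.

    A value of 'unlimited' is returned as None.
    """
    for line in limits_text.splitlines():
        if line.startswith(LIMIT_NAME):
            # Remaining columns: soft limit, hard limit, units.
            soft, hard = line[len(LIMIT_NAME):].split()[:2]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' line found")


def fd_usage(pid):
    """Return (open_fds, soft_limit, hard_limit) for a process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        soft, hard = parse_nofile_limits(handle.read())
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return open_fds, soft, hard
```

For example, `fd_usage(pid)` with the pid printed by `riak getpid` gives the three numbers in one call. With the values shown above (about 48,000 lsof entries against a limit of 20,000,500), the open-file limit itself appears far from exhausted.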


Python trace:

2017-01-31T20:20:52.004Z hawk4| return self._client.fulltext_search(search_index, query, **params)
2017-01-31T20:20:52.004Z hawk4| **skwargs
2017-01-31T20:20:52.004Z hawk4| return self._with_retries(pool, thunk)
2017-01-31T20:20:52.004Z hawk4| **kwargs
2017-01-31T20:20:52.004Z hawk4| File "/usr/local/lib/python2.7/dist-packages/riak/client/transport.py", line 179, in wrapper
2017-01-31T20:20:52.004Z hawk4| File "/usr/local/lib/python2.7/dist-packages/riak/bucket.py", line 476, in search
2017-01-31T20:20:52.004Z hawk4| File "/usr/local/lib/python2.7/dist-packages/riak/client/transport.py", line 134, in _with_retries
2017-01-31T20:20:52.004Z hawk4| File "/opt/streethawk/cloud/core/riakdb/models.py", line 528, in search
2017-01-31T20:20:52.005Z hawk4| RiakError: 'recv_into returned zero bytes unexpectedly'
2017-01-31T20:20:52.005Z hawk4| raise e.args[0]



Regards

Steven


Shaun McVey <smcvey at basho.com> writes:

> Hi Steven,
>
> Based on that log output, it looks like you're running into issues with
> system limits, probably open file limits.  You can check the value that
> Riak has available by connecting to one of the nodes with riak attach, then
> executing:
>
> ```
> os:cmd("ulimit -n").
> ```
>
> (Afterwards, disconnect with Ctrl+G, then q, then Enter.)
>
> Ideally it should be at least 65,536, and the bigger the better.
>
> If you find it's lower, then follow this doc to increase it.
>
> http://docs.basho.com/riak/kv/2.0.2/using/performance/open-files-limit/
>
> Have a check and let us know what the output was.
>
> Kind Regards,
> Shaun
>
> On Thu, Jan 26, 2017 at 10:34 AM, Steven Joseph <steven at streethawk.com>
> wrote:
>
>> Hi,
>>
>> We have a cluster of 5 nodes, which are continuously queried for new
>> data through Solr. We have been having issues with Riak/Solr that seem
>> to appear after longer periods of operation. They start on one node
>> and, after a while, seem to occur on all nodes.
>>
>> We tried upgrading to the latest version of Riak hoping that it would
>> solve the issue, but no luck.
>>
>> The only thing that stops the crashes is a full, staggered restart of
>> the cluster.
>>
>> Please find the logs below. Any help would be much appreciated.
>>
>> Riak Logs:
>>
>> 2017-01-26T07:53:03.262Z hawk5| ** Last message in was tick
>> 2017-01-26T07:53:10.197Z hawk5|
>> 2017-01-26T07:53:10.197Z hawk5| 2017-01-26 07:53:08.183 [error] emulator Error in process <0.22701.73> on node 'riak at hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
>> 2017-01-26T07:53:10.263Z hawk5| Error in process <0.22701.73> on node 'riak at hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
>> 2017-01-26T07:53:10.263Z hawk5| 2017-01-26 07:53:08 =ERROR REPORT====
>> 2017-01-26T07:53:17.198Z hawk5|
>> 2017-01-26T07:53:17.208Z hawk5| 2017-01-26 07:53:13.472 [error] emulator Error in process <0.12549.73> on node 'riak at hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
>> 2017-01-26T07:53:17.263Z hawk5| Error in process <0.12549.73> on node 'riak at hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
>> 2017-01-26T07:53:17.263Z hawk5| 2017-01-26 07:53:13 =ERROR REPORT====
>> 2017-01-26T07:53:18.198Z hawk5| 2017-01-26 07:53:17.861 [error] emulator Error in process <0.2254.73> on node 'riak at hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
>> 2017-01-26T07:53:18.208Z hawk5|
>> 2017-01-26T07:53:18.208Z hawk5| 2017-01-26 07:53:17.861 [error] emulator Error in process <0.2254.73> on node 'riak at hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
>> 2017-01-26T07:53:18.264Z hawk5|
>>
>>
>> Python client traces:
>>
>> 2017-01-26T10:20:44.517Z hawk5| File "/usr/local/lib/python2.7/dist-packages/riak/client/transport.py", line 179, in wrapper
>> 2017-01-26T10:20:44.517Z hawk5| return self._client.fulltext_search(search_index, query, **params)
>> 2017-01-26T10:20:44.517Z hawk5| File "/usr/local/lib/python2.7/dist-packages/riak/bucket.py", line 476, in search
>> 2017-01-26T10:20:44.517Z hawk5| raise e.args[0]
>> 2017-01-26T10:20:44.517Z hawk5| File "/usr/local/lib/python2.7/dist-packages/riak/client/transport.py", line 134, in _with_retries
>> 2017-01-26T10:20:44.517Z hawk5| return self._with_retries(pool, thunk)
>> 2017-01-26T10:20:44.543Z hawk5| RiakError: 'recv_into returned zero bytes unexpectedly'
>>
>>
>> Regards
>>
>> Steven Joseph
>>
>> CTO, StreetHawk Pty Ltd
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>



