Reg:Continuous Periodic crashes after long operation

Shaun McVey smcvey at basho.com
Thu Jan 26 07:13:33 EST 2017


Hi Steven,

Based on that log output, it looks like you're running into issues with
system limits, probably open file limits.  You can check the value that
Riak has available by connecting to one of the nodes with riak attach, then
executing:

```
os:cmd("ulimit -n").
```

(Afterwards, disconnect with Ctrl+G, then q, then Enter.)

Ideally it should be at least 65,536; higher is better.
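
If you'd rather check from outside the Erlang shell, here is a rough Python
sketch that reads the limit the kernel actually applies to a running process.
It assumes a Linux host (it parses the text of /proc/<pid>/limits), and the
function names and the `pid` argument are illustrative only, not part of any
Riak tooling. You would need to look up the PID of the Riak beam process
yourself.

```python
from typing import Optional, Tuple

# Minimum open-files limit suggested in this thread.
RECOMMENDED_MIN = 65536


def parse_open_files_limit(limits_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (soft, hard) for the 'Max open files' row of /proc/<pid>/limits.

    A value of None means the kernel reports the limit as 'unlimited'.
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft, hard = line[len(label):].split()[:2]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' row found")


def is_limit_ok(soft: Optional[int], minimum: int = RECOMMENDED_MIN) -> bool:
    """True if the soft limit is unlimited or at least `minimum`."""
    return soft is None or soft >= minimum


def check_pid(pid: int) -> bool:
    """Read /proc/<pid>/limits (Linux only) and compare the soft limit to the minimum."""
    with open("/proc/{}/limits".format(pid)) as f:
        soft, _hard = parse_open_files_limit(f.read())
    return is_limit_ok(soft)
```

For example, `check_pid(12345)` (with 12345 replaced by the real PID) returns
False when the soft limit is below 65,536. Run it as a user that is allowed to
read that process's /proc entry.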

If it's lower, follow this doc to increase it:

http://docs.basho.com/riak/kv/2.0.2/using/performance/open-files-limit/

Have a check and let us know what the output was.

Kind Regards,
Shaun

On Thu, Jan 26, 2017 at 10:34 AM, Steven Joseph <steven at streethawk.com>
wrote:

> Hi,
>
> We have a cluster of 5 nodes, which are continuously being queried for
> new data through Solr. We have been having some issues with Riak/Solr
> which seem to happen after longer periods of operation. It starts
> with one node and, after a while, seems to happen on all nodes.
>
> We tried upgrading to the latest version of Riak hoping that it would
> solve the issue, but no luck.
>
> The only thing that stops the crashes is a staggered restart of the
> full cluster.
>
> Please find the logs below. Any help would be much appreciated.
>
> Riak Logs:
>
> 2017-01-26T07:53:03.262Z hawk5| ** Last message in was tick
> 2017-01-26T07:53:10.197Z hawk5|
> 2017-01-26T07:53:10.197Z hawk5| 2017-01-26 07:53:08.183 [error] emulator Error in process <0.22701.73> on node 'riak at hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
> 2017-01-26T07:53:10.263Z hawk5| Error in process <0.22701.73> on node 'riak at hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
> 2017-01-26T07:53:10.263Z hawk5| 2017-01-26 07:53:08 =ERROR REPORT====
> 2017-01-26T07:53:17.198Z hawk5|
> 2017-01-26T07:53:17.208Z hawk5| 2017-01-26 07:53:13.472 [error] emulator Error in process <0.12549.73> on node 'riak at hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
> 2017-01-26T07:53:17.263Z hawk5| Error in process <0.12549.73> on node 'riak at hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
> 2017-01-26T07:53:17.263Z hawk5| 2017-01-26 07:53:13 =ERROR REPORT====
> 2017-01-26T07:53:18.198Z hawk5| 2017-01-26 07:53:17.861 [error] emulator Error in process <0.2254.73> on node 'riak at hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,g$
> t_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
> 2017-01-26T07:53:18.208Z hawk5|
> 2017-01-26T07:53:18.208Z hawk5| 2017-01-26 07:53:17.861 [error] emulator Error in process <0.2254.73> on node 'riak at hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,g$
> t_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
> 2017-01-26T07:53:18.264Z hawk5|
>
>
> Python client traces:
>
> 2017-01-26T10:20:44.517Z hawk5| File "/usr/local/lib/python2.7/dist-packages/riak/client/transport.py", line 179, in wrapper
> 2017-01-26T10:20:44.517Z hawk5| return self._client.fulltext_search(search_index, query, **params)
> 2017-01-26T10:20:44.517Z hawk5| File "/usr/local/lib/python2.7/dist-packages/riak/bucket.py", line 476, in search
> 2017-01-26T10:20:44.517Z hawk5| raise e.args[0]
> 2017-01-26T10:20:44.517Z hawk5| File "/usr/local/lib/python2.7/dist-packages/riak/client/transport.py", line 134, in _with_retries
> 2017-01-26T10:20:44.517Z hawk5| return self._with_retries(pool, thunk)
> 2017-01-26T10:20:44.543Z hawk5| RiakError: 'recv_into returned zero bytes unexpectedly'
>
>
> Regards
>
> Steven Joseph
>
> CTO, StreetHawk Pty Ltd
>