Reg: Continuous periodic crashes after long operation

Steven Joseph steven at streethawk.co
Thu Jan 26 07:31:47 EST 2017


Hi Shaun,

I have already set this to a very high value:

(riak@hawk1.streethawk.com)1> os:cmd("ulimit -n").
"20000500\n"
(riak@hawk1.streethawk.com)2>


So the issue is not that the limit is low; could it be a resource leak
instead? As I mentioned, our application processes continuously run
queries on the cluster.
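One way to test the resource-leak theory from outside the node is to sample how many file descriptors the Riak process holds over time. The sketch below assumes Linux, where each open descriptor of a process appears as an entry under /proc/<pid>/fd; the function names and the "never drops and ends higher" heuristic are illustrative assumptions, not part of Riak.

```python
import os
import time


def count_open_fds(pid):
    """Return the number of open file descriptors held by process `pid`.

    Linux-only: counts the entries in /proc/<pid>/fd. Inspecting another
    user's process normally requires running as that user or as root.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def sample_fd_counts(pid, samples=5, interval_s=60.0):
    """Take `samples` descriptor counts, `interval_s` seconds apart."""
    counts = []
    for i in range(samples):
        counts.append(count_open_fds(pid))
        if i < samples - 1:  # no need to sleep after the last reading
            time.sleep(interval_s)
    return counts


def looks_like_leak(counts):
    """Heuristic: the count never decreases and ends above where it started."""
    never_drops = all(later >= earlier for earlier, later in zip(counts, counts[1:]))
    return never_drops and counts[-1] > counts[0]
```

For example, `looks_like_leak(sample_fd_counts(pid, samples=10, interval_s=300))` with `pid` set to the node's beam.smp process id. A count that keeps climbing under steady query load points towards a leak; a flat count makes a descriptor leak less likely.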

Kind Regards

Steven

On Thu, Jan 26, 2017 at 11:13 PM Shaun McVey <smcvey at basho.com> wrote:

> Hi Steven,
>
> Based on that log output, it looks like you're running into system
> limits, probably the open-files limit. You can check the value available
> to Riak by connecting to one of the nodes with riak attach and then
> executing:
>
> ```
> os:cmd("ulimit -n").
> ```
>
> (Afterwards, disconnect with Ctrl+G, then q, then Enter.)
>
> Ideally it should be at least 65,536, although higher is better.
>
> If it is lower, follow this doc to increase it:
>
> http://docs.basho.com/riak/kv/2.0.2/using/performance/open-files-limit/
>
> Have a check and let us know what the output was.
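The same check can also be made without attaching to the node, by reading the running process's limits directly. A minimal sketch, assuming Linux and its /proc/<pid>/limits table format; the helper names are hypothetical, and 65,536 is simply the minimum suggested in the reply above.

```python
# Minimum open-files limit suggested in the reply above.
RECOMMENDED_MIN_NOFILE = 65536


def _to_limit(value):
    """Convert a /proc limits field to an int; 'unlimited' becomes None."""
    return None if value == "unlimited" else int(value)


def parse_open_files_limit(limits_text):
    """Return (soft, hard) for the 'Max open files' row of /proc/<pid>/limits."""
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            soft, hard = line[len(prefix):].split()[:2]
            return _to_limit(soft), _to_limit(hard)
    raise ValueError("no 'Max open files' row found")


def read_open_files_limit(pid):
    """Read the open-files limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_open_files_limit(f.read())


def meets_recommendation(soft_limit):
    """The soft limit is the one enforced on the process; None means unlimited."""
    return soft_limit is None or soft_limit >= RECOMMENDED_MIN_NOFILE
```

For example, `meets_recommendation(read_open_files_limit(pid)[0])` with `pid` set to the node's beam.smp process id.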
>
> Kind Regards,
> Shaun
>
> On Thu, Jan 26, 2017 at 10:34 AM, Steven Joseph <steven at streethawk.com> wrote:
>
> Hi,
>
> We have a five-node cluster that is continuously queried for new data
> through Solr. We have been having issues with Riak/Solr that seem to
> appear after long periods of operation. The problem starts on one node
> and, after a while, spreads to all nodes.
>
> We upgraded to the latest version of Riak in the hope that it would
> solve the issue, but it did not.
>
> The only thing that stops the crashes is a full, staggered restart of the cluster.
>
> Please find the logs below. Any help would be much appreciated.
>
> Riak Logs:
>
> 2017-01-26T07:53:03.262Z hawk5| ** Last message in was tick
> 2017-01-26T07:53:10.197Z hawk5|
> 2017-01-26T07:53:10.197Z hawk5| 2017-01-26 07:53:08.183 [error] emulator
> Error in process <0.22701.73> on node 'riak@hawk5.streethawk.com' with
> exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
> 2017-01-26T07:53:10.263Z hawk5| Error in process <0.22701.73> on node
> 'riak@hawk5.streethawk.com' with exit value:
> {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
> 2017-01-26T07:53:10.263Z hawk5| 2017-01-26 07:53:08 =ERROR REPORT====
> 2017-01-26T07:53:17.198Z hawk5|
> 2017-01-26T07:53:17.208Z hawk5| 2017-01-26 07:53:13.472 [error] emulator
> Error in process <0.12549.73> on node 'riak@hawk5.streethawk.com' with
> exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
> 2017-01-26T07:53:17.263Z hawk5| Error in process <0.12549.73> on node
> 'riak@hawk5.streethawk.com' with exit value:
> {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
> 2017-01-26T07:53:17.263Z hawk5| 2017-01-26 07:53:13 =ERROR REPORT====
> 2017-01-26T07:53:18.198Z hawk5| 2017-01-26 07:53:17.861 [error] emulator
> Error in process <0.2254.73> on node 'riak@hawk5.streethawk.com' with
> exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
> 2017-01-26T07:53:18.208Z hawk5|
> 2017-01-26T07:53:18.208Z hawk5| 2017-01-26 07:53:17.861 [error] emulator
> Error in process <0.2254.73> on node 'riak@hawk5.streethawk.com' with
> exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
> 2017-01-26T07:53:18.264Z hawk5|
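In these logs each failing process shows up more than once (with and without the lager "[error] emulator" prefix), so counting distinct process ids per host, and recording when each host first reports system_limit, gives a cleaner view of whether the failures really start on one node and spread. A sketch, assuming the aggregated log keeps one entry per line in the "<timestamp> <host>| ..." layout shown here (not the wrapped form of this mail); the regex and function name are hypothetical.

```python
import re
from collections import defaultdict

# Matches aggregated entries such as (one entry per line in the raw log):
# 2017-01-26T07:53:10.263Z hawk5| Error in process <0.22701.73> on node
#   'riak@hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},...
SYSTEM_LIMIT_RE = re.compile(
    r"^(?P<ts>\S+) (?P<host>[^|\s]+)\|"      # aggregator timestamp and host
    r".*?Error in process (?P<pid><[\d.]+>)"  # Erlang pid of the failing process
    r".*?\{error,system_limit\}"              # the system_limit exit reason
)


def summarize_system_limit_errors(lines):
    """Return ({host: earliest timestamp}, {host: number of distinct failing pids})."""
    first_seen = {}
    pids = defaultdict(set)
    for line in lines:
        m = SYSTEM_LIMIT_RE.search(line)
        if m is None:
            continue
        host, ts = m.group("host"), m.group("ts")
        # The same pid is logged more than once, so count pids, not lines.
        pids[host].add(m.group("pid"))
        # ISO-8601 UTC timestamps in one fixed format sort correctly as strings.
        if host not in first_seen or ts < first_seen[host]:
            first_seen[host] = ts
    return first_seen, {host: len(p) for host, p in pids.items()}
```

Any iterable of lines works, so an open log file (for example a hypothetical `riak-aggregated.log`) can be passed in directly.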
>
>
> Python client traces:
>
> 2017-01-26T10:20:44.517Z hawk5|   File "/usr/local/lib/python2.7/dist-packages/riak/client/transport.py", line 179, in wrapper
> 2017-01-26T10:20:44.517Z hawk5|     return self._client.fulltext_search(search_index, query, **params)
> 2017-01-26T10:20:44.517Z hawk5|   File "/usr/local/lib/python2.7/dist-packages/riak/bucket.py", line 476, in search
> 2017-01-26T10:20:44.517Z hawk5|     raise e.args[0]
> 2017-01-26T10:20:44.517Z hawk5|   File "/usr/local/lib/python2.7/dist-packages/riak/client/transport.py", line 134, in _with_retries
> 2017-01-26T10:20:44.517Z hawk5|     return self._with_retries(pool, thunk)
> 2017-01-26T10:20:44.543Z hawk5| RiakError: 'recv_into returned zero bytes unexpectedly'
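The final RiakError is a connection-level failure rather than a search error: for a stream socket, recv_into() returning 0 means the peer has closed the connection. Presumably the client raises this message when a Riak node drops the connection in the middle of a reply, which would fit the node-side crashes above. The sketch below reproduces that condition locally with socket.socketpair(), assuming a reader that expects a fixed-length reply; read_exactly and demo are hypothetical names, not the Riak client's own code.

```python
import socket


def read_exactly(sock, n):
    """Read exactly n bytes from a stream socket.

    A return value of 0 from recv_into() means the peer has closed the
    connection, so it is reported as an error instead of looping forever.
    """
    buf = bytearray(n)
    view = memoryview(buf)
    got = 0
    while got < n:
        nbytes = sock.recv_into(view[got:], n - got)
        if nbytes == 0:
            raise ConnectionError(
                f"recv_into returned zero bytes unexpectedly after {got} of {n} bytes"
            )
        got += nbytes
    return bytes(buf)


def demo():
    """Simulate a server that sends part of a reply and then closes."""
    client, server = socket.socketpair()
    try:
        server.sendall(b"abc")  # only 3 of the 10 bytes the client expects
        server.close()          # peer goes away mid-reply
        try:
            read_exactly(client, 10)
        except ConnectionError as exc:
            return str(exc)
        return "no error"
    finally:
        client.close()
```

Calling demo() returns the error text, ending in "after 3 of 10 bytes", showing that the reader noticed the closed connection instead of waiting for data that will never arrive.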
>
>
> Regards
>
> Steven Joseph
>
> CTO, StreetHawk Pty Ltd
>
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com