Having to raise VM number-of-processes limit

Evan Vigil-McClanahan emcclanahan at basho.com
Tue Apr 2 10:47:31 EDT 2013


If your n_val is still three, then three sad nodes is a suspicious
number. My first guess would be a very large value being put in and
other requests backing up behind it.  That would explain the
health-check failures (especially if you're normally doing a lot of
small/fast reads and writes).

However, even that explanation doesn't get us anywhere near 500000
processes.  It'd be really nice to see that top output.  Maybe leave
it running and spooling to a file to see if you can capture the
output?  What does a frame of it look like now, without the problem
happening?
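
If it helps to line these events up across nodes, here is a minimal Python sketch (my own illustration, not a Riak tool) that pulls the timestamp, offending vnode index, and queue length out of log lines shaped like the check_kv_health messages quoted below. The function name and regular expressions are assumptions based only on those two sample lines; other Riak versions may format the message differently.

```python
import re

# Pattern inferred from the "Disabling riak_kv" log line quoted in this thread.
# This is an assumption about the format, not a documented Riak log schema.
DISABLE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .*"
    r"Disabling riak_kv due to large message queues\. "
    r"Offending vnodes: \[(?P<vnodes>.*)\]"
)
# Each offending vnode appears as {Index,QueueLength}.
VNODE_RE = re.compile(r"\{(\d+),(\d+)\}")


def parse_disable_events(lines):
    """Return a list of (timestamp, [(vnode_index, queue_length), ...])."""
    events = []
    for line in lines:
        match = DISABLE_RE.search(line)
        if match is None:
            continue  # not a "Disabling riak_kv" line
        vnodes = [
            (int(index), int(queue_len))
            for index, queue_len in VNODE_RE.findall(match.group("vnodes"))
        ]
        events.append((match.group("ts"), vnodes))
    return events


if __name__ == "__main__":
    import sys

    # Usage: python parse_kv_health.py < console.log
    for ts, vnodes in parse_disable_events(sys.stdin):
        for index, queue_len in vnodes:
            print(ts, index, queue_len)
```

Running it over each node's log and comparing the vnode indexes would show whether the same partition (and so possibly the same key) is behind the backlog on all three nodes.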


On Tue, Apr 2, 2013 at 7:31 AM, Dave Brady <dbrady at weborama.com> wrote:
> It happened again today, though I was not available to watch it at the time.
>
> Three nodes each showed riak_kv being stopped for one minute:
>
> 2013-04-02 11:10:57.923 [info] <0.2833.1447>@riak_kv_app:check_kv_health:239 Disabling riak_kv due to large message queues. Offending vnodes: [{319703483166135013357056057156686910549735243776,5798}]
> 2013-04-02 11:11:57.924 [info] <0.3589.1447>@riak_kv_app:check_kv_health:242 Re-enabling riak_kv after successful health check
>
> --
> Dave Brady
>
> ----- Original Message -----
> From: "Dave Brady" <dbrady at weborama.com>
> To: "Evan Vigil-McClanahan" <emcclanahan at basho.com>
> Cc: riak-users at lists.basho.com
> Sent: Monday, April 1, 2013 11:15:47 AM GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
> Subject: Re: Having to raise VM number-of-processes limit
>
> Hi Evan,
>
> Thanks for the suggestions!
>
> I did not think that raising that limit was normal.  Glad to have confirmation.
>
> I'll go through the logs again, and run 'riak-admin top ...' the next time it happens.
>
> --
> Dave Brady
>
> ----- Original Message -----
> From: "Evan Vigil-McClanahan" <emcclanahan at basho.com>
> To: "Dave Brady" <dbrady at weborama.com>
> Cc: riak-users at lists.basho.com
> Sent: Saturday, March 30, 2013 11:03:30 PM GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
> Subject: Re: Having to raise VM number-of-processes limit
>
> Dave,
>
> If you're seeing the process count go that high, it suggests to me
> that something else is wrong.  Typically, even for heavily loaded
> clusters, hundreds of thousands of processes isn't normal.  Is there
> anything else in the logs?
>
> When a node sees this sort of behavior start, what does riak-admin top
> -sort msg_q look like?
>
> On Sat, Mar 30, 2013 at 2:07 PM, Dave Brady <dbrady at weborama.com> wrote:
>> Hello,
>>
>> I have run into a situation in which I started seeing:
>>
>> [error] emulator Too many processes
>>
>> when some of our new jobs ran.  These jobs are in Perl using Net::Riak,
>> communicating with the cluster via PBC.  They fire tens of thousands of
>> fetches and stores over the course of about 20 minutes.
>>
>> Our cluster has five nodes with 1.3, using eLevelDB.
>>
>> I have been raising the limit (+P in vm.args) in increments from the default
>> of 32768.  Currently at 524288, and that is still not high enough.
>>
>> Have any of you had to increase this limit?
>>
>> Thanks!
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>



