riak exiting when number of partitions > 128

David Lowell dave at go2ctv.com
Mon Oct 8 22:12:59 EDT 2012


I just spotted the doc:

http://wiki.basho.com/Open-Files-Limit.html

which suggests a better approach for configuring the file limit than setting it in /etc/security/limits.d. I've created the /etc/default/riak file containing the revised file limit, and that works well with the stock 'su' behavior.
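
For anyone who lands here later: the file is just a shell fragment sourced by the init script before it starts the node, so mine is a one-liner along these lines (100000 matches the limit I'd been setting via limits.d; pick a value that fits your ring size):

    # /etc/default/riak -- sourced by /etc/init.d/riak before startup
    ulimit -n 100000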

Thanks everyone for puzzling this through with me.

Dave

--
Dave Lowell
dave at connectv.com

On Oct 8, 2012, at 7:02 PM, David Lowell wrote:

> Thanks guys. I had raised the max open files limit at the system level for the 'riak' user to 100k, and confirmed that it had taken effect with "sudo -u riak bash -c 'ulimit -a'". However, it appears you are correct that this setting does not apply when riak is actually started via its /etc/init.d/riak init script.
> 
> Digging further reveals that 'su' is not helping. Witness:
> 
> $ sudo -u riak bash -c 'ulimit -n'
> 100000
> $ sudo su - riak -c 'ulimit -n'
> 1024
> 
> I've seen weirdness like this with su in the past, and have been phasing it out of my vocabulary.
> 
> So, to confirm this was the issue I tweaked riak's init script, replacing
> 
>    su - riak -c "$DAEMON $DAEMON_ARGS" || return 2
>     
> with
> 
>    sudo -u riak $DAEMON $DAEMON_ARGS || return 2
> 
> And now riak starts up properly with 512 partitions.
> 
> So now the question becomes: is there some way to get "su" to behave more like "sudo" in this case? Or do we just need to use a custom init script until the stock init script evolves past su?
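> 
> A guess at my own question: 'su' runs its own PAM stack, and on Debian-family systems the pam_limits line in /etc/pam.d/su ships commented out, so 'su - riak' never applies the limits.d settings that sudo picks up. If that's right, uncommenting that line should make su match sudo, though I haven't tested it:
> 
>    # in /etc/pam.d/su, uncomment:
>    session    required   pam_limits.so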
> 
> Dave
> 
> --
> Dave Lowell
> dave at connectv.com
> 
> On Oct 8, 2012, at 6:21 PM, Jeremiah Peschka wrote:
> 
>> Like Alex, I set ulimit -n directly in my Riak startup scripts. For my local dev instance it looks like this:
>> 
>> ### Generic Riak dev version setup
>> function riak_dev_start() {
>>   local CURRENT=`pwd`;
>>   
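>>   # set this shell's open-file limit up front; the riak nodes started below inherit it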
>>   ulimit -n 1024;
>> 
>>   cd ~/Projects/riak/dev
>>   echo "Starting riak node 1 on 127.0.0.1"
>>   dev1/bin/riak start
>>   echo "Starting riak node 2 on 127.0.0.1"
>>   dev2/bin/riak start
>>   echo "Starting riak node 3 on 127.0.0.1"
>>   dev3/bin/riak start
>>   echo "Starting riak node 4 on 127.0.0.1"
>>   dev4/bin/riak start
>>   
>>   cd "$CURRENT"
>> }
>> 
>> I'd definitely try bumping ulimit in your riak startup scripts themselves and see if that eliminates the issues that you're running into.
>> ---
>> Jeremiah Peschka
>> Managing Director, Brent Ozar PLF, LLC
>> 
>> 
>> On Mon, Oct 8, 2012 at 6:11 PM, Alexander Sicular <siculars at gmail.com> wrote:
>> I don't think you're setting it correctly. I usually set it in the terminal before calling riak start, or set it system-wide; there are different ways to do it depending on your OS.
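>> 
>> Something like this, in the same shell that launches the node (65536 is just a commonly used value; size it to your ring):
>> 
>>   ulimit -n 65536
>>   riak start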
>> 
>> 
>> @siculars
>> http://siculars.posterous.com
>> 
>> Sent from my iRotaryPhone
>> 
>> On Oct 8, 2012, at 21:00, David Lowell <dave at go2ctv.com> wrote:
>> 
>>> I'm starting to want to move past the default Riak configs, for example by running with more partitions than the default 64. However, today when I bumped the "ring_creation_size" config param to 256 or higher, Riak started failing soon after startup with messages about "Too many open files". For the record, I'm using the eLevelDB backend.
>>> 
>>> I've seen the documentation about needing ring_creation_size * max_open_files file descriptors with LevelDB. I've upped the system open files limit for the riak user to 100k, so I don't think I'm hitting that system limit. It feels like I'm hitting a limit configured within the application somewhere.
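>>> 
>>> (Putting numbers on it: 256 partitions * 20 files = 5,120 descriptors, and 512 * 20 = 10,240 -- both far below the 100k I configured for the riak user.)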
>>> 
>>> It doesn't feel like LevelDB's 'max_open_files' setting is the issue here, as I'm using the default/minimum value of 20 for that parameter; any other value would only increase open files.
>>> 
>>> So I could use a pointer from folks who have been down this road; I suspect the fix is something very simple.
>>> 
>>> Thanks folks!
>>> 
>>> Dave
>>> 
>>> P.S. For the record, my data set is empty on this host, and for completeness I'm blowing away the ring state whenever I fiddle with the ring_creation_size parameter.
>>> 
>>> --
>>> Dave Lowell
>>> dave at connectv.com
>>> 
>>> 
>>> 2012-10-09 00:50:17.430 [info] <0.7.0> Application riak_kv started on node 'riak at 10.0.3.81'
>>> 2012-10-09 00:50:17.456 [info] <0.7.0> Application merge_index started on node 'riak at 10.0.3.81'
>>> 2012-10-09 00:50:17.459 [info] <0.1316.0>@riak_core:wait_for_service:445 Waiting for service riak_kv to start (0 seconds)
>>> 2012-10-09 00:50:17.525 [info] <0.1303.0>@riak_core:wait_for_application:419 Wait complete for application riak_kv (0 seconds)
>>> 2012-10-09 00:50:37.366 [error] <0.5081.0>@riak_kv_vnode:init:265 Failed to start riak_kv_eleveldb_backend Reason: {db_open,"IO error: /var/data/ctv/riak/leveldb/1427247692705959881058285969449495136382746624000/LOCK: Too many open files"}
>>> 2012-10-09 00:50:37.423 [notice] <0.5081.0>@riak:stop:46 "backend module failed to start."
>>> 2012-10-09 00:50:37.424 [error] <0.5081.0> CRASH REPORT Process <0.5081.0> with 0 neighbours exited with reason: {db_open,"IO error: /var/data/ctv/riak/leveldb/1427247692705959881058285969449495136382746624000/LOCK: Too many open files"} in gen_fsm:init_it/6 line 371
>>> 2012-10-09 00:50:37.429 [info] <0.494.0>@riak_kv_js_vm:terminate:240 Spidermonkey VM (pool: riak_kv_js_hook) host stopping (<0.494.0>)
>>> 2012-10-09 00:50:37.673 [error] <0.138.0> Supervisor riak_core_vnode_sup had child undefined started with {riak_core_vnode,start_link,undefined} at <0.5081.0> exit with reason {db_open,"IO error: /var/data/ctv/riak/leveldb/1427247692705959881058285969449495136382746624000/LOCK: Too many open files"} in context child_terminated
>>> 2012-10-09 00:50:37.736 [error] <0.153.0> gen_server riak_core_vnode_manager terminated with reason: no match of right hand value {error,{db_open,"IO error: /var/data/ctv/riak/leveldb/1427247692705959881058285969449495136382746624000/LOCK: Too many open files"}} in riak_core_vnode_manager:get_vnode/3 line 489
>>> 2012-10-09 00:50:37.799 [error] <0.153.0> CRASH REPORT Process riak_core_vnode_manager with 0 neighbours exited with reason: no match of right hand value {error,{db_open,"IO error: /var/data/ctv/riak/leveldb/1427247692705959881058285969449495136382746624000/LOCK: Too many open files"}} in riak_core_vnode_manager:get_vnode/3 line 489 in gen_server:terminate/6 line 747
>>> 2012-10-09 00:50:37.844 [error] <0.136.0> Supervisor riak_core_sup had child riak_core_vnode_manager started with riak_core_vnode_manager:start_link() at <0.153.0> exit with reason no match of right hand value {error,{db_open,"IO error: /var/data/ctv/riak/leveldb/1427247692705959881058285969449495136382746624000/LOCK: Too many open files"}} in riak_core_vnode_manager:get_vnode/3 line 489 in context child_terminated
>>> 
