riak exiting when number partitions > 128

Jeremiah Peschka jeremiah.peschka at gmail.com
Mon Oct 8 22:14:40 EDT 2012


The behavior of su and sudo is at best entertaining and at worst entirely
maddening. Now that you've worked your way through this, hopefully I'll
remember it at some point in the future.

---
Jeremiah Peschka
Managing Director, Brent Ozar PLF, LLC


On Mon, Oct 8, 2012 at 7:12 PM, David Lowell <dave at go2ctv.com> wrote:

> I just spotted the doc:
>
> http://wiki.basho.com/Open-Files-Limit.html
>
> which suggests a better approach for configuring the file limit
> than setting it in /etc/security/limits.d. I've created the
> /etc/default/riak file containing the revised file limit, and that works
> well with the stock 'su' behavior.
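
For anyone landing here later, a minimal sketch of what such a file might contain. The exact contents were not posted in this thread; this assumes the init script sources /etc/default/riak as a shell fragment before launching the node, and it reuses the 100k figure Dave mentions for the 'riak' user.

```shell
# /etc/default/riak (sketch; the actual file was not shown in the thread)
# Assumption: sourced by /etc/init.d/riak before the node is started.
ulimit -n 100000
```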
>
> Thanks everyone for puzzling this through with me.
>
> Dave
>
> --
> Dave Lowell
> dave at connectv.com
>
> On Oct 8, 2012, at 7:02 PM, David Lowell wrote:
>
> Thanks guys. I had raised the max open files limit at the system level for
> the 'riak' user to 100k, and confirmed that had taken effect with "sudo -u
> riak bash -c 'ulimit -a'". However, it appears that you are correct that
> this system setting is not affecting the actual startup of riak when I run
> riak's /etc/init.d/riak init script.
>
> Digging further reveals that 'su' seems not to be helping. Witness:
>
> $ sudo -u riak bash -c 'ulimit -n'
> 100000
> $ sudo su - riak -c 'ulimit -n'
> 1024
>
> I've seen weirdness like this with su in the past, and have been phasing
> it out of my vocabulary.
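
Since the limit depends on how a process was launched, it can help to ask the kernel what a given process actually ended up with rather than what a fresh shell reports. A small sketch, assuming Linux with /proc mounted; the helper name proc_nofile_limit is made up for illustration.

```shell
#!/usr/bin/env bash
# Print the soft "Max open files" limit of a running process.
# Linux only: reads /proc/<pid>/limits, whose 4th field on that row is the soft limit.
proc_nofile_limit() {
  local pid=$1
  awk '/^Max open files/ { print $4 }' "/proc/$pid/limits"
}

# Example: inspect the current shell.
proc_nofile_limit $$
```

To inspect a running node instead, pass its PID, for example one reported by `pgrep -u riak beam` (the process name is an assumption about how the Erlang VM shows up on your system).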
>
> So, to confirm this was the issue I tweaked riak's init script, replacing
>
>    su - riak -c "$DAEMON $DAEMON_ARGS" || return 2
>
> with
>
>   sudo -u riak $DAEMON $DAEMON_ARGS || return 2
>
> And now riak starts up properly with 512 partitions.
>
> So now the question becomes: is there some way to get "su" to behave more
> like "sudo" in this case? Or do we just need to use a custom init script
> until the stock init script evolves past su?
>
> Dave
>
> --
> Dave Lowell
> dave at connectv.com
>
> On Oct 8, 2012, at 6:21 PM, Jeremiah Peschka wrote:
>
> Like Alex, I set ulimit -n directly in my Riak startup scripts.
> For my local dev instance it looks like this:
>
> ### Generic Riak dev version setup
> function riak_dev_start() {
>   # Remember where we started so we can return afterwards
>   local CURRENT=$(pwd)
>
>   # Set the open files limit for this shell and the nodes it starts
>   ulimit -n 1024
>
>   cd ~/Projects/riak/dev || return 1
>   for node in 1 2 3 4; do
>     echo "Starting riak node $node on 127.0.0.1"
>     "dev$node/bin/riak" start
>   done
>
>   cd "$CURRENT"
> }
>
> I'd definitely try bumping ulimit in your riak startup scripts themselves
> and see if that eliminates the issues that you're running into.
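
A hedged sketch of what "bumping ulimit in the startup script" could look like when you do not want the script to fail on hosts with a lower hard limit. The function name raise_nofile and the 4096 target are illustrative, not from Riak's own scripts; it only raises the soft limit and caps the request at the hard limit.

```shell
#!/usr/bin/env bash
# Raise the soft open-files limit toward a target, capped at the hard limit.
# Must be called in the shell that will launch the node (not in a subshell).
raise_nofile() {
  local target=$1 hard soft
  hard=$(ulimit -Hn)
  soft=$(ulimit -Sn)
  # An unprivileged user cannot go above the hard limit, so cap the request.
  if [ "$hard" != "unlimited" ] && [ "$target" -gt "$hard" ]; then
    target=$hard
  fi
  # Only ever raise; leave an already generous soft limit alone.
  if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$target" ]; then
    return 0
  fi
  ulimit -Sn "$target"
}

raise_nofile 4096
echo "open files limit is now $(ulimit -Sn)"
```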
> ---
> Jeremiah Peschka
> Managing Director, Brent Ozar PLF, LLC
>
>
> On Mon, Oct 8, 2012 at 6:11 PM, Alexander Sicular <siculars at gmail.com> wrote:
>
>> I don't think you're setting it correctly. I usually set it in the
>> terminal before calling riak start. Or set it system wide, different ways
>> to do it depending on your os.
>>
>>
>> @siculars
>> http://siculars.posterous.com
>>
>> Sent from my iRotaryPhone
>>
>> On Oct 8, 2012, at 21:00, David Lowell <dave at go2ctv.com> wrote:
>>
>> I'm starting to want to move past the default Riak configs, for example,
>> by running with a larger number of partitions than the default 64. However,
>> today, when I bumped the "ring_creation_size" config param up to 256 or
>> higher, Riak started failing soon after startup with messages about "Too
>> many open files". For the record, I'm using the ELevelDB back-end.
>>
>> I've seen the documentation about the need for ring_creation_size *
>> max_open_files file descriptors with levelDB. I've upped the system open
>> files limit for the riak user to 100k, so I don't think I'm hitting that
>> system limit. So it feels like I'm hitting a limit configured within the
>> application somewhere.
>>
>> It doesn't feel like changing levelDB's 'max_open_files' configuration is
>> the issue here, as I'm using the default/minimum value of 20 for that
>> parameter. Any other setting would increase open files.
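
To put numbers on the formula quoted above, here is a rough worked estimate. It assumes every partition runs on this one host (a single-node setup) and uses the max_open_files value of 20 from the message; required_fds is a made-up helper name.

```shell
#!/usr/bin/env bash
# Worst-case LevelDB descriptor estimate from the documented formula:
#   ring_creation_size * max_open_files
# Assumption: all partitions live on this host (single-node cluster).
required_fds() {
  local ring_size=$1 max_open_files=$2
  echo $(( ring_size * max_open_files ))
}

for ring in 64 256 512; do
  echo "ring_creation_size=$ring: up to $(required_fds "$ring" 20) descriptors"
done
```

Every one of these figures is above a 1024 limit and far below 100k, so the limit the node actually starts with matters a great deal. The formula is an upper bound, so a nearly empty node can get by with fewer descriptors than it suggests.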
>>
>> So I could use a pointer here from folks who have been here. I suspect
>> there is something very simple required here.
>>
>> Thanks folks!
>>
>> Dave
>>
>> ps. For the record, my data set is empty on this host, and for
>> completeness I'm blowing away the ring state when I fiddle with the
>> ring_creation_size parameter.
>>
>> --
>> Dave Lowell
>> dave at connectv.com
>>
>>
>> 2012-10-09 00:50:17.430 [info] <0.7.0> Application riak_kv started on
>> node 'riak@10.0.3.81'
>> 2012-10-09 00:50:17.456 [info] <0.7.0> Application merge_index started on
>> node 'riak@10.0.3.81'
>> 2012-10-09 00:50:17.459 [info] <0.1316.0>@riak_core:wait_for_service:445
>> Waiting for service riak_kv to start (0 seconds)
>> 2012-10-09 00:50:17.525 [info]
>> <0.1303.0>@riak_core:wait_for_application:419 Wait complete for application
>> riak_kv (0 seconds)
>> 2012-10-09 00:50:37.366 [error] <0.5081.0>@riak_kv_vnode:init:265 Failed
>> to start riak_kv_eleveldb_backend Reason: {db_open,"IO error:
>> /var/data/ctv/riak/leveldb/1427247692705959881058285969449495136382746624000/LOCK:
>> Too many open files"}
>> 2012-10-09 00:50:37.423 [notice] <0.5081.0>@riak:stop:46 "backend module
>> failed to start."
>> 2012-10-09 00:50:37.424 [error] <0.5081.0> CRASH REPORT Process
>> <0.5081.0> with 0 neighbours exited with reason: {db_open,"IO error:
>> /var/data/ctv/riak/leveldb/1427247692705959881058285969449495136382746624000/LOCK:
>> Too many open files"} in gen_fsm:init_it/6 line 371
>> 2012-10-09 00:50:37.429 [info] <0.494.0>@riak_kv_js_vm:terminate:240
>> Spidermonkey VM (pool: riak_kv_js_hook) host stopping (<0.494.0>)
>> 2012-10-09 00:50:37.673 [error] <0.138.0> Supervisor riak_core_vnode_sup
>> had child undefined started with {riak_core_vnode,start_link,undefined} at
>> <0.5081.0> exit with reason {db_open,"IO error:
>> /var/data/ctv/riak/leveldb/1427247692705959881058285969449495136382746624000/LOCK:
>> Too many open files"} in context child_terminated
>> 2012-10-09 00:50:37.736 [error] <0.153.0> gen_server
>> riak_core_vnode_manager terminated with reason: no match of right hand
>> value {error,{db_open,"IO error:
>> /var/data/ctv/riak/leveldb/1427247692705959881058285969449495136382746624000/LOCK:
>> Too many open files"}} in riak_core_vnode_manager:get_vnode/3 line 489
>> 2012-10-09 00:50:37.799 [error] <0.153.0> CRASH REPORT Process
>> riak_core_vnode_manager with 0 neighbours exited with reason: no match of
>> right hand value {error,{db_open,"IO error:
>> /var/data/ctv/riak/leveldb/1427247692705959881058285969449495136382746624000/LOCK:
>> Too many open files"}} in riak_core_vnode_manager:get_vnode/3 line 489 in
>> gen_server:terminate/6 line 747
>> 2012-10-09 00:50:37.844 [error] <0.136.0> Supervisor riak_core_sup had
>> child riak_core_vnode_manager started with
>> riak_core_vnode_manager:start_link() at <0.153.0> exit with reason no match
>> of right hand value {error,{db_open,"IO error:
>> /var/data/ctv/riak/leveldb/1427247692705959881058285969449495136382746624000/LOCK:
>> Too many open files"}} in riak_core_vnode_manager:get_vnode/3 line 489 in
>> context child_terminated
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>