Riak performance problems when LevelDB database grows beyond 16GB

Eli Janssen elij.mx at gmail.com
Mon Oct 15 18:09:20 EDT 2012


Anything in the system logs or dmesg?
With vm.swappiness left at the default, the OOM killer could be doing its job a bit too well.
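One way to follow up on the dmesg suggestion is to scan the kernel log for OOM-killer entries that name the Erlang VM. The sketch below is a minimal Python illustration, not a Riak tool. It assumes the Riak node runs as `beam.smp` and that the kernel logs lines shaped like `Out of memory: Kill process <pid> (<name>) ...` or `Killed process <pid> (<name>) ...`. The exact wording differs between kernel versions, so the pattern may need adjusting.

```python
import re
import subprocess

# Assumed message shapes (wording varies by kernel version):
#   "Out of memory: Kill process 1234 (beam.smp) score 912 or sacrifice child"
#   "Killed process 1234 (beam.smp) total-vm:..."
OOM_LINE = re.compile(r"(?:Out of memory|Killed process)\D*?(\d+) \(([^)]+)\)")


def find_oom_kills(log_text, process_name="beam.smp"):
    """Return unique PIDs of process_name reported by the OOM killer, in log order."""
    pids = []
    for line in log_text.splitlines():
        match = OOM_LINE.search(line)
        if match and match.group(2) == process_name:
            pid = int(match.group(1))
            if pid not in pids:  # one kill usually produces more than one line
                pids.append(pid)
    return pids


if __name__ == "__main__":
    # dmesg may require root on systems with kernel.dmesg_restrict=1.
    log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    print(find_oom_kills(log) or "no OOM kills of beam.smp found")
```

An empty result does not rule out memory pressure. It only means no matching kill message survived in the kernel ring buffer.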


On Oct 15, 2012, at 12:10 PM, Jan.Evangelista at seznam.cz wrote:

> Hi Evan,
> 
> Regarding swappiness and disk scheduling: both were left at their defaults. I will correct them and run another test. 
> 
> The hosting provider configures the machines with software RAID1 over 2 physical disks. Do you think that is useful with Riak?
> 
> BTW, I suspected that part of the problem might be caused by the hardware of the first node, so I ran another test over the weekend with that node replaced. The result was slightly better: one node crashed after about 22 hours, when its DB reached 14 GB, but the other 3 nodes ran for 2.8 days until the DB reached 40 GB (see http://janevangelista.rajce.idnes.cz/nastenka/#4Riak_2K_2.1RC2_3d_edited.jpg). All the nodes crashed silently; there is nothing interesting in the Riak logs.
> 
> Thanks, Jan
> 
> ---------- Original message ----------
> From: Evan Vigil-McClanahan 
> Date: 12 Oct 2012
> Subject: Re: Re: Riak performance problems when LevelDB database grows beyond 16GB
> Hi there, Jan,
> 
> The lsof discrepancy is because max_open_files is per backend (one per
> vnode), iirc, so if you're maxed out you'll see vnode count * max_open_files.
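To make the arithmetic concrete: the ceiling on LevelDB file descriptors per node is the number of vnodes on that node times max_open_files. The helper below is a hypothetical sketch. It assumes partitions are spread evenly, that the cluster uses Riak's default ring_creation_size of 64, and that there are 4 nodes (inferred from the test description earlier in the thread, not from the posted config). Under those assumptions each node hosts 16 vnodes, so a limit of 132 allows roughly 2112 open files per node, and the 267 that lsof reported is well within that.

```python
def max_leveldb_fds_per_node(ring_size, num_nodes, max_open_files):
    """Rough ceiling on LevelDB file descriptors held by one Riak node.

    Each vnode has its own LevelDB backend and max_open_files applies per
    backend, so the node-wide ceiling is (vnodes on the node) * max_open_files.
    Assumes partitions are spread as evenly as possible across nodes.
    """
    vnodes_per_node = -(-ring_size // num_nodes)  # ceiling division
    return vnodes_per_node * max_open_files


# Hypothetical cluster: default 64-partition ring on 4 nodes, max_open_files = 132.
limit = max_leveldb_fds_per_node(64, 4, 132)
print(limit)  # 2112, well above the 267 files lsof reported
```

The bound is approximate: LevelDB also keeps a few non-table files open (log, manifest), and lsof counts every descriptor the process holds.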
> 
> I think on the second try you may have set the cache too high. I'd
> drop it back to 8 or 16 MB. You could also raise max_open_files a bit
> more, but you don't seem to be running into contention at this point,
> and there's a RAM cost, so maybe just leave it where it is for now
> unless you have quite a lot of memory.
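The RAM cost can be estimated the same way. This assumes that, as with max_open_files, each vnode's LevelDB instance gets its own cache of cache_size bytes. The sketch below is a rough illustration that also assumes 16 vnodes per node (a 64-partition ring on 4 nodes). Jan's 377487360 bytes is exactly 360 MiB, which under these assumptions would add up to about 5.6 GiB of cache per node, versus 128 MiB at 8 MiB per vnode.

```python
MIB = 1024 * 1024


def leveldb_cache_ram(cache_size_bytes, vnodes_per_node):
    """Rough upper bound on block-cache RAM for one node (one cache per vnode, assumed)."""
    return cache_size_bytes * vnodes_per_node


VNODES = 16  # hypothetical: 64-partition ring on 4 nodes

jan_setting = 377487360  # cache_size from Jan's app.config in the quoted message (= 360 MiB)
for label, size in [("current", jan_setting), ("8 MB", 8 * MIB), ("16 MB", 16 * MIB)]:
    total = leveldb_cache_ram(size, VNODES)
    print(f"{label}: {size // MIB} MiB per vnode -> {total // MIB} MiB per node")
```

Whatever the exact per-vnode accounting, the multiplication is what makes a "small" per-instance setting expensive on a node with many vnodes.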
> 
> Another thing to check is that vm.swappiness is set to 0 and that your
> disk scheduler is set to deadline for spinning disks and noop for
> SSDs.
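Both settings can be checked from /proc and /sys. The Python sketch below reads vm.swappiness and, for each block device, compares the active scheduler (the bracketed entry in queue/scheduler) with the recommendation above, using queue/rotational to tell spinning disks from SSDs. It is a read-only illustration; changing the values needs root (for example `sysctl -w vm.swappiness=0`, or writing the scheduler name into /sys/block/<dev>/queue/scheduler). On newer multi-queue kernels the equivalent schedulers are named mq-deadline and none, and software RAID devices such as md0 typically have no scheduler of their own, so the member disks are the ones to check.

```python
from pathlib import Path

# Virtual or bio-based devices (loop, ramdisk, md RAID) typically have no
# meaningful I/O scheduler; skip them and check the underlying disks instead.
SKIP_PREFIXES = ("loop", "ram", "md")


def active_scheduler(scheduler_line):
    """Return the active scheduler, e.g. 'noop [deadline] cfq' -> 'deadline'."""
    for token in scheduler_line.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None  # e.g. 'none' with no brackets


def recommended_scheduler(rotational):
    """deadline for spinning disks, noop for SSDs (older single-queue names)."""
    return "deadline" if rotational else "noop"


def check_host(root="/"):
    """Return human-readable problems; an empty list means both checks pass."""
    base = Path(root)
    problems = []
    swappiness = int((base / "proc/sys/vm/swappiness").read_text())
    if swappiness != 0:
        problems.append(f"vm.swappiness is {swappiness}, expected 0")
    for queue in sorted((base / "sys/block").glob("*/queue")):
        device = queue.parent.name
        sched_file, rot_file = queue / "scheduler", queue / "rotational"
        if device.startswith(SKIP_PREFIXES) or not (sched_file.exists() and rot_file.exists()):
            continue
        active = active_scheduler(sched_file.read_text())
        wanted = recommended_scheduler(rot_file.read_text().strip() == "1")
        if active != wanted:
            problems.append(f"{device}: scheduler is {active}, expected {wanted}")
    return problems


if __name__ == "__main__":
    for problem in check_host():
        print(problem)
```

The `root` parameter exists only so the checks can be pointed at a copy of the /proc and /sys layout; on a live node the default of "/" is what you want.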
> 
> On Fri, Oct 12, 2012 at 5:02 AM,   wrote:
>>> Can you attach the eleveldb portion of your app.config file?
>>> Configuration problems, especially max_open_files being too low, can
>>> often cause issues like this.
>>> 
>>> If it isn't sensitive, the whole app.config and vm.args files are also
>>> often helpful.
>> 
>> Hello Evan,
>> 
>> Thanks for responding.
>> 
>> I originally had the default LevelDB settings. When the node stalled, I
>> changed them to
>> 
>> {eleveldb, [
>>             {data_root, "/home/riak/leveldb"},
>>             {max_open_files, 132},
>>             {cache_size, 377487360}
>>            ]},
>> 
>> on all nodes and restarted them all. The application started at about
>> 1000 requests/second; after about 1 minute it dropped to <500
>> requests/second, and the node stalled again after 41 minutes. BTW,
>> according to lsof(1) it had 267 open LevelDB files, which is more than
>> the 132-file limit (??).
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




