Recovering Riak data if it can no longer load in memory

Vikram Lalit vikramlalit at gmail.com
Wed Jul 13 14:22:41 EDT 2016


Many thanks again, Matthew. This is very educational and helpful! I'm
now giving it a shot with higher-performance nodes.

Thanks again!


On Tue, Jul 12, 2016 at 4:18 PM, Matthew Von-Maszewski <matthewv at basho.com>
wrote:

> You can further reduce memory used by leveldb with the following setting
> in riak.conf:
>
>     leveldb.threads = 5
>
> The value "5" needs to be a prime number.  The system defaults to 71.
> Many Linux implementations will allocate 8 MB per thread for stack.  So
> bunches of threads lead to bunches of memory reserved for stack.  That is
> fine on servers with more memory, but it is probably part of your problem
> on a small-memory machine.
>
> The thread count is high to promote parallelism across vnodes on the same
> server, especially with "anti_entropy = active".  So again, this setting
> is sacrificing performance to save memory.
>
> Matthew
>
> P.S.  You really want 8 CPU cores, 4 as a dirt minimum.  And review this
> for more cpu performance info:
>
>     https://github.com/basho/leveldb/wiki/riak-tuning-2
>
>
>
> On Jul 12, 2016, at 4:04 PM, Vikram Lalit <vikramlalit at gmail.com> wrote:
>
> Thanks much, Matthew. Yes, the server is low-memory since it is only for
> development right now - I'm using an AWS micro instance, so 1 GB RAM and
> 1 vCPU.
>
> Thanks for the tip - let me try moving the manifest file to a larger
> instance and see how that works. More than reducing the memory footprint in
> dev, my concern was about reacting to a possible production scenario
> where the db stops responding due to memory overload. Understood now that
> moving to a larger instance should be possible. Thanks again.
>
> On Tue, Jul 12, 2016 at 12:26 PM, Matthew Von-Maszewski <
> matthewv at basho.com> wrote:
>
>> It would be helpful if you described the physical characteristics of the
>> servers:  memory size, logical cpu count, etc.
>>
>> Google created leveldb to be highly reliable in the face of crashes.  If
>> it is not restarting, that suggests to me a low-memory condition in which
>> leveldb is not able to load its MANIFEST file.  That is easily fixed by
>> moving the dataset to a machine with more memory.
>>
>> There is also a special flag to reduce Riak's leveldb memory footprint
>> during development work.  The setting reduces leveldb performance, but
>> lets you run with less memory.
>>
>> In riak.conf, set:
>>
>> leveldb.limited_developer_mem = true
>>
>> Matthew
>>
>>
>> > On Jul 12, 2016, at 11:56 AM, Vikram Lalit <vikramlalit at gmail.com>
>> > wrote:
>> >
>> > Hi - I've been testing a Riak cluster (of 3 nodes) with an ejabberd
>> > messaging cluster in front of it that writes data to the Riak nodes.
>> > Whilst load testing the platform (by creating 0.5 million ejabberd
>> > users via Tsung), I found that the Riak nodes suddenly crashed. My
>> > question is how do we recover from such a situation if it were to
>> > occur in production?
>> >
>> > To provide further context / details, the leveldb log files storing
>> > the data suddenly grew too large for the AWS Riak instances to load
>> > them in memory anymore. So we get a core dump if 'riak start' is run
>> > on those instances. I had an n_val = 2, and all 3 nodes went down
>> > almost simultaneously, so in such a scenario, we cannot even rely on a
>> > 2nd copy of the data. One way to prevent it in the first place would,
>> > of course, be to use auto-scaling, but I'm wondering whether there is
>> > an ex post facto recovery that can be performed in such a scenario.
>> > Is it possible to simply copy the leveldb data to a larger memory
>> > instance, or to curtail the data further to allow loading in the same
>> > instance?
>> >
>> > I'd appreciate any input - a tad concerned as to how we could recover
>> > from such a situation if it were to happen in production (apart from
>> > leveraging auto-scaling as a preventive measure).
>> >
>> > Thanks!
>> >
>>
>>
>
>
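
As a rough check on the stack-memory point made in the thread, the sketch
below multiplies a thread count by an assumed 8 MB of stack per thread (the
figure quoted above for many Linux systems; the actual value depends on the
platform's stack limit) and confirms that a candidate thread count is prime.
The function names are illustrative only and are not part of Riak or
leveldb. The result is reserved address space, an upper bound rather than
measured resident memory.

```python
# Assumption: 8 MiB of stack reserved per thread, as quoted in the thread.
STACK_BYTES_PER_THREAD = 8 * 1024 * 1024


def reserved_stack_mib(threads: int,
                       stack_bytes: int = STACK_BYTES_PER_THREAD) -> float:
    """Return the stack space reserved for `threads` threads, in MiB."""
    return threads * stack_bytes / (1024 * 1024)


def is_prime(n: int) -> bool:
    """Trial-division primality check; fine for small thread counts."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


if __name__ == "__main__":
    # 71 is the default mentioned in the thread; 5 is the suggested value.
    for threads in (71, 5):
        print(f"leveldb.threads = {threads}: "
              f"prime={is_prime(threads)}, "
              f"stack reserved ~{reserved_stack_mib(threads):.0f} MiB")
```

Under that assumption the default of 71 threads reserves about 568 MiB of
stack, more than half of a 1 GB instance, while 5 threads reserve about
40 MiB.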