riak 2.1.1 : Erlang crash dump

Russell Brown russell.brown at me.com
Sun Oct 4 03:00:01 EDT 2015


I doubt it is a sibling explosion, as 2.1.1 has DVV by default and the config declares a maximum of 100 siblings. But it may be, I guess.
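
One way to confirm or rule out sibling growth is to look at the node's read-side statistics. The following is only a minimal sketch: it assumes the HTTP listener is on its usual port 8098 and that the /stats output carries the `node_get_fsm_siblings_100` and `node_get_fsm_objsize_100` entries (check the names against your own node's output). The function names and both thresholds are hypothetical, chosen for illustration.

```python
import json
from urllib.request import urlopen


def fetch_stats(host="127.0.0.1", port=8098):
    """Fetch a node's stats as a dict (assumes the HTTP listener is on port 8098)."""
    with urlopen(f"http://{host}:{port}/stats", timeout=10) as resp:
        return json.load(resp)


def sibling_warnings(stats, sibling_limit=25, objsize_limit=1_000_000):
    """Return warnings for suspicious values; both limits are arbitrary examples."""
    warnings = []
    # Assumed stat names: largest sibling count / object size seen on recent reads.
    max_siblings = stats.get("node_get_fsm_siblings_100", 0)
    max_objsize = stats.get("node_get_fsm_objsize_100", 0)
    if max_siblings > sibling_limit:
        warnings.append(f"max siblings seen on reads: {max_siblings}")
    if max_objsize > objsize_limit:
        warnings.append(f"max object size seen on reads: {max_objsize} bytes")
    return warnings
```

Calling `sibling_warnings(fetch_stats("20.0.0.11"))` against each node in turn would show whether any of them is reading objects with unusually many siblings. An empty list on every node makes a sibling explosion less likely, though it does not rule out other causes.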

Can you send me crash logs, or even the crash dump so I can get a better idea? I mean, it surely looks like a memory leak of some kind.
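
A rough back-of-the-envelope check of the figures quoted further down in this thread points the same way. This sketch assumes `leveldb.maximum_memory.percent` is applied to total system RAM and bounds only leveldb's own allocations; the helper name is hypothetical, and the inputs (60 GB per node, 50%, roughly 52 GB resident, a failed allocation of 3936326656 bytes) are taken from the quoted messages.

```python
def leveldb_budget_gb(total_ram_gb, percent):
    """Memory leveldb may use, assuming the percentage applies to total RAM."""
    return total_ram_gb * percent / 100.0


total_ram_gb = 60        # RAM per node, from the quoted message
percent = 50             # leveldb.maximum_memory.percent after the change
observed_rss_gb = 52     # resident size of beam.smp reported by top

budget_gb = leveldb_budget_gb(total_ram_gb, percent)
outside_budget_gb = observed_rss_gb - budget_gb

# The heap allocation that failed in the original crash, converted to GiB.
failed_alloc_gib = 3936326656 / 2**30

print(f"leveldb budget:         {budget_gb:.1f} GB")
print(f"resident beyond budget: {outside_budget_gb:.1f} GB")
print(f"failed heap allocation: {failed_alloc_gib:.2f} GiB")
```

If the assumption holds, at least 22 GB of the resident memory cannot be accounted for by leveldb's budget, which is why lowering the percentage alone would not be expected to stop the growth.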

Do you use the “write once” bucket type, or the default bucket type? What bucket properties are set for the keys you are writing? Are you running any queries (list_keys? 2i?)

Which Erlang version? Did you build it yourself, or is it the one shipped with Riak? 4 cores but 60 GB of RAM: is that because it’s a VM? And what does [frame-pointer] mean in the Erlang header output in your first post? I’ve never seen that before.

Sorry for all the questions, but at the moment I think more information is the way to go. If you want to mail me logs off list, that is fine too.

Cheers

Russell

> On 4 Oct 2015, at 01:43, Matthew Von-Maszewski <matthewv at basho.com> wrote:
> 
> Girish,
> 
> This feels like a sibling explosion to me.  I cannot help prove or fix it.  Writing this paragraph as bait for others to help.
> 
> Matthew
> 
> Sent from my iPad
> 
> On Oct 3, 2015, at 8:34 PM, Girish Shankarraman <gshankarraman at vmware.com> wrote:
> 
>> Thank you for the response, Jon.
>> 
>> So I changed it to 50% and it crashed again.
>> I have a 5-node cluster with 60 GB of RAM on each node. Ring size is set to 64. (riak.conf attached, in case anyone has some ideas.)
>> 
>> I still see the Erlang process consuming nearly all of the system's memory (52 GB).
>> 
>>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>> 24256 riak      20   0 67.134g 0.052t  18740 D   0.0 90.0   2772:44 beam.smp
>> 
>> ---- Cluster Status ----
>> Ring ready: true
>> 
>> +--------------------+------+-------+-----+-------+
>> |        node        |status| avail |ring |pending|
>> +--------------------+------+-------+-----+-------+
>> | (C) riak@20.0.0.11 |valid |  up   | 20.3|  --   |
>> |     riak@20.0.0.12 |valid |  up   | 20.3|  --   |
>> |     riak@20.0.0.13 |valid |  up   | 20.3|  --   |
>> |     riak@20.0.0.14 |valid |  up   | 20.3|  --   |
>> |     riak@20.0.0.15 |valid |  up   | 18.8|  --   |
>> +--------------------+------+-------+-----+-------+
>> 
>> Thanks,
>> 
>> — Girish Shankarraman
>> 
>> 
>> From: Jon Meredith <jmeredith at basho.com>
>> Date: Thursday, October 1, 2015 at 2:06 PM
>> To: girish shankarraman <gshankarraman at vmware.com>, "riak-users at lists.basho.com" <riak-users at lists.basho.com>
>> Subject: Re: riak 2.1.1 : Erlang crash dump
>> 
>> It looks like Riak was unable to allocate 4 GB of memory. You may have to reduce the amount of memory allocated to leveldb from the default 70%. Try setting this in your /etc/riak/riak.conf file:
>> 
>> leveldb.maximum_memory.percent = 50
>> 
>> The memory footprint for Riak should stabilize after a few hours, and on servers with smaller amounts of memory the 30% left over may not be enough.
>> 
>> Please let us know how you get on.
>> 
>> On Wed, Sep 30, 2015 at 5:31 PM Girish Shankarraman <gshankarraman at vmware.com> wrote:
>> I have a 7-node Riak cluster with a ring_size of 128.
>> 
>> System Details:
>> Each node is a VM with 16GB of memory.
>> The backend is using leveldb.
>> sys_system_architecture : <<"x86_64-unknown-linux-gnu">>
>> sys_system_version : <<"Erlang R16B02_basho8 (erts-5.10.3) [source] [64-bit] [smp:4:4] [async-threads:64] [kernel-poll:true] [frame-pointer]">>
>> riak_control_version : <<"2.1.1-0-g5898c40">>
>> cluster_info_version : <<"2.0.2-0-ge231144">>
>> yokozuna_version : <<"2.1.0-0-gcb41c27">>
>> 
>> Scenario:
>> We have 400-1000 JSON records being written per second. Each record might be a few hundred bytes.
>> I see the following crash message in the Erlang logs after a few hours of processing. Any suggestions on what could be going on here?
>> 
>> ===== Tue Sep 29 20:20:56 UTC 2015
>> [os_mon] memory supervisor port (memsup): Erlang has closed
>> [os_mon] cpu supervisor port (cpu_sup): Erlang has closed
>> 
>> Crash dump was written to: /var/log/riak/erl_crash.dump
>> eheap_alloc: Cannot allocate 3936326656 bytes of memory (of type "heap").
>> 
>> I also tested running this with 50 GB per Riak node (VM). Things work, but memory keeps growing, so throwing hardware at it doesn’t seem very scalable.
>> 
>> Thanks,
>> 
>> — Girish Shankarraman
>> 
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> <riak.conf>