Riak Memory Usage Constantly Growing

Shane McEwan shane at mcewan.id.au
Tue Oct 2 11:51:14 EDT 2012

Thanks John and Kelly. It's nice to know we're not the only ones. :-)

As I said, we'll be upgrading to 1.2 in the coming weeks so it's good to 
know that the memory issues might go away after that. It's not a 
showstopper for us, more of a curiosity and concern it might develop 
into something worse.

I'll persist with the etop and see if I can get it to run and will 
report back.

We're still using key filters in our MapReduce functions but we plan to 
move to 2i at the same time as upgrading to 1.2.

The word "monitor" doesn't appear in any of our logs for the last 5 
days. Just lots of:

2012-10-02 00:10:47.869 [error] <0.31890.1344> gen_fsm <0.31890.1344> in 
state wait_pipeline_shutdown terminated with reason: {sink_died,normal}
2012-10-02 00:10:47.909 [error] <0.31890.1344> CRASH REPORT Process 
<0.31890.1344> with 0 neighbours crashed with reason: {sink_died,normal}
2012-10-02 00:10:47.981 [error] <0.166.0> Supervisor 
riak_pipe_builder_sup had child undefined started with 
{riak_pipe_builder,start_link,undefined} at <0.31890.1344> exit with 
reason {sink_died,normal} in context child_terminated
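
For anyone wanting to repeat that check on their own logs, here is a minimal sketch in Python. It assumes each log entry sits on one line in the timestamp/[level] format shown above (the entries above are only wrapped by email), and the summarize() helper name is made up for illustration. It counts entries per log level and counts lines containing either of the two phrases Kelly asked about.

```python
import re
from collections import Counter

# The two phrases Kelly asked about in the quoted message below.
MONITOR_PHRASES = ("monitor large_heap", "monitor long_gc")

# Matches the start of an entry such as:
# 2012-10-02 00:10:47.869 [error] <0.31890.1344> ...
# (format assumed from the log excerpts in this email)
ENTRY_START = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} \[(\w+)\]"
)


def summarize(lines):
    """Return (entries per log level, lines per monitor phrase)."""
    levels = Counter()
    phrases = Counter()
    for line in lines:
        match = ENTRY_START.match(line)
        if match:
            levels[match.group(1)] += 1
        for phrase in MONITOR_PHRASES:
            if phrase in line:
                phrases[phrase] += 1
    return levels, phrases
```

Passing an open log file to summarize() is enough, since a file object iterates line by line. The location of the log file depends on the install, so no path is assumed here.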


On 02/10/12 15:55, Kelly McLaughlin wrote:
> John and Shane,
> I have been looking into some memory issues lately and I would be very interested in more
> information about your particular problems. If either of you are able to get some output
> from etop using the -sort memory option when you are having elevated memory usage it
> would be very helpful to see. I know that sometimes you get the connection_lost message
> when trying to use etop, but I have found that sometimes if you keep trying it may succeed
> after a few attempts.
> Are either of you using MapReduce? I see that John is using 2I. Shane, do you also use 2I?
> Finally, do you notice a lot of messages to the console or console log that contain either the
> phrase 'monitor large_heap' or 'monitor long_gc'?
> Kelly
> On Oct 2, 2012, at 6:11 AM, "John E. Vincent" <lusis.org+riak-users at gmail.com> wrote:
>> I would highly suggest you upgrade to 1.2 when possible. We were, up
>> until recently, running on 1.4 and seeing the same problems you
>> describe. Take a look at this graph:
>> http://i.imgur.com/0RtsU.png
>> That's just one of our nodes but all of them exhibited the same
>> behavior. The falloffs are where we had to bounce riak.
>> This is what one of our nodes looks like now and has looked like since
>> the upgrade:
>> http://i.imgur.com/pm7Nk.png
>> The change was SO dramatic that I seriously thought /stats was broken.
>> I've verified outside of Riak and inside. The memory usage change was
>> very positive. Evidently there's even still a memory leak.
>> We're heavy 2i users. No multi backend.
>> On Tue, Oct 2, 2012 at 4:08 AM, Shane McEwan <shane at mcewan.id.au> wrote:
>>> G'day!
>>> Just recently we've noticed memory usage in our Riak cluster constantly
>>> increasing.
>>> The memory usage reported by the Riak stats "memory_total" parameter has
>>> been less than 100MB for nearly a year but has recently increased to over
>>> 1GB.
>>> If we restart the cluster, memory usage usually returns to what we would
>>> call "normal" but after a week or so of stability the memory usage starts
>>> gradually growing again. Sometimes after a growth spurt over a few days the
>>> memory usage will plateau and be stable again for a week or two and then put
>>> on another growth spurt. The memory usage starts increasing at the same
>>> moment on all 4 nodes.
>>> This graph [http://imagebin.org/230614] shows what I mean. The green shows
>>> the memory usage as reported by "memory_total" (left-hand y-axis scale). The
>>> red line shows the memory used by Riak's beam.smp process (right-hand y-axis
>>> scale).
>>> Also notice that the gradient of the recent growth seems to be increasing
>>> compared to the memory increases we had in August.
>>> We might have just assumed that the memory usage was normal Riak behaviour.
>>> Perhaps we have just tipped over some sort of internal buffer or cache and
>>> that causes some more memory to be allocated. However, whenever we notice
>>> the memory usage increasing it always coincides with the "riak-admin top"
>>> command failing to run.
>>> We try to run "riak-admin top" to diagnose what is using the memory but it
>>> returns: "Output server crashed: connection_lost". If we restart the cluster
>>> the top command works fine (but, of course, there's nothing interesting to
>>> see after a restart!).
>>> So our theory at the moment is that some sort of instability or race
>>> condition is causing Riak to start consuming more and more memory. A side
>>> effect of this instability is that the internal processes needed for running
>>> the top command are not working correctly. The actual functionality of Riak
>>> doesn't seem to be affected. Our application is running fine. We see a
>>> slight increase in "FSM Put" times and CPU usage during the memory growth
>>> phases but all other parameters we're monitoring on the system seem
>>> unaffected.
>>> There's nothing abnormal in the logs. We get a lot of "riak_pipe_builder_sup
>>> {sink_died,normal}" messages but they can be ignored, apparently. The
>>> cluster is under constant load so we would expect to see either gradual
>>> memory increase or a steady state but not both. Erlang process count, open
>>> file handles, etc are stable.
>>> So I was wondering if anyone has seen similar behaviour before?
>>> Is there anything else we can do to diagnose the problem?
>>> I'm accessing the stats URL once per minute, could that have any side
>>> effects?
>>> We'll be upgrading to Riak 1.2 and new hardware in the next few weeks so
>>> should we just ignore it and hope it goes away?
>>> Any other ideas?
>>> Or is this just normal?
>>> Riak config:
>>> 4 VMware nodes
>>> ring_creation_size, 256
>>> n_val, 3
>>> eleveldb backend:
>>>   max_open_files, 20
>>>   cache_size, 15728640
>>> "riak_kv_version":"1.1.1",
>>> "riak_core_version":"1.1.1",
>>> "stdlib_version":"1.17.4",
>>> "kernel_version":"2.14.4"
>>> Erlang R14B03 (erts-5.8.4)
>>> Thanks!
>>> Shane.
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
