Strange spike

Sean Cribbs sean at
Thu May 31 02:23:50 EDT 2012


The LevelDB storage backend has a known issue where compaction can stall a
heavily-loaded node for a long time (we've seen 60 seconds or more in
production clusters). We're very sorry about this, but an improvement will
be available in the next release. In the meantime, DO NOT make the node
leave the cluster - this will only make things worse! It might be worth
adding another node to the cluster, but I suggest you wait until the node
finishes compaction.
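
If you want to check whether the long_gc lines quoted below line up with the
latency spikes, a small script can pull out the timestamp, the process id and
the garbage-collection time. This is only a minimal sketch: the function name
parse_long_gc and the regular expression are illustrative assumptions based on
the single sample line in this thread, and the sample is rejoined onto one line
because the archive wrapped it. The timeout value is the GC time in
milliseconds as reported by the Erlang system monitor.

```python
import re

# Pattern inferred from the one log line quoted in this thread (an assumption,
# not an official Riak log grammar).
LONG_GC = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[info\] .*?"
    r"monitor long_gc (?P<pid><[\d.]+>).*?\{timeout,(?P<ms>\d+)\}"
)


def parse_long_gc(line):
    """Return (timestamp, pid, gc_ms) for a long_gc line, or None otherwise."""
    m = LONG_GC.search(line)
    if m is None:
        return None
    return m.group("ts"), m.group("pid"), int(m.group("ms"))


# The sample from the quoted message, rejoined onto a single line.
sample = (
    "2012-05-30 21:29:50.591 [info] "
    "<0.72.0>@riak_core_sysmon_handler:handle_event:85 monitor long_gc <0.938.0> "
    "[{initial_call,{riak_core_vnode,init,1}},"
    "{almost_current_function,{gen_fsm,loop,7}},{message_queue_len,0}] "
    "[{timeout,185},{old_heap_block_size,0},{heap_block_size,2584},"
    "{mbuf_size,0},{stack_size,55},{old_heap_size,0},{heap_size,804}]"
)

print(parse_long_gc(sample))  # ('2012-05-30 21:29:50.591', '<0.938.0>', 185)
```

Running this over the console log and comparing the timestamps with the times
of the 40ms spikes should show whether the two are related at all.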

On Wed, May 30, 2012 at 10:43 PM, Nam Nguyen <nam at> wrote:

> Hi,
> My 5-node cluster exhibits a strange spike on one particular node.
> Overall, the mean get time is about 1ms. This node occasionally shoots up
> to 40ms.
> During those times, %iowait stays the same as it was before the spike.
> No errors. The console log shows many lines like the one below, which I
> don't think are relevant to the spike.
> 2012-05-30 21:29:50.591 [info]
> <0.72.0>@riak_core_sysmon_handler:handle_event:85 monitor long_gc <0.938.0>
> [{initial_call,{riak_core_vnode,init,1}},{almost_current_function,{gen_fsm,loop,7}},{message_queue_len,0}]
> [{timeout,185},{old_heap_block_size,0},{heap_block_size,2584},{mbuf_size,0},{stack_size,55},{old_heap_size,0},{heap_size,804}]
> The cluster is set up uniformly. Ubuntu 64bit, m2.2xlarge instance. Riak
> 1.1.2 with LevelDB backend.
> What would be the best course of action for me?
> I plan to:
> - riak-admin leave on that node
> - set up new instance
> - riak-admin reip the new instance
> - riak-admin join it to the cluster
> Cheers,
> Nam

Sean Cribbs <sean at>
Software Engineer
Basho Technologies, Inc.