Strange spike

Nam Nguyen nam at
Wed May 30 17:43:48 EDT 2012


My 5-node cluster exhibits a strange spike on one particular node.

Overall, the mean get time is about 1 ms, but this node occasionally shoots up to 40 ms.
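To make "one node is much slower than the rest" concrete, here is a minimal Python sketch that flags nodes whose mean get time is far above the cluster median. The node names, the sample values, and the factor of 10 are made up for illustration. It assumes the per-node figures are in microseconds, as I believe the node_get_fsm_time_mean stat reports them, but treat that as an assumption.

```python
from statistics import median


def slow_nodes(mean_get_us, factor=10.0):
    """Return the names of nodes whose mean get time exceeds
    `factor` times the cluster-wide median.

    mean_get_us maps a node name to its mean get time in microseconds.
    Both the input shape and the default factor are illustrative choices.
    """
    baseline = median(mean_get_us.values())
    return sorted(
        name for name, value in mean_get_us.items()
        if value > factor * baseline
    )


# Hypothetical readings: four nodes near 1 ms, one spiking to 40 ms.
readings = {
    "riak1": 1000,
    "riak2": 1100,
    "riak3": 900,
    "riak4": 1000,
    "riak5": 40000,
}
suspects = slow_nodes(readings)
```

The median is used as the baseline rather than the mean so that the spiking node does not drag the baseline up with it.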

During those times, %iowait stays the same as before the spike, and no errors are reported. The console log shows many lines like the one below, which I don't think are relevant to the spike.

2012-05-30 21:29:50.591 [info] <0.72.0>@riak_core_sysmon_handler:handle_event:85 monitor long_gc <0.938.0> [{initial_call,{riak_core_vnode,init,1}},{almost_current_function,{gen_fsm,loop,7}},{message_queue_len,0}] [{timeout,185},{old_heap_block_size,0},{heap_block_size,2584},{mbuf_size,0},{stack_size,55},{old_heap_size,0},{heap_size,804}]
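One way to check whether these long_gc lines line up with the spikes is to total the reported GC time per minute and compare that against the times the get latency jumps. The sketch below is in Python and is only matched against the single line quoted above. It assumes the {timeout,N} value is the GC duration in milliseconds, which is how I understand the Erlang system monitor to report it. The function names are my own.

```python
import re
from datetime import datetime

# Pattern written against the one sample line in this message.
LONG_GC = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*?"
    r"monitor long_gc (?P<pid><[\d.]+>) .*?\{timeout,(?P<ms>\d+)\}"
)


def parse_long_gc(line):
    """Return (timestamp, pid, gc_ms) for a long_gc line, or None otherwise."""
    match = LONG_GC.search(line)
    if match is None:
        return None
    ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
    return ts, match.group("pid"), int(match.group("ms"))


def gc_ms_per_minute(lines):
    """Sum the reported GC time (assumed milliseconds) per wall-clock minute."""
    totals = {}
    for line in lines:
        event = parse_long_gc(line)
        if event is None:
            continue  # not a long_gc line
        ts, _pid, gc_ms = event
        minute = ts.replace(second=0, microsecond=0)
        totals[minute] = totals.get(minute, 0) + gc_ms
    return totals


SAMPLE = (
    "2012-05-30 21:29:50.591 [info] <0.72.0>@riak_core_sysmon_handler:"
    "handle_event:85 monitor long_gc <0.938.0> "
    "[{initial_call,{riak_core_vnode,init,1}},"
    "{almost_current_function,{gen_fsm,loop,7}},{message_queue_len,0}] "
    "[{timeout,185},{old_heap_block_size,0},{heap_block_size,2584},"
    "{mbuf_size,0},{stack_size,55},{old_heap_size,0},{heap_size,804}]"
)
event = parse_long_gc(SAMPLE)
```

Feeding the lines of the console log to gc_ms_per_minute gives a per-minute total. If the minutes with the largest totals match the 40 ms spikes, the long_gc entries may be more relevant than they look.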

The cluster is set up uniformly: 64-bit Ubuntu on m2.2xlarge instances, running Riak 1.1.2 with the LevelDB backend.

What would be the best course of action for me?

I plan to:

- riak-admin leave on that node
- set up a new instance
- riak-admin reip the new instance
- riak-admin join it to the cluster


