Strange spike

Nam Nguyen nam at tinyco.com
Thu May 31 16:22:18 EDT 2012


Hi Seth,

Yes, I am using the default config.

Is it safe to change these values and restart Riak?

Nam

On May 31, 2012, at 11:24 AM, Seth Benton wrote:

> Hey,
> 
> Apologies if this is the wrong place for this, but I just updated the eLevelDB wiki page to mention randomization of the write buffer size (set via write_buffer_size_min and write_buffer_size_max). Previously, the page did not mention these config parameters. Perhaps people were just using LevelDB's 4MB default buffer size, causing all the vnodes to compact at the same time? Or are there default write_buffer_size_min and write_buffer_size_max values under the hood?
> 
> http://wiki.basho.com/LevelDB.html
> 
> P.S.  Mathew V is getting back to me shortly on changes to this page due to changes in 1.2.
> 
> Seth
> (Tech Writer)
> 
> 
> On Thu, May 31, 2012 at 9:26 AM, Nam Nguyen <nam at tinyco.com> wrote:
> Hi Sean,
> 
> You are right. At first I thought it was localized to that one particular node. Now others are also exhibiting the same symptom.
> 
> I am putting in another node.
> 
> Cheers,
> Nam
> 
> 
> On May 30, 2012, at 11:23 PM, Sean Cribbs wrote:
> 
>> Nam,
>> 
>> The LevelDB storage backend has a known issue where compaction can stall a heavily-loaded node for a long time (we've seen 60 seconds or more in production clusters). We're very sorry about this, but an improvement will be available in the next release. In the meantime, DO NOT make the node leave the cluster - this will only make things worse! It might be worth adding another node to the cluster, but I suggest you wait until the node finishes compaction.
>> 
>> On Wed, May 30, 2012 at 10:43 PM, Nam Nguyen <nam at tinyco.com> wrote:
>> Hi,
>> 
>> My 5-node cluster exhibits a strange spike on one particular node.
>> 
>> Overall, the mean get time is about 1ms. This node occasionally shoots up to 40ms.
>> 
>> During those times, %iowait stays the same as it was before the spike. There are no errors. The console log shows many lines like the one below, which I don't think are relevant to the spike.
>> 
>> 2012-05-30 21:29:50.591 [info] <0.72.0>@riak_core_sysmon_handler:handle_event:85 monitor long_gc <0.938.0> [{initial_call,{riak_core_vnode,init,1}},{almost_current_function,{gen_fsm,loop,7}},{message_queue_len,0}] [{timeout,185},{old_heap_block_size,0},{heap_block_size,2584},{mbuf_size,0},{stack_size,55},{old_heap_size,0},{heap_size,804}]
>> 
>> The cluster is set up uniformly. Ubuntu 64bit, m2.2xlarge instance. Riak 1.1.2 with LevelDB backend.
>> 
>> What would be the best course of action for me?
>> 
>> I plan to:
>> 
>> - riak-admin leave on that node
>> - set up new instance
>> - riak-admin reip the new instance
>> - riak-admin join it to the cluster
>> 
>> Cheers,
>> Nam
>> 
>> 
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> 
>> 
>> 
>> -- 
>> Sean Cribbs <sean at basho.com>
>> Software Engineer
>> Basho Technologies, Inc.
>> http://basho.com/
>> 
> 
> 
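A toy sketch of the idea Seth raises above: if every vnode uses the same write buffer size, vnodes under equal write load fill their buffers after the same number of writes, whereas a size picked at random between a minimum and a maximum spreads those points out. This is only an illustration, not eleveldb's actual logic. The 30 MB and 60 MB bounds, the 1 KB write size, the uniform random choice, and all function names are assumptions made for the example; only the 4 MB default comes from the thread.

```python
import random

# LevelDB's 4 MB default write buffer size, as mentioned in the thread.
FIXED_SIZE = 4 * 1024 * 1024

# Hypothetical bounds for illustration only; not taken from the thread.
SIZE_MIN = 30 * 2**20
SIZE_MAX = 60 * 2**20


def pick_buffer_size(rng, size_min, size_max):
    """Pick one buffer size uniformly in [size_min, size_max] (assumed policy)."""
    return rng.randint(size_min, size_max)


def writes_until_full(buffer_size, bytes_per_write):
    """Number of equal-sized writes needed to fill the buffer (ceiling division)."""
    return -(-buffer_size // bytes_per_write)


def fill_points(buffer_sizes, bytes_per_write=1024):
    """Write count at which each vnode's buffer fills, assuming equal load."""
    return [writes_until_full(size, bytes_per_write) for size in buffer_sizes]


rng = random.Random(42)  # seeded so the run is repeatable
vnodes = 8

fixed = fill_points([FIXED_SIZE] * vnodes)
randomized = fill_points(
    [pick_buffer_size(rng, SIZE_MIN, SIZE_MAX) for _ in range(vnodes)]
)

print("distinct fill points, fixed size:     ", len(set(fixed)))
print("distinct fill points, randomized size:", len(set(randomized)))
```

With a fixed size, every vnode in this model fills its buffer at the same write count (4096 writes of 1 KB), so they would all start the follow-on work together. With randomized sizes, the fill points differ per vnode.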


