How are you dealing with spikes?

Alexander Popov mogadanez at gmail.com
Wed Dec 31 06:38:50 EST 2014


I replaced most of the 2i and MapReduce calls with SOLR requests, but it
doesn't seem to help much. Now the Solr requests themselves sometimes have peaks.


Looking at /stats:
what is the difference between search_index_latency and search_query_latency?


   - search_query_throughput_count: 364711,
   - search_query_throughput_one: 0,
   - search_query_fail_count: 6,
   - search_query_fail_one: 0,
   - search_query_latency_min: 0,
   - search_query_latency_max: 0,
   - search_query_latency_median: 0,
   - search_query_latency_95: 0,
   - search_query_latency_99: 0,
   - search_query_latency_999: 0,
   - search_index_throughput_count: 300612,
   - search_index_throughtput_one: 2585,
   - search_index_fail_count: 367,
   - search_index_fail_one: 12,
   - search_index_latency_min: 765,
   - search_index_latency_max: 49859,
   - search_index_latency_median: 1097,
   - search_index_latency_95: 2801,
   - search_index_latency_99: 18763,
   - search_index_latency_999: 37138,
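A note on the question above (my reading, not an authoritative answer): search_query_* should track Solr query traffic served through Riak, while search_index_* tracks the time Yokozuna spends indexing documents into Solr. The all-zero query latencies above likely just mean the sliding latency window was empty when /stats was sampled. A small sketch for pulling one latency family out of a parsed /stats response (the helper itself is illustrative, not part of any official client; sample values are copied from the dump above):

```python
# Sketch: extract one Yokozuna latency family from a parsed /stats response.
# The metric names follow the /stats output quoted in this thread.

def latency_summary(stats, prefix):
    """Collect the *_latency_* fields for one metric family (values in microseconds)."""
    suffixes = ("min", "max", "median", "95", "99", "999")
    return {s: stats[f"{prefix}_latency_{s}"] for s in suffixes}

# Sample values copied from the /stats dump above.
stats = {
    "search_index_latency_min": 765,
    "search_index_latency_max": 49859,
    "search_index_latency_median": 1097,
    "search_index_latency_95": 2801,
    "search_index_latency_99": 18763,
    "search_index_latency_999": 37138,
}

summary = latency_summary(stats, "search_index")
```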



On Mon, Dec 22, 2014 at 1:39 AM, Alexander Popov <mogadanez at gmail.com>
wrote:

> The left graph shows counts, the right graph shows times; the graphs are
> synchronized by time.
> What about SOLR requests instead of 2i? Should they be faster?
> And what would you recommend for populating lists of a user's data? For
> example, we have files that carry a 2i index such as owner, so when a user
> requests his files we query buckets/files/owner_bin/user_id. If we change
> this query to its SOLR analog, can we get some speedup?
>
> Also, does key length matter for 2i performance? Does the number of 2i
> indexes per object matter?
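For concreteness, the two request shapes being compared might look like this over HTTP. This is only a sketch: the base URL, the Solr index name files_idx, and the field owner_s are assumptions; only the buckets/files/owner_bin path comes from the message above.

```python
# Sketch of the two HTTP requests being compared. The Solr index name
# "files_idx" and field "owner_s" are hypothetical placeholders; the 2i
# path follows the buckets/files/owner_bin pattern mentioned above.

BASE = "http://localhost:8098"  # assumed Riak HTTP endpoint

def twoi_url(bucket, index, key):
    """2i exact-match lookup: returns matching keys only."""
    return f"{BASE}/buckets/{bucket}/index/{index}/{key}"

def search_url(index, query, rows=100):
    """Riak Search (Solr) query: can also sort and page server-side."""
    return f"{BASE}/search/query/{index}?wt=json&q={query}&rows={rows}"

u1 = twoi_url("files", "owner_bin", "user_id")
u2 = search_url("files_idx", "owner_s:user_id")
```

One structural difference worth noting: a 2i lookup returns only keys (each object still has to be fetched afterwards), while a Solr query can return stored fields, sort, and paginate server-side, which is often where any practical speedup comes from.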
>
> On Tue, Dec 9, 2014 at 7:54 PM, Alexander Popov <mogadanez at gmail.com>
> wrote:
>
>> Stats from the 15 minutes around a recent spike:
>>  get  (826)
>>  save  (341)
>>  listByIndex  (1161)
>>  mapReduce  (621)  // input is a list of IDs
>>  SOLR  (4294)
>>
>> 6 Solr requests took longer than 9 s (all returned 0 rows)
>> 4 Solr requests took 4-5 s (all returned 0 rows)
>> 11 listByIndex requests took 4-5 s (all returned 0 rows)
>> All other requests took less than 300 ms.
>>
>>
>> Sometimes heavier load does not produce such spikes.
>> Some graphs from maintenance tasks:
>> 1. http://i.imgur.com/xAE6B06.png
>>    3 simple tasks: the first 2 read all keys, decide to do nothing, and
>>    continue, so only reads happen; the third task re-saves all data in the
>>    bucket. Even though the rate is pretty good, some peaks occur.
>>
>> 2. A more complex task:
>>    http://i.imgur.com/7nwHb3Q.png
>>    It does heavier computation and updates a typed bucket (map), but shows
>>    no peaks up to 9 s.
>>
>>
>>
>> sysctl -a | fgrep vm.dirty_:
>>
>> vm.dirty_background_bytes = 0
>> vm.dirty_background_ratio = 10
>> vm.dirty_bytes = 0
>> vm.dirty_expire_centisecs = 3000
>> vm.dirty_ratio = 20
>> vm.dirty_writeback_centisecs = 500
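To put those ratios in context, some back-of-the-envelope arithmetic, assuming an m3.large with roughly 7.5 GiB of RAM (an assumption; the kernel also applies the ratios to available rather than total memory, so the real thresholds will be somewhat lower):

```python
# Rough arithmetic for the vm.dirty_ settings above, assuming ~7.5 GiB RAM.
GIB = 1024 ** 3
ram_bytes = int(7.5 * GIB)

dirty_background = ram_bytes * 10 // 100  # background writeback starts here
dirty_blocking = ram_bytes * 20 // 100    # writers block (synchronous) here

print(dirty_background // (1024 ** 2), "MiB before background writeback")   # 768
print(dirty_blocking // (1024 ** 2), "MiB before writers start blocking")   # 1536
```

If dirty-page writeback does turn out to be the cause, one commonly suggested mitigation is to cap the thresholds in absolute bytes (e.g. `sysctl -w vm.dirty_background_bytes=67108864`), which also disables the corresponding _ratio setting; treat any specific value as a starting point to test, not a recommendation.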
>>
>> On Tue, Dec 9, 2014 at 5:46 PM, Luke Bakken <lbakken at basho.com> wrote:
>> > Hi Alexander,
>> >
>> > Can you comment on the read vs. write load of this cluster?
>> >
>> > Could you please run the following command and reply with the output?
>> >
>> > sysctl -a | fgrep vm.dirty_
>> >
>> > We've seen cases where dirty pages get written in a synchronous manner
>> > all at once, causing latency spikes due to I/O blocking.
>> > --
>> > Luke Bakken
>> > Engineer / CSE
>> > lbakken at basho.com
>> >
>> >
>> > On Tue, Dec 9, 2014 at 4:58 AM, Alexander Popov <mogadanez at gmail.com>
>> wrote:
>> >> I have a Riak 2.0.1 cluster with 5 nodes (ec2 m3-large) with elnm in
>> >> front. Sometimes I get spikes of up to 10 seconds.
>> >>
>> >> I can't say that I have a huge load at that time: at most 200 requests
>> >> per second across all 5 nodes.
>> >>
>> >> The most expensive queries are
>> >> * list by secondary index (usually returns 0 to 100 records)
>> >> * Solr queries (max 10 records)
>> >>
>> >> Save operations also slow down sometimes, but not as much (up to 1 s).
>> >>
>> >> The slowdown is not tied to specific requests; the same request works
>> >> pretty fast later.
>> >>
>> >> Is there any way to profile or log this to determine why it happens?
>> >>
>> >> _______________________________________________
>> >> riak-users mailing list
>> >> riak-users at lists.basho.com
>> >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>
>

