Solr search response time spikes

Fred Dushin fred at dushin.net
Thu Jun 22 16:40:48 EDT 2017


It's pretty strange that node 5 shows no search latency measurements at all. Are you sure your round-robin selection is working? Are you favoring node 1?
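
A quick way to see the imbalance is to tabulate the stats from Sean's message below. This is a rough sketch. It assumes the latency figures are in microseconds, which is consistent with 38887902 being the 38.8-second spike Sean reports. It uses each node's search_query_throughput_count as its search count. The dictionary layout and function name are illustrative, not part of any Riak tooling.

```python
# Per-node search_query stats copied from Sean's message. Latency values are
# assumed to be microseconds: 38887902 us is about 38.9 s, matching the
# 38.8-second spike he reports.
STATS = {
    "node1": {"one": 14, "count": 259, "median_us": 69411, "mean_us": 4900973, "max_us": 38887902},
    "node2": {"one": 22, "count": 564, "median_us": 8800, "mean_us": 11834, "max_us": 25509},
    "node3": {"one": 6, "count": 298, "median_us": 15391, "mean_us": 18062, "max_us": 31759},
    "node4": {"one": 8, "count": 334, "median_us": 7230, "mean_us": 10211, "max_us": 22502},
    "node5": {"one": 0, "count": 0, "median_us": 0, "mean_us": 0, "max_us": 0},
}


def summarize(stats):
    """Return (node, share of total count, median s, max s) rows and the idle nodes."""
    total = sum(s["count"] for s in stats.values())
    rows = []
    for node, s in sorted(stats.items()):
        share = s["count"] / total if total else 0.0
        rows.append((node, round(share, 3), s["median_us"] / 1e6, s["max_us"] / 1e6))
    # Nodes that have recorded no searches at all.
    idle = [node for node, s in sorted(stats.items()) if s["count"] == 0]
    return rows, idle


if __name__ == "__main__":
    rows, idle = summarize(STATS)
    for node, share, median_s, max_s in rows:
        print(f"{node}: {share:.1%} of searches, median {median_s:.3f} s, max {max_s:.3f} s")
    print("nodes with no searches:", idle)
```

Node 1's mean (about 4.9 s) sits far above its median (about 0.07 s), so a handful of very slow searches dominate there, while node 5 has recorded no searches at all.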

In general, I don't think the node you send a query to should make a difference, but I'd have to stare at the code a bit to be sure. In essence, all the node that services the query does is convert it into a sharded Solr query based on a coverage plan (which changes every minute or so) and then run that sharded query on its local Solr node. Solr then distributes the query to the rest of the nodes in the cluster, but that's all Solr-to-Solr communication; Riak is out of the picture by then.

Now, if a lot of sharded queries accumulate on one node, that might make a difference to Solr. I am not a Solr expert, and I don't even play one on TV. But maybe the fact that you are not hitting node 5 is relevant for that reason?
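
One way queries could pile up on a single node even when round-robin is "working": Sean's pool has 25 connections (5 per node), and searches are issued in bursts of 5 in parallel. If, hypothetically, the client orders its pool by node, a plain round-robin would send each whole burst to one node, even though the long-run totals come out perfectly even. The sketch below models only that idea. The pool layouts and function names are assumptions, not the behavior of any particular Riak client, and it ignores the KV requests that would share the same connections.

```python
from collections import Counter
from itertools import cycle

NODES = ["node1", "node2", "node3", "node4", "node5"]
CONNS_PER_NODE = 5  # 5 connections to each of 5 nodes = 25, as in Sean's setup


def build_pool(grouped):
    """Hypothetical pool layouts: grouped by node, or interleaved across nodes."""
    if grouped:
        # node1 x5, then node2 x5, ...
        return [n for n in NODES for _ in range(CONNS_PER_NODE)]
    # node1, node2, node3, node4, node5, node1, ...
    return [n for _ in range(CONNS_PER_NODE) for n in NODES]


def burst_targets(pool, bursts, burst_size=5):
    """Pick connections round-robin; return per-burst counts of the nodes hit."""
    picker = cycle(pool)
    return [Counter(next(picker) for _ in range(burst_size)) for _ in range(bursts)]


if __name__ == "__main__":
    for grouped in (True, False):
        label = "grouped" if grouped else "interleaved"
        print(label, burst_targets(build_pool(grouped), 5))
```

With the grouped layout, every burst of 5 parallel searches lands on a single node. With the interleaved layout, each burst is spread across all 5 nodes. Over 5 bursts, both layouts hit each node exactly 5 times.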

Can you do some more analysis on your client to make sure you are not favoring node 1?
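
A minimal client-side check might look like the following. It assumes the client can record, for each search request, the name of the Riak node it was sent to. That log is hypothetical and would have to be added to the application. The function counts requests per node and flags any node whose count strays from an even split by more than a chosen relative tolerance. A node that never appears in the log counts as 0, so an unused node like node 5 gets flagged too.

```python
from collections import Counter


def check_distribution(nodes_hit, expected_nodes, tolerance=0.5):
    """Count searches per node and flag nodes that stray from an even split.

    nodes_hit: one entry per search request, naming the Riak node the client
    sent it to (a hypothetical log the client would need to record).
    tolerance: allowed relative deviation from total / len(expected_nodes).
    """
    counts = Counter(nodes_hit)
    total = sum(counts.values())
    even = total / len(expected_nodes)
    # Missing nodes count as 0, so a node the client never uses is flagged.
    skewed = sorted(
        n for n in expected_nodes if abs(counts.get(n, 0) - even) > tolerance * even
    )
    return {n: counts.get(n, 0) for n in expected_nodes}, skewed


if __name__ == "__main__":
    # Made-up example log, not data from this thread.
    log = ["node1"] * 40 + ["node2"] * 20 + ["node3"] * 20 + ["node4"] * 20
    counts, skewed = check_distribution(log, ["node1", "node2", "node3", "node4", "node5"])
    print(counts, skewed)
```

With the made-up log above, the even share is 20 requests per node, so both node 1 (40) and node 5 (0) are flagged.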

-Fred

> On Jun 22, 2017, at 10:20 AM, sean mcevoy <sean.mcevoy at gmail.com> wrote:
> 
> Hi List,
> 
> We have a standard Riak cluster with 5 nodes, and at the minute traffic levels are fairly low. Each of our application nodes has 25 client connections, 5 to each Riak node, which are selected round-robin.
> 
> Our application-level requests involve multiple Riak requests, so our traffic tends to arrive in small bursts. Everything works fine for KV gets, puts and deletes, but we're seeing timeouts and weird response-time spikes on Solr search operations.
> 
> In the past 36 hours (the only period I have Riak stats for) I see one response time of 38.8 seconds, another of 20.8 seconds 3 hours earlier, and the third-biggest spike is an acceptable 3.5 seconds.
> 
> Below are all the search_query stats for the minute containing the 38-second sample. In that application request we made 5 Riak search requests to the same index in parallel, which we do for every request of this type and which normally causes no problems. In this case, though, all 5 timed out, and on retry one timed out again while the other 4 succeeded.
> 
> Has anyone seen anything like this before? Is there any known deadlock in Solr that I might hit if I make the same request on another connection before the first has completed? That is what happens when our Riak client times out after 2 seconds and immediately retries.
> 
> Any advice or pointers welcome.
> Thanks,
> //Sean.
> 
> 
> Riak node 1
> search_query_throughput_one: 14
> search_query_throughput_count: 259
> search_query_latency_min: 2776
> search_query_latency_median: 69411
> search_query_latency_mean: 4900973
> search_query_latency_max: 38887902
> search_query_latency_999: 38887902
> search_query_latency_99: 38887902
> search_query_latency_95: 2046215
> search_query_fail_one: 0
> search_query_fail_count: 0
> 
> Riak node 2
> search_query_throughput_one: 22
> search_query_throughput_count: 564
> search_query_latency_min: 4006
> search_query_latency_median: 8800
> search_query_latency_mean: 11834
> search_query_latency_max: 25509
> search_query_latency_999: 25509
> search_query_latency_99: 25509
> search_query_latency_95: 24035
> search_query_fail_one: 0
> search_query_fail_count: 0
> 
> Riak node 3
> search_query_throughput_one: 6
> search_query_throughput_count: 298
> search_query_latency_min: 3200
> search_query_latency_median: 15391
> search_query_latency_mean: 18062
> search_query_latency_max: 31759
> search_query_latency_999: 31759
> search_query_latency_99: 31759
> search_query_latency_95: 31759
> search_query_fail_one: 0
> search_query_fail_count: 0
> 
> Riak node 4
> search_query_throughput_one: 8
> search_query_throughput_count: 334
> search_query_latency_min: 2404
> search_query_latency_median: 7230
> search_query_latency_mean: 10211
> search_query_latency_max: 22502
> search_query_latency_999: 22502
> search_query_latency_99: 22502
> search_query_latency_95: 22502
> search_query_fail_one: 0
> search_query_fail_count: 0
> 
> Riak node 5
> search_query_throughput_one: 0
> search_query_throughput_count: 0
> search_query_latency_min: 0
> search_query_latency_median: 0
> search_query_latency_mean: 0
> search_query_latency_max: 0
> search_query_latency_999: 0
> search_query_latency_99: 0
> search_query_latency_95: 0
> search_query_fail_one: 0
> search_query_fail_count: 0
> 
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
