Riak getting very slow

Fred Dushin fadushin at icloud.com
Sat Jun 17 14:37:38 EDT 2017


When you say "All not have solr on", do you mean not all nodes have search enabled?   If you are measuring Solr index latencies, then you definitely have Solr on at least one node.  Or is this just a typo?

Going on the assumption you have search enabled on all nodes (you should, if you are using search at all), you are seeing mean latencies on puts on the order of almost 16 seconds, and 99th percentile latencies reaching almost a minute.  Yes, that is slow!
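
For reference, here is a minimal Python sketch of how I read those numbers.  It assumes the search_index_latency_* values are reported in microseconds, and that the input is "name : value" text in the style of riak-admin status output.  The function names are just illustrative.

```python
# Assumption: search_index_latency_* values are in microseconds.
MICROS_PER_SECOND = 1_000_000
PREFIX = "search_index_latency_"


def parse_status(text):
    """Parse 'name : value' lines into a dict of ints, skipping other lines."""
    stats = {}
    for line in text.splitlines():
        # Tolerate email quote markers ("> ") in front of pasted output.
        name, sep, value = line.strip("> \t").partition(":")
        name, value = name.strip(), value.strip()
        if sep and value.isdigit():
            stats[name] = int(value)
    return stats


def index_latency_seconds(stats):
    """Return the search index latency stats converted to seconds."""
    return {
        name[len(PREFIX):]: value / MICROS_PER_SECOND
        for name, value in stats.items()
        if name.startswith(PREFIX)
    }


# Values copied from the status output quoted below.
sample = """\
> search_index_latency_99 : 54188877
> search_index_latency_mean : 15818891
> search_index_latency_min : 1919
"""

for name, seconds in sorted(index_latency_seconds(parse_status(sample)).items()):
    print(f"{name}: {seconds:.3f} s")
```

With the mean and 99th percentile from your output, this gives roughly 15.8 and 54.2 seconds.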

Do you have any other metrics on what is going on with the Solr process?  It runs in a separate VM (a JVM), and in general you can probe it using JMX, or even by scraping the proc file system.  I don't have anything handy out of the box, but I have written some collectd Python modules you should feel free to use and pilfer, if that helps:

https://github.com/fadushin/riak_puppet_stuff/tree/master/modules/riak_node/files/collectd
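
If you just want a quick look without collectd, scraping proc can be as simple as the sketch below.  This is not taken from the modules above; it is a rough illustration that assumes a Linux host, reads /proc/<pid>/status, and pulls out the resident memory (VmRSS) and thread count.  How you find the PID of the Solr JVM is up to you; the demo at the bottom just inspects its own process.

```python
import os


def parse_proc_status(text):
    """Turn the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def memory_kb(fields, key="VmRSS"):
    """Return a memory field such as 'VmRSS: 2048 kB' as an int, or None."""
    value = fields.get(key)
    return int(value.split()[0]) if value else None


def read_proc_status(pid):
    """Read and parse /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_proc_status(handle.read())


if __name__ == "__main__":
    # Demo against the current process; substitute the Solr JVM's PID.
    pid = os.getpid()
    if os.path.exists(f"/proc/{pid}/status"):
        fields = read_proc_status(pid)
        print("threads:", fields.get("Threads"), "rss_kb:", memory_kb(fields))
    else:
        print("no /proc file system on this host")
```

Sampling this every few seconds while writes are in flight should at least tell you whether the JVM's memory or thread count is climbing.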

I am not enough of a Solr expert to say that having 100+ dynamic fields is the root cause of your issues with write latency.  It could be, so you should try to see if Solr is impacted if you write to a different Riak search index (i.e., Solr core).  That would of course require a new bucket, but those are cheap for the purposes of experimentation.  You may need to re-architect your application to use more Riak indices and bucket types, and to use statically defined Solr fields, in order to get over this hump.

Another thing you should consider is upgrading to Riak 2.0.9.  This release includes very significant improvements to the write/index path into Solr, with support for batching and asynchronous delivery into Solr.  This won't necessarily fix your problem -- you should first get to the bottom of why you are seeing 16-second average write latencies into Solr for a single Solr document -- but it may give you some headroom in the future.

One other thing we found leading up to the 2.0.8 release, and which was fixed in 2.0.8 and later, is that Solr slows to a crawl if you have a high number of siblings, almost linearly in the number of siblings.  This happened because Riak used to use the deleteByQuery Solr operation when indexing a document, which would cause Solr memory consumption, as well as CPU utilization, to go through the roof.  We changed this in 2.0.8 and later to delete previously existing documents by id, which is far less resource-intensive on the Solr side.  Do you have a handle on how many siblings you have in your Riak objects?
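
One rough way to get a handle on siblings is to look at the node_get_fsm_siblings_* entries in riak-admin status.  The sketch below assumes those stat names are present in your version's output and that you have saved the output as text.  The numbers in the sample are made up purely to show the shape of the input; they are not from your cluster.

```python
SIBLING_PREFIX = "node_get_fsm_siblings_"


def sibling_stats(status_text):
    """Extract node_get_fsm_siblings_* values from riak-admin status text."""
    stats = {}
    for line in status_text.splitlines():
        name, sep, value = line.strip("> \t").partition(":")
        name, value = name.strip(), value.strip()
        if sep and name.startswith(SIBLING_PREFIX) and value.isdigit():
            stats[name[len(SIBLING_PREFIX):]] = int(value)
    return stats


# Hypothetical values, for illustration only.
sample = """\
node_get_fsm_siblings_mean : 3
node_get_fsm_siblings_99 : 12
node_get_fsm_siblings_100 : 40
search_index_fail_count : 1011
"""

for name, count in sorted(sibling_stats(sample).items()):
    print(f"siblings {name}: {count}")
```

These figures are sampled on reads, so they only reflect objects that have been fetched recently on that node.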

And BTW, if you upgrade to Riak 2.2.0, then you will also get an upgrade to Solr 4.10.

-Fred

> On Jun 16, 2017, at 4:49 PM, amol.zambare at bookmypacket.com wrote:
> 
> Hi All,
> 
> We are running riak kv 2.0.1 on 5 node, all are high end conf i.e it does
> not have any load. All not have solr on.
> 
> Still, We getting very high latency
> 
> After Some investigation, i have found what will be a possible issue,
> We have one bucket with solr index, solr index's each document has about
> 100+ dynamic fields in the Solr schema 
> 
> I have read two issue related to the same problem as below
> https://github.com/basho/yokozuna/issues/719
> https://github.com/basho/yokozuna/issues/330
> 
> This specify that you should not have more than 60 dynamic fields else riak
> will get slow because of solr index creation will be very slow
> 
> Below is riak-admin status related to solr
> rings_reconciled_total : 80
> search_index_fail_count : 1011
> search_index_fail_one : 5
> search_index_latency_95 : 36450099
> search_index_latency_99 : 54188877
> search_index_latency_999 : 54188877
> search_index_latency_max : 54188877
> search_index_latency_mean : 15818891
> search_index_latency_median : 17226576
> search_index_latency_min : 1919
> search_index_throughput_count : 36125
> search_index_throughput_one : 19
> search_query_fail_count : 29
> search_query_fail_one : 0
> search_query_latency_95 : 0
> search_query_latency_99 : 0
> search_query_latency_999 : 0
> search_query_latency_max : 0
> search_query_latency_mean : 0
> search_query_latency_median : 0
> search_query_latency_min : 0
> search_query_throughput_count : 3455
> 
> Also related to port time waiting as below
> netstat -anp | grep :8093 | grep EST | wc -l
> 20
> netstat -anp | grep :8093 | grep TIME_WAIT | wc -l
> 21
> 
> Please help us find out issue and what will be possible solution
> 
> Thanks,
> Amol
> 
> 
> 
> 
> --
> View this message in context: http://riak-users.197444.n3.nabble.com/Riak-getting-very-slow-tp4035209.html
> Sent from the Riak Users mailing list archive at Nabble.com.
> 
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




