Yokozuna queries slow

Jason Campbell xiaclo at xiaclo.net
Wed Apr 22 05:23:31 EDT 2015


After some comparisons with tcpdump, I think the strange results are related to a too aggressive redirect rule.

Please ignore.  Sorry for the confusion.

> On 22 Apr 2015, at 12:35, Jason Campbell <xiaclo at xiaclo.net> wrote:
> 
> Sorry, still running into issues.
> 
> So I've disabled cache on all solr nodes, but the yokozuna query times are still high.
> 
> Oddly, running queries against any one node completes in a few ms.  So I injected nginx between riak and solr so I could log queries, and now I'm even more confused.
> 
> There is 5 requests to each solr node for every request to yokozuna.
> 
> These are the nginx logs for one Solr node.  The log format is apache combined, plus the total query time, and a dump of the POST body.
> 
> 10.0.1.238 - - [22/Apr/2015:02:18:01 +0000] "POST /internal_solr/snapshots_index/select HTTP/1.1" 200 774 "-" "Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0" 0.007 "rows=0&10.0.1.237%3A8093=_yz_pn%3A54+OR+_yz_pn%3A44+OR+_yz_pn%3A34+OR+_yz_pn%3A24+OR+_yz_pn%3A14+OR+_yz_pn%3A4&q=*%3A*&10.0.1.234%3A8093=%28_yz_pn%3A51+AND+%28_yz_fpn%3A51%29%29+OR+%28_yz_pn%3A41+AND+%28_yz_fpn%3A41%29%29+OR+%28_yz_pn%3A31+AND+%28_yz_fpn%3A31%29%29+OR+%28_yz_pn%3A21+AND+%28_yz_fpn%3A21%29%29+OR+%28_yz_pn%3A11+AND+%28_yz_fpn%3A11%29%29+OR+_yz_pn%3A1&10.0.1.235%3A8093=%28_yz_pn%3A62+AND+%28_yz_fpn%3A62+OR+_yz_fpn%3A61%29%29+OR+_yz_pn%3A57+OR+_yz_pn%3A47+OR+_yz_pn%3A37+OR+_yz_pn%3A27+OR+_yz_pn%3A17+OR+_yz_pn%3A7&10.0.1.238%3A8093=_yz_pn%3A60+OR+_yz_pn%3A50+OR+_yz_pn%3A40+OR+_yz_pn%3A30+OR+_yz_pn%3A20+OR+_yz_pn%3A10&start=0&fsv=true&fl=_yz_id%2Cscore&distrib=false&isShard=true&shard.url=10.0.1.235%3A8093%2Finternal_solr%2Fsnapshots_index&NOW=1429669081777&wt=javabin&version=2"
> 10.0.1.238 - - [22/Apr/2015:02:18:01 +0000] "POST /internal_solr/snapshots_index/select HTTP/1.1" 200 774 "-" "Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0" 0.008 "rows=0&10.0.1.237%3A8093=_yz_pn%3A54+OR+_yz_pn%3A44+OR+_yz_pn%3A34+OR+_yz_pn%3A24+OR+_yz_pn%3A14+OR+_yz_pn%3A4&q=*%3A*&10.0.1.234%3A8093=%28_yz_pn%3A51+AND+%28_yz_fpn%3A51%29%29+OR+%28_yz_pn%3A41+AND+%28_yz_fpn%3A41%29%29+OR+%28_yz_pn%3A31+AND+%28_yz_fpn%3A31%29%29+OR+%28_yz_pn%3A21+AND+%28_yz_fpn%3A21%29%29+OR+%28_yz_pn%3A11+AND+%28_yz_fpn%3A11%29%29+OR+_yz_pn%3A1&10.0.1.235%3A8093=%28_yz_pn%3A62+AND+%28_yz_fpn%3A62+OR+_yz_fpn%3A61%29%29+OR+_yz_pn%3A57+OR+_yz_pn%3A47+OR+_yz_pn%3A37+OR+_yz_pn%3A27+OR+_yz_pn%3A17+OR+_yz_pn%3A7&10.0.1.238%3A8093=_yz_pn%3A60+OR+_yz_pn%3A50+OR+_yz_pn%3A40+OR+_yz_pn%3A30+OR+_yz_pn%3A20+OR+_yz_pn%3A10&start=0&fsv=true&fl=_yz_id%2Cscore&distrib=false&isShard=true&shard.url=10.0.1.234%3A8093%2Finternal_solr%2Fsnapshots_index&NOW=1429669081777&wt=javabin&version=2"
> 10.0.1.238 - - [22/Apr/2015:02:18:01 +0000] "POST /internal_solr/snapshots_index/select HTTP/1.1" 200 774 "-" "Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0" 0.001 "rows=0&10.0.1.237%3A8093=_yz_pn%3A54+OR+_yz_pn%3A44+OR+_yz_pn%3A34+OR+_yz_pn%3A24+OR+_yz_pn%3A14+OR+_yz_pn%3A4&q=*%3A*&10.0.1.234%3A8093=%28_yz_pn%3A51+AND+%28_yz_fpn%3A51%29%29+OR+%28_yz_pn%3A41+AND+%28_yz_fpn%3A41%29%29+OR+%28_yz_pn%3A31+AND+%28_yz_fpn%3A31%29%29+OR+%28_yz_pn%3A21+AND+%28_yz_fpn%3A21%29%29+OR+%28_yz_pn%3A11+AND+%28_yz_fpn%3A11%29%29+OR+_yz_pn%3A1&10.0.1.235%3A8093=%28_yz_pn%3A62+AND+%28_yz_fpn%3A62+OR+_yz_fpn%3A61%29%29+OR+_yz_pn%3A57+OR+_yz_pn%3A47+OR+_yz_pn%3A37+OR+_yz_pn%3A27+OR+_yz_pn%3A17+OR+_yz_pn%3A7&10.0.1.238%3A8093=_yz_pn%3A60+OR+_yz_pn%3A50+OR+_yz_pn%3A40+OR+_yz_pn%3A30+OR+_yz_pn%3A20+OR+_yz_pn%3A10&start=0&fsv=true&fl=_yz_id%2Cscore&distrib=false&isShard=true&shard.url=10.0.1.237%3A8093%2Finternal_solr%2Fsnapshots_index&NOW=1429669081777&wt=javabin&version=2"
> 
> These queries look fine, they are from localhost, but over the internal IP address, and they all complete in under 10ms.  The return the exact same body length, so I'm confused why it's run 3 times, but not much of a performance concern.
> 
> 10.0.1.238 - - [22/Apr/2015:02:18:03 +0000] "POST /internal_solr/snapshots_index/select HTTP/1.1" 200 778 "-" "Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0" 1.844 "rows=0&10.0.1.237%3A8093=_yz_pn%3A54+OR+_yz_pn%3A44+OR+_yz_pn%3A34+OR+_yz_pn%3A24+OR+_yz_pn%3A14+OR+_yz_pn%3A4&q=*%3A*&10.0.1.234%3A8093=%28_yz_pn%3A51+AND+%28_yz_fpn%3A51%29%29+OR+%28_yz_pn%3A41+AND+%28_yz_fpn%3A41%29%29+OR+%28_yz_pn%3A31+AND+%28_yz_fpn%3A31%29%29+OR+%28_yz_pn%3A21+AND+%28_yz_fpn%3A21%29%29+OR+%28_yz_pn%3A11+AND+%28_yz_fpn%3A11%29%29+OR+_yz_pn%3A1&10.0.1.235%3A8093=%28_yz_pn%3A62+AND+%28_yz_fpn%3A62+OR+_yz_fpn%3A61%29%29+OR+_yz_pn%3A57+OR+_yz_pn%3A47+OR+_yz_pn%3A37+OR+_yz_pn%3A27+OR+_yz_pn%3A17+OR+_yz_pn%3A7&10.0.1.238%3A8093=_yz_pn%3A60+OR+_yz_pn%3A50+OR+_yz_pn%3A40+OR+_yz_pn%3A30+OR+_yz_pn%3A20+OR+_yz_pn%3A10&start=0&fsv=true&fl=_yz_id%2Cscore&distrib=false&isShard=true&shard.url=10.0.1.238%3A8093%2Finternal_solr%2Fsnapshots_index&NOW=1429669081777&wt=javabin&version=2"
> 
> This is where I start getting worried, the query is now changed to *:* instead of the one I passed to yokozuna, and rows=0.  Not surprisingly, this takes almost 2 seconds since it has to count a massive amount of documents.
> 
> 127.0.0.1 - - [22/Apr/2015:02:18:03 +0000] "POST /internal_solr/snapshots_index/select HTTP/1.1" 200 882 "-" "-" 1.856 "q=%2A%3A%2A&rows=0&wt=json&shards=10.0.1.234%3A8093%2Finternal_solr%2Fsnapshots_index%2C10.0.1.235%3A8093%2Finternal_solr%2Fsnapshots_index%2C10.0.1.237%3A8093%2Finternal_solr%2Fsnapshots_index%2C10.0.1.238%3A8093%2Finternal_solr%2Fsnapshots_index&10.0.1.234%3A8093=%28_yz_pn%3A51+AND+%28_yz_fpn%3A51%29%29+OR+%28_yz_pn%3A41+AND+%28_yz_fpn%3A41%29%29+OR+%28_yz_pn%3A31+AND+%28_yz_fpn%3A31%29%29+OR+%28_yz_pn%3A21+AND+%28_yz_fpn%3A21%29%29+OR+%28_yz_pn%3A11+AND+%28_yz_fpn%3A11%29%29+OR+_yz_pn%3A1&10.0.1.235%3A8093=%28_yz_pn%3A62+AND+%28_yz_fpn%3A62+OR+_yz_fpn%3A61%29%29+OR+_yz_pn%3A57+OR+_yz_pn%3A47+OR+_yz_pn%3A37+OR+_yz_pn%3A27+OR+_yz_pn%3A17+OR+_yz_pn%3A7&10.0.1.237%3A8093=_yz_pn%3A54+OR+_yz_pn%3A44+OR+_yz_pn%3A34+OR+_yz_pn%3A24+OR+_yz_pn%3A14+OR+_yz_pn%3A4&10.0.1.238%3A8093=_yz_pn%3A60+OR+_yz_pn%3A50+OR+_yz_pn%3A40+OR+_yz_pn%3A30+OR+_yz_pn%3A20+OR+_yz_pn%3A10"
> 
> This is a similar query as the previous one, but this one is from localhost (not the private IP) and the output format is json, not javabin.  It's also missing the usual Solr user agent.  This is still a *:* query requesting 0 rows though, and also takes almost 2 seconds.
> 
> Now I'm not sure these last 2 queries are actually required to complete the yokozuna request, but they are run every time I make a new request.  Although each of these queries takes over 1.8 seconds, the yokozuna request completed in 1.2 seconds.  However, if it was only waiting on the shorter requests, even 1.2s seconds is much too long.
> 
> Can anyone give insight into what yokozuna is doing here and hopefully a way to make it as fast as the underlying solr queries (>10ms)?
> 
> Still confused,
> Jason
> 
>> On 22 Apr 2015, at 08:27, Jason Campbell <xiaclo at xiaclo.net> wrote:
>> 
>> That is really weird, I tried that before posting, yet it seems to fix things now.
>> 
>> Oh well, I'll try rolling it out across the cluster, hopefully it works.
>> 
>> Maybe someone with more Solr knowledge can enlighten me though, does disabling the filter cache change behaviour as well?
>> 
>> For example, running q=_yz_pn:55 OR _yz_pn:40 OR _yz_pn:25 OR _yz_pn:10 still takes ~1500ms, yet q=timestamp:[1429579919010 TO 1429579921010]&qf=_yz_pn:55 OR _yz_pn:40 OR _yz_pn:25 OR _yz_pn:10 takes 0-5ms.
>> 
>> So clearly the query filter isn't being executed in the same way it was before, it's not just that it doesn't have to write the result to a cache.
>> 
>> Thanks for the help guys.
>> 
>>> On 22 Apr 2015, at 08:02, Zeeshan Lakhani <zlakhani at basho.com> wrote:
>>> 
>>> Hey Steve,
>>> 
>>> Yep, that’s the specific reference point I was referring to in the link I shared. 
>>> 
>>> We’re working on cleaning-up some stuff to help perf for large indexes within the yz code itself. We’ve also discussed how to improve configuring specific solrconfig tunable items as a property of the index (not bucket); so, it’s something we’re looking into.
>>> 
>>> Thanks for helping out :).
>>> 
>>> Zeeshan Lakhani
>>> programmer | 
>>> software engineer at @basho | 
>>> org. member/founder of @papers_we_love | paperswelove.org
>>> twitter => @zeeshanlakhani
>>> 
>>>> On Apr 21, 2015, at 5:41 PM, Steve Garon <steve.garon at gmail.com> wrote:
>>>> 
>>>> Zeeshan, 
>>>> 
>>>> For that specific case, you guys should add {!cache=false} in front on your query plan. Therefore, queries on large index won't be slowed down. I'd really like to see some of the solrconfig.xml config to be exported to the riak bucket properties. The caching flag could be a property on the bucket. Same for soft commit timeouts. We had to increase soft commit timeouts to 10sec instead of the 1sec default. 
>>>> 
>>>> 
>>>> Steve
>>>> 
>>>> On 21 April 2015 at 16:02, Zeeshan Lakhani <zlakhani at basho.com> wrote:
>>>> Nice Steve.
>>>> 
>>>> Zeeshan Lakhani
>>>> programmer | 
>>>> software engineer at @basho | 
>>>> org. member/founder of @papers_we_love | paperswelove.org
>>>> twitter => @zeeshanlakhani
>>>> 
>>>>> On Apr 21, 2015, at 3:57 PM, Steve Garon <steve.garon at gmail.com> wrote:
>>>>> 
>>>>> Jason, 
>>>>> 
>>>>> Comment the <filterCache .../> section in the bucket's solrconfig.xml and restart riak. Now your queries will be fast again :-)
>>>>> 
>>>>> 
>>>>> Steve
>>>>> 
>>>>> On 21 April 2015 at 04:24, Zeeshan Lakhani <zlakhani at basho.com> wrote:
>>>>> No real workaround other than what you described or looking into config/fq-no-cache settings as mentioned in http://lucidworks.com/blog/advanced-filter-caching-in-solr/ and playing around with those.
>>>>> 
>>>>> Riak is now at 2.1.0. I hope that one of the next few point releases will see the fix. 
>>>>> 
>>>>> 
>>>>> Zeeshan Lakhani
>>>>> programmer | 
>>>>> software engineer at @basho | 
>>>>> org. member/founder of @papers_we_love | paperswelove.org
>>>>> twitter => @zeeshanlakhani
>>>>> 
>>>>>> On Apr 21, 2015, at 4:11 AM, Jason Campbell <xiaclo at xiaclo.net> wrote:
>>>>>> 
>>>>>> Thanks Zeeshan for the info.
>>>>>> 
>>>>>> Is there a workaround in the mean time, or is the only option to handle queries to the individual nodes ourselves?
>>>>>> 
>>>>>> Is there a planned timeframe for the 2.0.1 release?
>>>>>> 
>>>>>> Thanks,
>>>>>> Jason
>>>>>> 
>>>>>>> On 21 Apr 2015, at 16:13, Zeeshan Lakhani <zlakhani at basho.com> wrote:
>>>>>>> 
>>>>>>> Hey Jason,
>>>>>>> 
>>>>>>> We’re working on performance issues with YZ filter queries, e.g. https://github.com/basho/yokozuna/issues/392, and coverage plan generation/caching, and our CliServ team has started doing a ton of benchmarks as well.
>>>>>>> 
>>>>>>> You can bypass YZ, but then you’d have to create a way to generate your own coverage plans and other things involving distributed solr that YZ gives you. Nonetheless, we’re actively working on improving these issues you’ve encountered. 
>>>>>>> 
>>>>>>> Zeeshan Lakhani
>>>>>>> programmer | 
>>>>>>> software engineer at @basho | 
>>>>>>> org. member/founder of @papers_we_love | paperswelove.org
>>>>>>> twitter => @zeeshanlakhani
>>>>>>> 
>>>>>>>> On Apr 21, 2015, at 1:06 AM, Jason Campbell <xiaclo at xiaclo.net> wrote:
>>>>>>>> 
>>>>>>>> Hello,
>>>>>>>> 
>>>>>>>> I'm currently trying to debug slow YZ queries, and I've narrowed down the issue, but not sure how to solve it.
>>>>>>>> 
>>>>>>>> First off, we have about 80 million records in Riak (and YZ), but the queries return relatively few (a thousand or so at most).  Our query times are anywhere from 800ms to 1.5s.
>>>>>>>> 
>>>>>>>> I have been experimenting with queries directly on the Solr node, and it seems to be a problem with YZ and the way it does vnode filters.
>>>>>>>> 
>>>>>>>> Here is the same query, emulating YZ first:
>>>>>>>> 
>>>>>>>> {
>>>>>>>> "responseHeader":{
>>>>>>>> "status":0,
>>>>>>>> "QTime":958,
>>>>>>>> "params":{
>>>>>>>>   "q":"timestamp:[1429579919010 TO 1429579921010]",
>>>>>>>>   "indent":"true",
>>>>>>>>   "fq":"_yz_pn:55 OR _yz_pn:40 OR _yz_pn:25 OR _yz_pn:10",
>>>>>>>>   "rows":"0",
>>>>>>>>   "wt":"json"}},
>>>>>>>> "response":{"numFound":80,"start":0,"docs":[]
>>>>>>>> }}
>>>>>>>> 
>>>>>>>> And the same query, but including the vnode filter in the main body instead of using a filter query:
>>>>>>>> 
>>>>>>>> {
>>>>>>>> "responseHeader":{
>>>>>>>> "status":0,
>>>>>>>> "QTime":1,
>>>>>>>> "params":{
>>>>>>>>   "q":"timestamp:[1429579919010 TO 1429579921010] AND (_yz_pn:55 OR _yz_pn:40 OR _yz_pn:25 OR _yz_pn:10)",
>>>>>>>>   "indent":"true",
>>>>>>>>   "rows":"0",
>>>>>>>>   "wt":"json"}},
>>>>>>>> "response":{"numFound":80,"start":0,"docs":[]
>>>>>>>> }}
>>>>>>>> 
>>>>>>>> I understand there is a caching benefit to using filter queries, but a performance difference of 100x or greater doesn't seem worth it, especially with a constant data stream.
>>>>>>>> 
>>>>>>>> Is there a way to make YZ do this, or is the only way to query Solr directly, bypassing YZ?  Does anyone have any other suggestions of how to make this faster?
>>>>>>>> 
>>>>>>>> The timestamp field is a SolrTrieLongField with default settings if anyone is curious.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Jason
>>>>>>>> _______________________________________________
>>>>>>>> riak-users mailing list
>>>>>>>> riak-users at lists.basho.com
>>>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> riak-users mailing list
>>>>>>> riak-users at lists.basho.com
>>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> riak-users mailing list
>>>>> riak-users at lists.basho.com
>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> 
>> 
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
> 
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com





More information about the riak-users mailing list