Yokozuna queries slow

Jason Campbell xiaclo at xiaclo.net
Tue Apr 21 04:11:15 EDT 2015


Thanks Zeeshan for the info.

Is there a workaround in the meantime, or is the only option to handle queries to the individual nodes ourselves?
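
As a rough sketch of what handling the queries ourselves might look like, the Python below builds both parameter sets from the quoted results further down: the YZ-style form with the partition filter in fq, and the form with the filter folded into q. The base URL (port 8093 and the /internal_solr path), the index name my_index, and all function names are assumptions for illustration only. The partition list would still have to come from a coverage plan we generate ourselves, which this does not attempt.

```python
from urllib.parse import urlencode


def partition_clause(partitions):
    """Build '_yz_pn:a OR _yz_pn:b ...' for the given partition numbers."""
    return " OR ".join(f"_yz_pn:{p}" for p in partitions)


def yz_style_params(query, partitions):
    """Emulate the slow form: partition filter sent as a separate fq."""
    return {
        "q": query,
        "fq": partition_clause(partitions),
        "rows": "0",
        "wt": "json",
    }


def inline_params(query, partitions):
    """Fast form from the thread: partition filter ANDed into the main query."""
    return {
        "q": f"{query} AND ({partition_clause(partitions)})",
        "rows": "0",
        "wt": "json",
    }


def select_url(base_url, params):
    """Join a Solr core base URL with URL-encoded select parameters."""
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical node address and index name; adjust for a real cluster.
    base = "http://localhost:8093/internal_solr/my_index"
    query = "timestamp:[1429579919010 TO 1429579921010]"
    partitions = [55, 40, 25, 10]
    print(select_url(base, yz_style_params(query, partitions)))
    print(select_url(base, inline_params(query, partitions)))
```

Running it only prints the two URLs; nothing is sent to Solr, so they can be compared by hand with curl against a single node.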

Is there a planned timeframe for the 2.0.1 release?

Thanks,
Jason

> On 21 Apr 2015, at 16:13, Zeeshan Lakhani <zlakhani at basho.com> wrote:
> 
> Hey Jason,
> 
> We’re working on performance issues with YZ filter queries, e.g. https://github.com/basho/yokozuna/issues/392, and coverage plan generation/caching, and our CliServ team has started doing a ton of benchmarks as well.
> 
> You can bypass YZ, but then you’d have to create a way to generate your own coverage plans and handle the other things involving distributed Solr that YZ gives you. Nonetheless, we’re actively working on improving the issues you’ve encountered.
> 
> Zeeshan Lakhani
> programmer | 
> software engineer at @basho | 
> org. member/founder of @papers_we_love | paperswelove.org
> twitter => @zeeshanlakhani
> 
>> On Apr 21, 2015, at 1:06 AM, Jason Campbell <xiaclo at xiaclo.net> wrote:
>> 
>> Hello,
>> 
>> I'm currently trying to debug slow YZ queries, and I've narrowed down the issue, but I'm not sure how to solve it.
>> 
>> First off, we have about 80 million records in Riak (and YZ), but the queries return relatively few (a thousand or so at most).  Our query times are anywhere from 800ms to 1.5s.
>> 
>> I have been experimenting with queries directly on the Solr node, and it seems to be a problem with YZ and the way it does vnode filters.
>> 
>> Here is the same query, emulating YZ first:
>> 
>> {
>>  "responseHeader":{
>>    "status":0,
>>    "QTime":958,
>>    "params":{
>>      "q":"timestamp:[1429579919010 TO 1429579921010]",
>>      "indent":"true",
>>      "fq":"_yz_pn:55 OR _yz_pn:40 OR _yz_pn:25 OR _yz_pn:10",
>>      "rows":"0",
>>      "wt":"json"}},
>>  "response":{"numFound":80,"start":0,"docs":[]
>>  }}
>> 
>> And the same query, but including the vnode filter in the main body instead of using a filter query:
>> 
>> {
>>  "responseHeader":{
>>    "status":0,
>>    "QTime":1,
>>    "params":{
>>      "q":"timestamp:[1429579919010 TO 1429579921010] AND (_yz_pn:55 OR _yz_pn:40 OR _yz_pn:25 OR _yz_pn:10)",
>>      "indent":"true",
>>      "rows":"0",
>>      "wt":"json"}},
>>  "response":{"numFound":80,"start":0,"docs":[]
>>  }}
>> 
>> I understand there is a caching benefit to using filter queries, but a performance difference of 100x or greater doesn't seem worth it, especially with a constant data stream.
>> 
>> Is there a way to make YZ do this, or is the only way to query Solr directly, bypassing YZ?  Does anyone have any other suggestions for how to make this faster?
>> 
>> The timestamp field is a solr.TrieLongField with default settings, if anyone is curious.
>> 
>> Thanks,
>> Jason
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 




