Yokozuna queries slow

Jason Campbell xiaclo at xiaclo.net
Tue Apr 21 01:06:49 EDT 2015


Hello,

I'm currently trying to debug slow YZ queries. I've narrowed down the issue, but I'm not sure how to solve it.

First off, we have about 80 million records in Riak (and in YZ), but each query returns relatively few results (a thousand or so at most).  Our query times range from 800ms to 1.5s.

I have been experimenting with queries directly on the Solr node, and the problem seems to be the way YZ applies its vnode filters.

Here is the query run the way YZ does it, with the vnode filter passed as a filter query (fq):

{
  "responseHeader":{
    "status":0,
    "QTime":958,
    "params":{
      "q":"timestamp:[1429579919010 TO 1429579921010]",
      "indent":"true",
      "fq":"_yz_pn:55 OR _yz_pn:40 OR _yz_pn:25 OR _yz_pn:10",
      "rows":"0",
      "wt":"json"}},
  "response":{"numFound":80,"start":0,"docs":[]
  }}

And here is the same query with the vnode filter included in the main query (q) instead of a filter query:

{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "q":"timestamp:[1429579919010 TO 1429579921010] AND (_yz_pn:55 OR _yz_pn:40 OR _yz_pn:25 OR _yz_pn:10)",
      "indent":"true",
      "rows":"0",
      "wt":"json"}},
  "response":{"numFound":80,"start":0,"docs":[]
  }}
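
For anyone who wants to reproduce the comparison, here is a rough Python sketch that builds both forms of the request and reads back the QTime Solr reports. The URL (port 8093 and an index called my_index) is a placeholder I am assuming, not our actual setup, and the helper names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumed placeholder: adjust host, port, and index name for your node.
SOLR_SELECT_URL = "http://localhost:8093/internal_solr/my_index/select"

TIME_RANGE = "timestamp:[1429579919010 TO 1429579921010]"
PARTITIONS = [55, 40, 25, 10]


def partition_clause(partitions):
    """Build the '_yz_pn:55 OR _yz_pn:40 ...' clause."""
    return " OR ".join("_yz_pn:%d" % p for p in partitions)


def fq_params(q, partitions):
    """Partition filter as a separate filter query (first request)."""
    return {"q": q, "fq": partition_clause(partitions),
            "rows": "0", "wt": "json"}


def inline_params(q, partitions):
    """Partition filter ANDed into the main query (second request)."""
    return {"q": "%s AND (%s)" % (q, partition_clause(partitions)),
            "rows": "0", "wt": "json"}


def run(params, url=SOLR_SELECT_URL):
    """Send one query; return (QTime in ms as reported by Solr, numFound)."""
    full_url = url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(full_url) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["responseHeader"]["QTime"], body["response"]["numFound"]


def compare(q=TIME_RANGE, partitions=PARTITIONS):
    """Run both variants against the node and print their timings."""
    for label, params in (("fq", fq_params(q, partitions)),
                          ("inline", inline_params(q, partitions))):
        qtime, found = run(params)
        print("%s: QTime=%dms numFound=%d" % (label, qtime, found))
```

Calling compare() on the Solr node should print one line per variant. Both should report the same numFound; only the QTime should differ.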

I understand there is a caching benefit to using filter queries, but a slowdown of 100x or more (958ms vs. 1ms above) doesn't seem worth it, especially with a constant stream of incoming data.

Is there a way to make YZ put the vnode filter in the main query, or is querying Solr directly, bypassing YZ, the only option?  Does anyone have any other suggestions for making this faster?

The timestamp field is a solr.TrieLongField with default settings, if anyone is curious.

Thanks,
Jason


