riaksearch performance when numFound is high

Ryan Zezeski rzezeski at basho.com
Wed Jul 13 22:19:20 EDT 2011


I'm assuming you are using 14.2.

1) There is a bug in 14.2 that causes a (potentially very fast-growing)
memory leak when using AND.  This is unfortunate, sorry.  The good news is
that I have since patched it [1].

2) This is your best course of action, and you were close, but you've
actually crossed your fields.  The inline field should be the one that
contains the more common term (i.e. the 'text' field).  So you should
perform a range query on your date with a filter on the inline text field.
Obviously, the more terms in this field, the more the index will inflate
(space-wise), but if you can live with that, it should reduce your latency
substantially (famous last words).  Please try this and get back to me.

3) That is a very well-written article, props to the author.  However, I
would leave this as a last resort.  Try what I mentioned in #2, and if
that's not enough to get you by, then let's brainstorm.
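To make the trade-off in #2 concrete, here is a toy Python model. It assumes, loosely, that an inline field works by storing that field's terms alongside every posting of the document, so a filter can be checked without reading the filter term's own posting list. `ToyIndex`, its methods and the demo data are hypothetical; this is not Riak Search's storage format or query API, and the AND is modeled as a naive intersection. The `reads` counter tallies postings touched by each query.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical toy inverted index; NOT Riak Search's real format or API.

    Each posting is (doc_key, inline), where `inline` maps every inline field
    to that document's terms for the field.
    """

    def __init__(self, inline_fields=()):
        self.inline_fields = tuple(inline_fields)
        self.postings = defaultdict(list)  # (field, term) -> [(key, inline)]
        self.reads = 0  # postings touched by the most recent query

    def add(self, key, doc):
        # Every posting carries the inline terms. Here the dict is shared by
        # reference; an on-disk index would store it per posting (inflation).
        inline = {f: frozenset(doc[f].split()) for f in self.inline_fields}
        for fname, value in doc.items():
            for term in set(value.split()):
                self.postings[(fname, term)].append((key, inline))

    def _scan(self, fname, lo, hi):
        # Yield every posting of `fname` whose term lies in [lo, hi].
        for (f, term), plist in self.postings.items():
            if f == fname and lo <= term <= hi:
                for posting in plist:
                    self.reads += 1
                    yield posting

    def and_query(self, text_term, date_lo, date_hi):
        # text:<term> AND date:[lo TO hi], as a naive intersection of both lists.
        self.reads = 0
        text_keys = {k for k, _ in self._scan("text", text_term, text_term)}
        date_keys = {k for k, _ in self._scan("date", date_lo, date_hi)}
        return text_keys & date_keys

    def range_with_inline_filter(self, date_lo, date_hi, text_term):
        # Advice in #2: range on date, filter on the inline "text" field.
        self.reads = 0
        return {k for k, inline in self._scan("date", date_lo, date_hi)
                if text_term in inline["text"]}

    def term_with_inline_filter(self, text_term, date_lo, date_hi):
        # "Crossed" layout: query text, filter on inline "date"; this still
        # reads the whole (huge) posting list of the common term.
        self.reads = 0
        return {k for k, inline in self._scan("text", text_term, text_term)
                if any(date_lo <= d <= date_hi for d in inline["date"])}


def build_demo_index():
    # Hypothetical data: 10 days x 1,000 docs, every doc contains "common".
    idx = ToyIndex(inline_fields=("text", "date"))
    for i in range(10_000):
        day = 1 + i // 1000
        idx.add(f"key{i:05d}", {"date": f"201107{day:02d}", "text": "common word"})
    return idx


if __name__ == "__main__":
    idx = build_demo_index()
    queries = [
        ("AND", lambda: idx.and_query("common", "20110709", "20110710")),
        ("range + inline text",
         lambda: idx.range_with_inline_filter("20110709", "20110710", "common")),
        ("text + inline date",
         lambda: idx.term_with_inline_filter("common", "20110709", "20110710")),
    ]
    for name, run in queries:
        hits = run()
        print(f"{name}: {len(hits)} hits, {idx.reads} postings read")
```

All three queries return the same 2,000 documents. The naive AND reads 12,000 postings, the crossed layout (date inline, text queried) still reads the full 10,000-entry list for the common word, and the date range with an inline text filter reads only the 2,000 postings in range. The price is that every posting carries the text terms, which is the index inflation mentioned in #2.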


On Wed, Jul 6, 2011 at 2:43 PM, Greg Pascale <greg at clipboard.com> wrote:

> Hi,
> I'm looking at ways to improve riaksearch queries that produce a lot of
> matches.
> In my use case, I only ever want the top 20 results for any query, and
> results should be ordered by date (which is encoded in the key). For
> searches with few matches (numFound < ~1000), performance is great. For
> searches with more matches (numFound > ~10000), performance starts to lag
> even though I only ever want the top 20. I assume this is because the system
> needs to fetch and sort all of the results to know what the top 20 are, but
> I'm hoping I can exploit the constraints of my use case in some way to
> increase performance. I've looked at the following approaches.
> 1) AND the "text:" term with a small date range (e.g. text:<common word>
> AND date:[<yesterday to today>]). This reduces the result set, but
> performance does not improve. At best, the performance is as good as simply
> doing the "text:<common word>" search without the date range, and in some
> cases worse.
> 2) Same as above, but make the date an inline field. From what I could find
> on the topic, it sounded like this is exactly what inline fields are for, but
> I was disappointed to discover it performed far worse than even the compound
> query above.
> 3) In this article <http://blog.inagist.com/searching-with-riaksearch>,
> which I was linked to from somewhere on the basho site, the author describes
> a technique in which he calls search_fold directly and stops after he's
> received enough results. He claims this is possible in his case because
> results are returned in key order, and he's chosen his keys to match the
> desired ordering of his results. My keys have the same property, as I'm
> already using the presort=key option. Is this behavior of search_fold a
> lucky side-effect, or is this actually guaranteed to work?
> Am I simply expecting too much of riaksearch here, or is there a way to
> make this work? If all else fails, I suppose I could divide my data into
> more buckets, but I'm hoping to avoid that as it would make querying much
> more complex.
> Thanks,
> -Greg
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
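The technique from the article in the quoted #3 relies on matches streaming out in key order, so a fold can stop as soon as it has enough. The Python sketch below contrasts that with a plain top-k selection, assuming keys are chosen so that ascending key order is the desired result order. `fold`, `Counted` and `_Done` are hypothetical stand-ins, not the actual `search_fold` signature, and the early exit is only correct if the stream really is key-ordered, which is exactly the guarantee asked about in the quoted message.

```python
import heapq


class Counted:
    """Hypothetical wrapper: streams match keys and counts how many were read."""

    def __init__(self, keys):
        self.keys = keys
        self.touched = 0

    def __iter__(self):
        for key in self.keys:
            self.touched += 1
            yield key


class _Done(Exception):
    """Raised by the fold callback to abort the fold early (hypothetical)."""


def fold(stream, fun, acc):
    # Plain left fold; a stand-in for a search_fold-style callback API.
    for item in stream:
        acc = fun(item, acc)
    return acc


def top_k_by_selection(stream, k):
    # No early exit: every match is read, even though only k are kept.
    return heapq.nsmallest(k, stream)


def top_k_by_fold(stream, k):
    # Early exit: valid ONLY if the stream is already in the desired key order.
    if k <= 0:
        return []

    def collect(key, acc):
        acc.append(key)
        if len(acc) >= k:
            raise _Done  # stop folding once we have k results
        return acc

    collected = []
    try:
        fold(stream, collect, collected)
    except _Done:
        pass
    return collected


if __name__ == "__main__":
    keys = [f"key{i:05d}" for i in range(10_000)]  # already key-ordered
    full, early = Counted(keys), Counted(keys)
    print(top_k_by_selection(full, 20) == top_k_by_fold(early, 20))
    print(full.touched, early.touched)
```

On 10,000 key-ordered matches both functions return the same 20 keys, but the selection reads all 10,000 while the fold stops after 20. If the stream were not key-ordered, the fold would silently return the wrong 20.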
