Riak search performance FUD
rusty at basho.com
Wed Nov 30 11:49:21 EST 2011
Your understanding is correct. The search query is parsed into a tree,
where each leaf of the tree corresponds to a term. Each leaf sends back all
keys matching its term, and results are intersected (or unioned) where the branches
come together. So yes, if you were to run a search on a term with a large
number of results, the system reads the entire list of keys (not objects)
for that result.
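To make the tree evaluation concrete, here is a minimal Python sketch. It is not Riak Search's actual code: the in-memory INDEX, the Term/And/Or classes, and the evaluate function are all hypothetical names made up for illustration. It only shows the shape of the problem: every leaf returns its complete key set before any intersection happens.

```python
from dataclasses import dataclass

# Hypothetical in-memory inverted index: term -> set of matching keys.
# A real index lives on disk and is spread across nodes.
INDEX = {
    "zip:12345": {"k1", "k2", "k3"},
    "gender:male": {"k2", "k3", "k4", "k5"},
}


@dataclass
class Term:
    term: str


@dataclass
class And:
    left: object
    right: object


@dataclass
class Or:
    left: object
    right: object


def evaluate(node):
    """Return the set of keys matching the query tree rooted at node."""
    if isinstance(node, Term):
        # A leaf reads the complete key list for its term.
        return set(INDEX.get(node.term, ()))
    if isinstance(node, And):
        # Branches are intersected where they come together.
        return evaluate(node.left) & evaluate(node.right)
    if isinstance(node, Or):
        # Or unioned, depending on the operator.
        return evaluate(node.left) | evaluate(node.right)
    raise TypeError(f"unknown query node: {node!r}")
```

In this sketch, evaluate(And(Term("zip:12345"), Term("gender:male"))) still has to read every key under "gender:male", even though only two keys survive the intersection.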
You may want to take another look at inline fields. They allow you to limit
the results at the leaf level, and can greatly improve performance.
The example I generally use to illustrate inline fields is to imagine
searching for all males living in a specific zip code or postal code. In a
normal search, a query on zip code would return ~100k results, and a query
on "male" would return roughly half of the world's population. However, you
can mark gender as an inline field, and then structure your query as two
parts: a primary query on the zip code, and a filter on the gender. The
filter is applied directly after the data is fetched from disk, before it
is streamed through the rest of the system, so it is a very fast way to
limit your results.
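The difference can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Riak Search stores its data: the POSTINGS layout (each posting carrying its inline field values next to the key), read_leaf, and is_male are hypothetical names. The point is that the filter runs while the zip code postings are being read, so the "gender:male" term is never looked up at all.

```python
# Hypothetical postings list: term -> list of (key, inline field values).
# The inline values are assumed to be stored alongside each posting.
POSTINGS = {
    "zip:12345": [
        ("k1", {"gender": "female"}),
        ("k2", {"gender": "male"}),
        ("k3", {"gender": "male"}),
    ],
}


def read_leaf(term, inline_filter=None):
    """Yield keys for term, dropping postings that fail the inline filter."""
    for key, inline_props in POSTINGS.get(term, []):
        # The filter is applied as each posting is read, before the key
        # is passed on to the rest of the query pipeline.
        if inline_filter is None or inline_filter(inline_props):
            yield key


def is_male(inline_props):
    """Example filter on the hypothetical inline 'gender' field."""
    return inline_props.get("gender") == "male"
```

Here list(read_leaf("zip:12345", is_male)) touches only the ~100k-style zip code list and discards non-matching entries on the spot, instead of intersecting it with a second, much larger list.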
That said, there are currently known issues around sorting and pagination
in Riak Search. The upshot is that if you apply sorting and pagination at
the same time, you can get incorrect or unpredictable results. This might
be something to consider while planning your application.
I would recommend against using search_fold. It is not intended to be part
of the public API, so it could break in the future.
Hope that helps,
On Wed, Nov 30, 2011 at 5:01 AM, Jeroen van Dijk <jeroentjevandijk at gmail.com> wrote:
> Hi all,
> I'm currently evaluating the search functionality of Riak. This involves
> porting an application from Postgres/Sphinx to possibly only Riak. The
> application I'm porting doesn't need advanced search, but it does need a
> level of search that I have come to believe isn't provided in a feasible
> way by Riak Search out of the box. I've also seen some sources that make me
> worry about the performance of search [1, 2]. I hope to be proved wrong
> here or get some advice on how to work around this so I can just use Riak
> Search without an external search facility. As a disclaimer, I haven't
> done any benchmarks yet and this is just based on what I have read so far.
> The use case I'm talking about is when you are looking for a term that is
> very common and thus will yield many results. My understanding of the
> implementation of Riak  is that the search is divided into
> a few phases. The first one is collecting results for each term. After that
> comes merging, sorting and limiting the result set. So for this particular
> case collecting all results would be infeasible and would kill performance,
> even when a limit is set, because limiting comes in a phase after the
> collecting and merging of results.
> The first question is, can the above be confirmed? I've read about Riak
> Search performance optimization here , but that seems to be for a
> different problem.
> I've read here  that one can use search_fold to interrupt the
> collecting phase when enough results are fetched. I would like to know if
> this is a best/official practice and if it really solves the issue?
> I guess what I'm missing is a wiki page of "when and when not to use Riak
> Search" or "how and how not to use Riak search". If this already exists I
> completely missed it.
>  http://blog.inagist.com/searching-with-riaksearch
Rusty Klophaus (@rustyio)
*Basho Technologies, Inc.*