riak-users Digest, Vol 28, Issue 55
fearsome.lucidity at gmail.com
Wed Nov 30 11:45:37 EST 2011
On Wed, Nov 30, 2011 at 6:01 AM, <riak-users-request at lists.basho.com> wrote:
> From: Jeroen van Dijk <jeroentjevandijk at gmail.com>
> The use case I'm talking about is when you are looking for a term that is
> very common and thus will yield many results. My understanding of the
> implementation of Riak  is that the search is divided into
> a few phases. The first one is collecting results for each term. After that
> comes merging, sorting and limiting the result set. So for this particular
> case collecting all results would be infeasible and would kill performance,
> even when a limit is set, because limiting comes in a phase after the
> collecting and merging of results.
That's correct. We have similar issues. We've resorted to creating the
equivalent of multicolumn indexes by joining certain fields together and
indexing those. That is only possible because most of the data we want to
index is structured or semi-structured. You'd have to determine whether
such an approach is feasible for your purposes.
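A minimal sketch of the joined-field idea, in Python. The field names (status, country), the "_" separator, and both helper functions are hypothetical and chosen only for illustration; none of them come from Riak's API. The point is that a query on both fields becomes a lookup of one comparatively rare term instead of the merge of two very common ones.

```python
def composite_term(doc, fields, sep="_"):
    """Join the values of several fields into a single indexable term."""
    return sep.join(str(doc[f]).lower() for f in fields)


def with_composite_field(doc, fields, sep="_"):
    """Return a copy of doc with an extra field holding the joined term."""
    enriched = dict(doc)  # leave the caller's document untouched
    enriched[sep.join(fields)] = composite_term(doc, fields, sep)
    return enriched


# Hypothetical document; only structured fields lend themselves to this.
doc = {"status": "Active", "country": "NL", "name": "example"}
indexed = with_composite_field(doc, ["status", "country"])
# indexed now also carries the field status_country = "active_nl"
```

The enriched document would then be indexed as usual, and the application would query the joined field with the same joined value. This only works when the set of field combinations you need is known in advance.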
We also found 2i to be faster than Search, at the expense of requiring our
app to perform tokenization for some of the fields we want to index, but
we've stuck with Search as we need composable queries, which 2i does not support.
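To make the tokenization cost concrete, here is a rough Python sketch of what the application has to do itself before writing 2i entries. The tokenizer rules, the field name "title", and the helper names are assumptions for illustration; the "_bin" suffix follows the naming convention for binary secondary indexes, but nothing here calls a real Riak client.

```python
import re


def tokenize(text):
    """Lowercase, split on non-alphanumerics, drop empties and duplicates."""
    tokens = re.split(r"[^a-z0-9]+", text.lower())
    return sorted({t for t in tokens if t})


def index_entries(field, text):
    """Build (index_name, value) pairs, one per token of the field."""
    return [(field + "_bin", token) for token in tokenize(text)]


# Hypothetical field value; each pair would become one index entry.
entries = index_entries("title", "Riak Search vs. 2i: a comparison")
```

Each pair would be attached to the object as a secondary index entry. Combining two such lookups (for example an AND across two fields) would still have to be done by the application, which is the composability gap mentioned above.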
> I've read here that one can use search_fold to interrupt the collecting
> phase when enough results are fetched. I would like to know if this is a
> best/official practice and if it really solves the issue?
search_fold will only be useful if you plan on developing in Erlang and, if
my understanding is correct, if you don't care about the order of the
results (i.e. no scoring or field sorting). Actually, the results may be
partially ordered, as the merge_index backend may store the postings sorted
by the inverse of time.
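The trade-off can be modelled with a toy Python sketch. This is not the search_fold API (which is Erlang); the function names, the (doc id, score) posting shape, and the read counter are all hypothetical. It only contrasts collecting everything before limiting with a fold that stops early and therefore returns results in storage order, unscored.

```python
def collect_then_limit(postings, limit, score):
    """Collect every posting, sort by score, then cut down to limit."""
    collected = list(postings)  # reads all postings for the term
    collected.sort(key=score, reverse=True)
    return collected[:limit]


def fold_until(postings, limit):
    """Fold over postings in storage order, stopping after limit items."""
    acc = []
    if limit <= 0:
        return acc
    for posting in postings:
        acc.append(posting)
        if len(acc) >= limit:
            break  # interrupt the collecting phase
    return acc


def counted(items, counter):
    """Yield items while counting how many were actually read."""
    for item in items:
        counter["read"] += 1
        yield item


# Hypothetical postings for one very common term: (doc id, score).
postings = [("doc%d" % i, i % 7) for i in range(1000)]

full = {"read": 0}
top = collect_then_limit(counted(postings, full), 10, score=lambda p: p[1])

early = {"read": 0}
first = fold_until(counted(postings, early), 10)
```

In this model the first approach reads all 1000 postings to return the 10 best-scored ones, while the fold reads only 10 but returns whatever happens to be stored first.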