riak search questions

Rusty Klophaus rusty at basho.com
Mon Oct 11 16:13:02 EDT 2010

Hi Greg,

Great questions! Responses below:

On Mon, Oct 11, 2010 at 3:49 PM, Greg Steffensen
<greg.steffensen at gmail.com> wrote:

> Having just got back from the Lucene Revolution convention and seen several
> hours of presentations that were essentially about how to configure large
> distributed Lucene applications, Riak Search looks REALLY interesting.  A
> couple questions:
>
> I know that in Lucene and Solr, committing many newly-indexed documents at
> once will provide much better performance than committing them
> individually.  Will there be a similar performance cost to indexing one
> document at a time via the pre-commit hook, as opposed to indexing in bulk
> via the search-cmd program?

The backend that Search uses (merge_index) was designed to provide
efficient, real-time indexing. It handles document-at-a-time indexing very
well. The main difference between indexing a batch of documents in Riak
Search (through Riak Search's Solr interface, for instance) vs. indexing
the same documents one at a time is that the system can be smarter about
batching the messages, leading to performance gains from the reduced
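
To make the batching point concrete, here is a rough Python sketch. It is
not how merge_index or Riak Search is implemented; the hash function, the
partition count, and every function name below are assumptions made up for
illustration. It only counts how many per-partition messages would be sent
when documents are indexed one at a time versus grouped into one batch.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 8  # hypothetical partition count, chosen for illustration


def partition_for(field, term):
    """Map a (field, term) pair to a partition (assumed hashing scheme)."""
    digest = hashlib.sha1(f"{field}:{term}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS


def postings(doc_id, fields):
    """Yield one (field, term, doc_id) posting per distinct term in a field."""
    for field, text in fields.items():
        for term in set(text.lower().split()):
            yield (field, term, doc_id)


def messages_one_at_a_time(docs):
    """Each document sends one message to every partition it touches."""
    count = 0
    for doc_id, fields in docs.items():
        touched = {partition_for(f, t) for f, t, _ in postings(doc_id, fields)}
        count += len(touched)
    return count


def messages_batched(docs):
    """All postings are grouped first, so each partition gets one message."""
    by_partition = defaultdict(list)
    for doc_id, fields in docs.items():
        for field, term, doc in postings(doc_id, fields):
            by_partition[partition_for(field, term)].append((field, term, doc))
    return len(by_partition)


docs = {
    "doc1": {"title": "riak search questions", "body": "indexing one document"},
    "doc2": {"title": "lucene and solr", "body": "indexing in bulk"},
    "doc3": {"title": "term partitioning", "body": "query load on nodes"},
}
print(messages_one_at_a_time(docs), messages_batched(docs))
```

In this toy model the batched count can never exceed the partition count,
while the one-at-a-time count grows with the number of documents.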

> Also, the decision to partition the index by terms rather than by documents
> strikes me as the most interesting design decision in Riak Search.  Could
> this lead to unbalanced node utilization in queries?  For example, I'd like
> to implement a large search application that implements access control via
> the index (adding some extra clauses to the queries generated by users), so
> there would be a handful of terms that are used in almost all queries.
> Would a query set like that lead to a few nodes being much more utilized
> than others?

Yes, fields with low cardinality can lead to lopsided partitions. During
Riak Search development, we sketched out a few different solutions to the
problem, but in testing we found that in many cases this is not as big a
problem as expected (due in part to compression, batching, and replicas that
minimize or balance out the load), and so ultimately decided to wait for
real-world feedback before picking an approach.
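
For anyone who wants to see the shape of the lopsidedness, here is a small
Python simulation. It is a simplified model, not Riak Search itself: the
hashing scheme, the partition count, the "acl"/"staff" term, and the
function names are all hypothetical, and it deliberately ignores the
compression, batching, and replicas mentioned above that soften the effect
in practice. Every query carries the same access-control term plus one
random text term, and we count how many queries touch each partition.

```python
import hashlib
import random
from collections import Counter

NUM_PARTITIONS = 16  # hypothetical partition count, chosen for illustration


def partition_for(field, term):
    """Map a (field, term) pair to a partition (assumed hashing scheme)."""
    digest = hashlib.sha1(f"{field}:{term}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS


def build_queries(n, seed=0):
    """Build n queries: a shared access-control term plus one random term."""
    rng = random.Random(seed)
    vocab = [f"word{i}" for i in range(1000)]
    return [[("acl", "staff"), ("text", rng.choice(vocab))] for _ in range(n)]


def partition_load(queries):
    """Count, per partition, how many queries need at least one term from it."""
    load = Counter()
    for query in queries:
        for partition in {partition_for(f, t) for f, t in query}:
            load[partition] += 1
    return load


queries = build_queries(500)
load = partition_load(queries)
hot = partition_for("acl", "staff")
print("hot partition:", hot, "queries served:", load[hot])
print("other partitions:", sorted(v for p, v in load.items() if p != hot))
```

The partition that owns the shared term is touched by every query, while
the random text terms spread across the remaining partitions.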

> Awesome, awesome work, I can't wait to try this out.

On behalf of the team, thank you!

> Greg