riak search questions
rusty at basho.com
Mon Oct 11 16:13:02 EDT 2010
Great questions! Responses below:
On Mon, Oct 11, 2010 at 3:49 PM, Greg Steffensen
<greg.steffensen at gmail.com> wrote:
> Having just got back from the Lucene Revolution convention and seen several
> hours of presentations that were essentially about how to configure large
> distributed Lucene applications, Riak Search looks REALLY interesting. A
> couple questions:
> I know that in Lucene and Solr, committing many newly-indexed documents at
> once will provide much better performance than committing them
> individually. Will there be a similar performance cost to indexing one
> document at a time via the pre-commit hook, as opposed to indexing in bulk
> via the search-cmd program?
The backend that Search uses (merge_index) was designed to provide
efficient, real-time indexing. It handles document-at-a-time indexing very
well. The main difference between indexing a batch of documents in Riak
Search (through Riak Search's Solr interface, for instance) vs. indexing
the same documents one at a time is that the system can be smarter about
batching the messages, leading to performance gains from the reduced
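
As a rough illustration of why batching can help, here is a toy Python model. It is only a sketch under stated assumptions, not how merge_index or Riak Search actually routes data: the hash-based partition_for function, the 8-partition layout, and the "one message per partition" cost model are all hypothetical.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 8  # hypothetical partition count, not a Riak default


def partition_for(term, num_partitions=NUM_PARTITIONS):
    """Map a term to a partition with a stable hash (toy model)."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def postings(doc_id, text):
    """Turn a document into (term, doc_id) postings, one per unique term."""
    return [(term, doc_id) for term in sorted(set(text.lower().split()))]


def messages_one_at_a_time(docs):
    """Assumed cost model: each document sends one message per partition it touches."""
    count = 0
    for doc_id, text in docs.items():
        count += len({partition_for(term) for term, _ in postings(doc_id, text)})
    return count


def messages_batched(docs):
    """Group postings from all documents by partition: one message per partition."""
    batches = defaultdict(list)
    for doc_id, text in docs.items():
        for term, owner in postings(doc_id, text):
            batches[partition_for(term)].append((term, owner))
    return len(batches)


docs = {
    "doc1": "riak search indexes terms",
    "doc2": "lucene and solr commit in bulk",
    "doc3": "riak search partitions the index by terms",
}
single = messages_one_at_a_time(docs)
batched = messages_batched(docs)
```

In this model the batched count can never exceed the one-at-a-time count, because the batch touches each partition at most once no matter how many documents contribute postings to it.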
> Also, the decision to partition the index by terms rather than by documents
> strikes me as the most interesting design decision in Riak Search. Could
> this lead to unbalanced node utilization in queries? For example, I'd like
> to implement a large search application that implements access control via
> the index (adding some extra clauses to the queries generated by users), so
> there would be a handful of terms that are used in almost all queries.
> Would a query set like that lead to a few nodes being much more utilized
> than others?
Yes, fields with low cardinality can lead to lopsided partitions. During
Riak Search development, we sketched out a few different solutions to the
problem, but in testing we found that in many cases this is not as big of a
problem as expected (due in part to compression, batching, and replicas that
minimize or balance out the load), and so ultimately decided to wait for
real-world feedback before picking an approach.
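
To make the concern concrete, here is a small Python sketch of how a term that appears in almost every query concentrates lookups on one partition when the index is partitioned by term. It is a simplified model only: the hash function, the 8-partition layout, and the field and term names ("acl", "group_staff") are assumptions for illustration, and it ignores the compression, batching, and replicas mentioned above.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical partition count, not a Riak default


def partition_for(field, term, num_partitions=NUM_PARTITIONS):
    """Map a (field, term) pair to a partition with a stable hash (toy model)."""
    digest = hashlib.sha1(f"{field}:{term}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_load(queries, num_partitions=NUM_PARTITIONS):
    """Count how many term lookups each partition serves for a set of queries."""
    load = Counter()
    for query in queries:
        for field, term in query:
            load[partition_for(field, term, num_partitions)] += 1
    return load


# Every query carries the same access-control clause plus one user-supplied term.
queries = [[("acl", "group_staff"), ("text", f"word{i}")] for i in range(1000)]
load = partition_load(queries)
hot = partition_for("acl", "group_staff")

for partition in range(NUM_PARTITIONS):
    marker = "  <- access-control term" if partition == hot else ""
    print(f"partition {partition}: {load[partition]} lookups{marker}")
```

The partition that owns the access-control term serves at least one lookup per query, while the user-supplied terms spread across all partitions. How much this matters in practice depends on the mitigating factors described above.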
> Awesome, awesome work, I can't wait to try this out.
On behalf of the team, thank you!