riak search questions

Greg Steffensen greg.steffensen at gmail.com
Mon Oct 11 15:49:57 EDT 2010

Having just got back from the Lucene Revolution conference, where I saw
several hours of presentations that were essentially about how to configure
large distributed Lucene applications, Riak Search looks REALLY interesting.
A couple of questions:

I know that in Lucene and Solr, committing many newly-indexed documents at
once will provide much better performance than committing them
individually.  Will there be a similar performance cost to indexing one
document at a time via the pre-commit hook, as opposed to indexing in bulk
via the search-cmd program?

Also, the choice to partition the index by terms rather than by documents
strikes me as the most interesting design decision in Riak Search.  Could
this lead to unbalanced node utilization in queries?  For example, I'd like
to build a large search application that implements access control via
the index (adding some extra clauses to the queries generated by users), so
there would be a handful of terms that are used in almost all queries.
Would a query set like that lead to a few nodes being much more heavily
utilized than others?
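
To make the concern concrete, here is a toy sketch of what I have in mind. It is not Riak Search's actual partitioning code: the MD5-mod-N hashing, the node count, the term names, and all function names are assumptions made up for illustration. It only counts how many lookups each node would receive if every query carries the same access-control term.

```python
import hashlib
from collections import Counter

NUM_NODES = 8  # hypothetical cluster size


def owner(key: str, num_nodes: int = NUM_NODES) -> int:
    """Map a key to a node by hashing (a stand-in for a real hash ring)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def term_partitioned_load(queries):
    """Term partitioning: each query term is looked up on the one node owning it."""
    load = Counter()
    for terms in queries:
        for term in terms:
            load[owner(term)] += 1
    return load


def doc_partitioned_load(queries):
    """Document partitioning: each query is broadcast to every node."""
    load = Counter()
    for _ in queries:
        for node in range(NUM_NODES):
            load[node] += 1
    return load


# 1000 queries: one distinct user term each, plus the same (made-up) ACL term.
queries = [[f"word{i}", "acl:group42"] for i in range(1000)]

term_load = term_partitioned_load(queries)
doc_load = doc_partitioned_load(queries)
hot_node = owner("acl:group42")

for node in range(NUM_NODES):
    marker = "  <- owns the ACL term" if node == hot_node else ""
    print(f"node {node}: term-partitioned={term_load[node]:5d}  "
          f"doc-partitioned={doc_load[node]:5d}{marker}")
```

In this simplified model the node that owns the ACL term sees at least one lookup per query, while the user terms spread across the cluster. Document partitioning touches every node on every query, so it is balanced but does more total work. Whether the real system behaves this way is exactly what I'm asking.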

Awesome, awesome work, I can't wait to try this out.

