riaksearch performance, row limit, sorting not necessary
dan.rathbone at gmail.com
Thu Apr 14 13:18:21 EDT 2011
To be clear, I'm only talking about the Solr interface. I'm wondering if my
query time will remain fixed (since it's capped at rows=1000) as I add
several million docs to the index.
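For concreteness, this is the shape of query I mean; the host, index name,
and query string below are placeholders for my dev setup, not anything
specific:

```shell
# Build a Solr-interface query URL with the result set capped at 1000 rows.
# HOST, INDEX, and QUERY are placeholders -- substitute your own.
HOST="localhost:8098"
INDEX="my_index"
QUERY="field:value"
URL="http://${HOST}/solr/${INDEX}/select?q=${QUERY}&wt=json&start=0&rows=1000"
echo "$URL"
# Against a running node you would then fetch it:
# curl -s "$URL"
```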
If I use my search as an input into Map/Reduce, won't my response time grow
with my index? My search query would queue up a very large result set, and
I expect performance to suffer if I only trim it down in a reduce phase.
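Something like the following is what I have in mind for the search-fed job;
the bucket, query, and the built-in identity map phase are placeholders, not
my actual job:

```shell
# Sketch of a search-fed Map/Reduce job submitted over HTTP. The bucket
# ("mybucket") and query ("field:value") are placeholders. The search
# results stream into the map phase rather than being queued up front.
PAYLOAD='{
  "inputs": {
    "module": "riak_search",
    "function": "mapred_search",
    "arg": ["mybucket", "field:value"]
  },
  "query": [
    {"map": {"language": "javascript",
             "name": "Riak.mapValuesJson",
             "keep": true}}
  ]
}'
echo "$PAYLOAD"
# Against a running node:
# curl -s -H "Content-Type: application/json" -d "$PAYLOAD" http://localhost:8098/mapred
```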
It would seem that I can prevent that slowdown by limiting the rows in the
search (with rows=1000). Despite that limit, though, I hit the
too_many_results error, which indicates that the search queues up a very
large result set before it applies the row limit. Is there something I'm
missing? Basically, I'm wondering if my query time will remain fixed as the
index grows.
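For reference, the setting Gordon mentions below lives in the riak_search
section of app.config; a sketch, where 100000 is just an illustrative value,
not a recommendation:

```erlang
%% app.config (fragment) -- max_search_results only applies to the Solr
%% interface; the value here is an assumption for illustration.
{riak_search, [
    {enabled, true},
    {max_search_results, 100000}
]}
```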
On Thu, Apr 14, 2011 at 7:53 AM, Gordon Tillman <gtillman at mezeo.com> wrote:
> Daniel, the max_search_results setting only applies to searches done via the
> Solr interface. From the release notes:
> - System now aborts queries that would queue up too many documents in
> a result set. This is controlled by a 'max_search_results' setting
> in riak_search. Note that this only affects the Solr
> interface. Searches through the Riak Client API that feed into a
> Map/Reduce job are still allowed to execute because the system
> streams those results.
> So you can use a map-reduce operation (with the search phase providing the
> inputs) and you should be OK.
> On Apr 14, 2011, at 04:49 , Daniel Rathbone wrote:
> Hi list,
> I'm wondering how riaksearch performance will degrade as I add documents.
> For my purposes I limit rows to 1k, and sorting is not necessary. I have a
> single-node cluster for development. I know I can increase performance if I
> add nodes, but I'd like to understand this before I do.
> My documents are small (~200 bytes). With an index of 30k documents and rows
> limited to 1k, there were no problems. Then I added 100k documents and hit
> the too_many_results error. Since I still have my row limit set at 1k, this
> indicates that the query does not return as soon as it finds the first 1k
> hits. Is there a way to short-circuit my queries so that they don't have to
> scan the whole index?
> I got around too_many_results by increasing my max_search_results. I wonder,
> though, if I'll keep bumping into memory boundaries as I add a few million
> docs to my index.