riaksearch performance, row limit, sorting not necessary

Gordon Tillman gtillman at mezeo.com
Thu Apr 14 15:27:31 EDT 2011


Hi Daniel,

If you use search to provide (streaming) input to map/reduce then you can do additional processing in the M/R phases to condition and limit your results.  For example you can do additional filtering in a map phase if necessary, as well as perhaps extracting some subset of the data that is being returned if that is applicable.  You can add one or more reduce phases to sort and paginate (slice) the results.

So you will still only be returning a predetermined number of records from the last reduce phase.
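As a rough sketch of what such a job could look like, the Python below only builds the JSON body for a MapReduce request whose inputs come from a search. The `riak_search` / `mapred_search` input form, the phase keys, and the inline JavaScript are my assumptions about the JSON MapReduce format. The bucket name, the query, and the `size` field are made up for illustration.

```python
import json


def build_search_mapreduce_job(bucket, query, min_size, limit):
    """Build a JSON MapReduce job whose inputs come from a search query."""
    # Map phase: parse each matched object and apply an extra filter.
    # "size" is a hypothetical field of the stored JSON documents.
    map_source = (
        "function(value, keyData, arg) {"
        "  var doc = JSON.parse(value.values[0].data);"
        "  return doc.size >= arg.min_size ? [doc] : [];"
        "}"
    )
    # Reduce phase: sort, then slice so at most `limit` records survive.
    # Sort-then-slice stays correct if the reduce runs on partial results.
    reduce_source = (
        "function(values, arg) {"
        "  values.sort(function(a, b) { return a.size - b.size; });"
        "  return values.slice(0, arg.limit);"
        "}"
    )
    return {
        # Assumed input form for feeding search results into map/reduce.
        "inputs": {
            "module": "riak_search",
            "function": "mapred_search",
            "arg": [bucket, query],
        },
        "query": [
            {"map": {"language": "javascript", "source": map_source,
                     "arg": {"min_size": min_size}}},
            {"reduce": {"language": "javascript", "source": reduce_source,
                        "arg": {"limit": limit}, "keep": True}},
        ],
    }


job = build_search_mapreduce_job("docs", "title:riak", min_size=100, limit=1000)
payload = json.dumps(job)  # body for a POST to the node's MapReduce endpoint
```

The last reduce phase is the only one marked `keep`, so whatever the search matched, the response carries at most `limit` records.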

--gordon


On Apr 14, 2011, at 12:18 , Daniel Rathbone wrote:

To be clear, I'm only talking about the Solr interface.  I'm wondering if my query time will remain fixed (since it's capped at rows=1000) as I add several million docs to the index.

If I use my search as an input into Map/Reduce, won't my response time grow with my index? My search query would queue up a very large result set - and I expect performance to suffer if I trim this down in a reduce phase.

It would seem that I can prevent that slowdown by limiting the rows in the search (with rows=1000).  Despite that limit, though, I hit the too_many_results error which indicates that the search queues up a very large result set before it applies the row limit.  Is there something I'm missing here?

thanks,
Daniel



On Thu, Apr 14, 2011 at 7:53 AM, Gordon Tillman <gtillman at mezeo.com> wrote:
Daniel, the max_search_results setting only applies to searches done via the Solr interface.  From http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-January/002974.html:

- System now aborts queries that would queue up too many documents in
  a result set. This is controlled by a 'max_search_results' setting
  in riak_search. Note that this only affects the Solr
  interface. Searches through the Riak Client API that feed into a
  Map/Reduce job are still allowed to execute because the system
  streams those results.


So you can use a map-reduce operation (with the search phase providing the inputs) and you should be OK.
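To make the distinction in that release note concrete, here is a toy Python model. It is not Riak Search code and makes no claim about its internals. It contrasts a query that queues up the whole result set and aborts past a cap with one that streams hits to a consumer holding only a bounded number of them. All names are made up for illustration, and the sketch is about memory use only, not query time.

```python
class TooManyResults(Exception):
    """Stand-in for the too_many_results error (hypothetical name)."""


def matching_docs(index, term):
    """Lazily yield documents that contain the term."""
    for doc in index:
        if term in doc:
            yield doc


def capped_query(index, term, rows, max_search_results):
    """Queue up every hit, abort if over the cap, then apply the row limit."""
    hits = list(matching_docs(index, term))
    if len(hits) > max_search_results:
        raise TooManyResults(len(hits))
    return hits[:rows]


def streamed_query(index, term, rows):
    """Stream hits to a consumer that never holds more than `rows` of them."""
    kept = []
    seen = 0
    for doc in matching_docs(index, term):
        seen += 1
        if len(kept) < rows:
            kept.append(doc)
    return kept, seen


# 130k tiny documents, all matching the term.
index = ["riak doc %d" % i for i in range(130000)]

try:
    capped_query(index, "riak", rows=1000, max_search_results=100000)
    aborted = False
except TooManyResults:
    aborted = True

kept, seen = streamed_query(index, "riak", rows=1000)
```

In this model the capped query aborts even though only 1000 rows were requested, because the cap is checked against the full result set before the row limit is applied. The streamed version still visits every hit but keeps at most 1000 in memory.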

--gordon

On Apr 14, 2011, at 04:49 , Daniel Rathbone wrote:

Hi list,

I'm wondering how riaksearch performance will degrade as I add documents.

For my purpose I limit rows at 1k and sorting is not necessary.  I have a single node cluster for development.  I know I can increase performance if I add nodes but I'd like to understand this before I do.

My documents are small (~200 bytes).  With an index of 30k documents and rows limited to 1k, I had no problems.  I added 100k documents, and then I hit the too_many_results error.  Since I still have my row limit set at 1k, this indicates that the query does not return as soon as it finds the first 1k hits.  Is there a way to short-circuit my queries so that they don't have to scan the whole index?

I got around too_many_results by increasing my max_search_results (I read https://help.basho.com/entries/480664-i-get-the-error-too-many-results).  I wonder, though, if I'll keep bumping memory boundaries as I add a few million docs to my index.

Thanks,
Daniel


