riak search and solr/lucene
joseph.g.lambert at gmail.com
Thu Nov 4 20:02:58 EDT 2010
Disregard that last message. What I meant was: in a Solr query, all the
results are returned, then sorted, and then the chunk requested by the
start and count parameters is taken. Why not instead make the results of
the search() function the input of a MapReduce job and, if the user passes
sort, start, or count parameters, add two reduce phases, one to sort and
one to slice? Would that not improve Solr searches?
Or do I not understand correctly?
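
To make the idea concrete, here is a rough sketch in plain Python. It is
not Riak's MapReduce API or the Erlang code; the helper names and the
sample hits are made up for illustration. It only shows a sort phase
followed by a slice phase applied to whatever search() returns:

```python
from typing import Any, Callable, Dict, List

Result = Dict[str, Any]
Phase = Callable[[List[Result]], List[Result]]


def make_sort_phase(field: str, descending: bool = False) -> Phase:
    """Build a reduce-style phase that sorts results by `field`."""
    def phase(results: List[Result]) -> List[Result]:
        return sorted(results, key=lambda r: r[field], reverse=descending)
    return phase


def make_slice_phase(start: int, count: int) -> Phase:
    """Build a reduce-style phase that keeps `count` results from `start`."""
    def phase(results: List[Result]) -> List[Result]:
        return results[start:start + count]
    return phase


def run_phases(results: List[Result], phases: List[Phase]) -> List[Result]:
    """Feed the output of each phase into the next one, in order."""
    for phase in phases:
        results = phase(results)
    return results


# Hypothetical stand-in for the output of search(); not real Riak data.
hits = [
    {"id": "doc3", "score": 0.4},
    {"id": "doc1", "score": 0.9},
    {"id": "doc4", "score": 0.1},
    {"id": "doc2", "score": 0.7},
]

# Sort by score (highest first), then take 2 results starting at offset 1.
page = run_phases(hits, [
    make_sort_phase("score", descending=True),
    make_slice_phase(start=1, count=2),
])
page_ids = [r["id"] for r in page]
```

The slice has to run after the sort, since slicing first would page over
an arbitrary ordering.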
- Joe Lambert
joseph.g.lambert at gmail.com
On Fri, Nov 5, 2010 at 7:34 AM, Joseph Lambert
<joseph.g.lambert at gmail.com> wrote:
> Sorry, I meant Lucene search. Solr can be passed start and count; Lucene
> search can't, but the two share functions in the Erlang code.
> - Joe Lambert
> joseph.g.lambert at gmail.com
> +86 13656213284
> On Fri, Nov 5, 2010 at 2:15 AM, Rusty Klophaus <rusty at basho.com> wrote:
>> Hi Joseph,
>> Answers inline below.
>> On Thu, Nov 4, 2010 at 12:49 AM, Joseph Lambert <
>> joseph.g.lambert at gmail.com> wrote:
>>> I am using the PHP library for a project and was looking through the code
>>> to see what differentiates the Solr HTTP interface query from the Lucene
>>> search (besides the syntax, the interface, etc.), as paging is very useful
>>> for my code. With the PHP library I can run a Lucene search, then a
>>> reduce phase to sort, then another reduce phase to slice the results. With
>>> Solr, we can just do a cURL request with the parameters to do the same
>>> thing. I scanned the Erlang code, and in the end both call stream_search(),
>>> but the Lucene query passes the results back to luke for possibly another
>>> MR phase, while the Solr query simply sorts and truncates the list. So:
>>> 1. Does anyone have a general idea of the point at which the Solr query
>>> starts to get really slow, in terms of the number of keys in a bucket and
>>> other factors? I know this depends on many things; I am just looking for a
>>> rough idea of when it's a bad idea to use the Solr interface.
>> The Solr interface works by running the query to find your list of keys
>> (limited based on the "start" and "rows" parameters) and then looking up the
>> keys in Riak KV. So if you execute a Solr request with "rows=100", your
>> request will take a certain amount of time to execute the query, plus
>> however long it takes to retrieve 100 objects in your cluster.
>>> 2. Also, I see that Riak will cache the map phase of a map reduce, so
>>> will it cache the initial search? Or does it use some other mechanism I'm
>>> not seeing to cache search results?
>> The system does not cache Search results, though the operating system's
>> disk caching does make repeated searches execute more quickly.
>>> 3. Finally, for the Solr query, why not automatically add a sort and/or
>>> slice phase if the user passes in sort, start or count parameters in the
>>> Solr query?
>> Not sure I understand the question here. Can you clarify/elaborate? The
>> system does support sort and slice parameters. (
>>> Please correct me if any of the assumptions I made are wrong, as usually
>>> when I ask these questions I end up with my foot in my mouth.
>>> - Joe Lambert
>>> joseph.g.lambert at gmail.com
>>> +86 13656213284
>>> riak-users mailing list
>>> riak-users at lists.basho.com