Score for each search

Archana Bhattarai ABhattarai at sharecare.com
Fri Aug 5 11:54:35 EDT 2011


Hi Rusty,

Thanks for the answer.

We have indexed the following JSON object:


{
    "@class": "com.starsite.data.Answer",
    "answer_text": "momo is the best nepalese food",
    "keywords": null,
    "metaDescription": null,
    "post_date": null,
    "id": "202ba4ac-0fd3-4709-ba84-463e0caa413c",
    "version": 1,
    "scope": [
        "type|com.starsite.data.Answer"
    ]
}

We issued the following query:

answer_text: "food"

and the data we got in keydata was as follows:

[{"p":[4,0],"score":[4.855199135883779,1.8398742574541822]}]


What does 0-indexing mean? If scoring in Riak Search is based on a vector-space model as in Lucene, I was expecting the scores to be normalized between 0 and 1.

As for the position information, I assume the words 'is' and 'the' are removed as stopwords. If they are not removed, the position should have been 5; if they are removed, it should have been 3. The word "food" occurs only once. Shouldn't we be getting just one position?
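The two expected positions above can be checked with a small Python sketch. It assumes a plain whitespace tokenizer and a stopword list containing only 'is' and 'the'; neither is taken from Riak Search's actual analyzer, and the helper name `term_positions` is made up for illustration.

```python
def term_positions(text, term, stopwords=frozenset()):
    """Return the 0-indexed positions of `term` after dropping `stopwords`.

    Assumption: tokens are split on whitespace and lowercased. This is an
    illustration only, not the analyzer Riak Search uses.
    """
    tokens = [t for t in text.lower().split() if t not in stopwords]
    return [i for i, t in enumerate(tokens) if t == term]


text = "momo is the best nepalese food"

# Stopwords kept: momo=0, is=1, the=2, best=3, nepalese=4, food=5
print(term_positions(text, "food"))                 # [5]

# 'is' and 'the' dropped: momo=0, best=1, nepalese=2, food=3
print(term_positions(text, "food", {"is", "the"}))  # [3]
```

Under either assumption the term yields a single position, which is why a two-element list is surprising here.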

Thanks,
Archana



On Aug 5, 2011, at 11:08 AM, Rusty Klophaus wrote:

Hi Archana,

Yes, the 'p' attribute is positional information. That list indicates that the term occurs at the 0th and 43rd positions in the document; positions are 0-indexed. I'm not sure why you are getting two positions if the word only occurred once. What was the original query?

The scoring information that you see is a bug. For now, as a workaround, you can add the scores together. This will give you a *relative* score, allowing you to rank results for the current query.
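A minimal client-side sketch of that workaround in Python, assuming keydata arrives as a JSON string shaped like the examples in this thread. The function names `relative_score` and `rank_by_relative_score` are hypothetical and not part of Riak.

```python
import json


def relative_score(keydata_json: str) -> float:
    """Add together every value in each entry's "score" list."""
    entries = json.loads(keydata_json)
    return sum(sum(entry.get("score", [])) for entry in entries)


def rank_by_relative_score(results: dict) -> list:
    """Order result keys from highest to lowest summed score.

    `results` maps an object key to its raw keydata JSON string.
    The sums are only comparable within a single query.
    """
    return sorted(results, key=lambda k: relative_score(results[k]), reverse=True)


# keydata reported in this thread for the query answer_text: "food"
keydata = '[{"p":[4,0],"score":[4.855199135883779,1.8398742574541822]}]'
print(relative_score(keydata))
```

The summed value is useful only for ordering results of the same query; it is not a normalized score.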

To fix this issue, some processing needs to happen within Riak to combine and normalize the scores into a final score that can also be used for correct ranking against other queries. (This is being done for the Solr interface, but not the Map/Reduce interface.) Riak Search models scoring after Lucene as much as possible, so you can read this for more information about scoring, especially the final normalization step: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
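For reference, the Lucene Similarity documentation linked above summarizes its practical scoring function roughly as below. This is Lucene's formula as documented for that release; how closely Riak Search follows it term by term is not established in this thread.

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot \mathrm{queryNorm}(q)\cdot
  \sum_{t \in q} \Big( \mathrm{tf}(t \in d)\cdot \mathrm{idf}(t)^{2}\cdot
  \mathrm{boost}(t)\cdot \mathrm{norm}(t,d) \Big),
\qquad
\mathrm{queryNorm}(q) = \frac{1}{\sqrt{\mathrm{sumOfSquaredWeights}}}
```

The queryNorm factor is meant to make scores comparable across queries and does not change the ranking within one query. Nothing in the formula bounds the result to the range 0 to 1.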

This issue is tracked in https://issues.basho.com/show_bug.cgi?id=1154

Best,
Rusty


On Thu, Aug 4, 2011 at 3:27 PM, Archana Bhattarai <ABhattarai at sharecare.com> wrote:
Hi Rusty,

Thanks a lot for the answer. We were able to get some data in the keydata, as follows:


[{"p":[43,0],"score":[5.3669048584479,1.7201627119528418]}]

But we couldn't work out exactly what it represents. I believe p gives positional information, but why does it have two values when the word we searched for occurs only once in the document? Does the position ignore stopwords and count only the other words? Also, why are there two scores? Isn't the score normalized? Or am I doing something wrong to get these scores?


Thanks a lot in advance,
Archana


On Jul 22, 2011, at 11:09 AM, Rusty Klophaus wrote:

Hi Archana,

Yes. When you use a search query to initiate a map/reduce job, the scores are fed into the first phase as keydata, along with other metadata about the search result including positional information and any inline fields.

More information in the links below:

 *   http://wiki.basho.com/Riak-Search---Querying.html#Querying-Integrated-with-Map-Reduce
 *   http://wiki.basho.com/MapReduce.html (search for "keydata")

Best,
Rusty

On Fri, Jul 22, 2011 at 10:53 AM, Archana Bhattarai <ABhattarai at sharecare.com> wrote:
Hi,

Is there a way to get the score back when querying via the Solr interface or, ideally, via MapReduce over search? It looks like the Solr interface only supports sorting.


Thanks in advance,
Archana
_______________________________________________
riak-users mailing list
riak-users at lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



--
Rusty Klophaus

Basho Technologies, Inc.
11921 Freedom Drive, Suite 550
Reston, VA 20190
www.basho.com







