Yokozuna inconsistent search results

Oleksiy Krivoshey oleksiyk at gmail.com
Mon Mar 14 05:45:20 EDT 2016


I would like to continue, as this seems to me like a serious problem: on a
bucket with 700,000 keys the difference in num_found can be up to 200,000!
And that's a search index that doesn't index, analyse or store ANY of the
document fields; the schema has only the required _yz_* fields and nothing else.

I have tried deleting the search index (with a PBC call) and tried expiring
the AAE trees. Nothing helps; I can't get consistent search results from
Yokozuna.
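
For the record, the inconsistency is easy to quantify by repeating the same
count query and tracking the spread of num_found. A minimal Python sketch
(query_fn is a stand-in for whatever client call issues the search; the
numbers below are two counts actually observed from this index):

```python
def numfound_spread(query_fn, attempts=10):
    """Run the same count query repeatedly and report the min, max and
    spread of num_found, to quantify how inconsistent results are."""
    counts = [query_fn() for _ in range(attempts)]
    return min(counts), max(counts), max(counts) - min(counts)

# Stand-in for a real search call, replaying two observed counts:
observed = iter([30118, 37134, 30118, 37134])
print(numfound_spread(lambda: next(observed), attempts=4))
# -> (30118, 37134, 7016)
```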

Please help.
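
For context, the repair route Fred suggested below boils down to a set
difference: with the YZ AAE trees emptied, every KV hash pairs against
nothing, so every key counts as missing from Solr and gets re-indexed. A toy
Python model of that exchange (hypothetical names; the real exchange compares
hashtree segments between nodes, not individual keys like this):

```python
def aae_exchange(kv_hashes, yz_hashes):
    """Toy model of a KV/YZ AAE exchange: return the keys whose hash is
    missing or stale on the Yokozuna side, i.e. the repair set that
    would be re-indexed into Solr."""
    return {k for k, h in kv_hashes.items() if yz_hashes.get(k) != h}

kv = {'key1': 0xA1, 'key2': 0xB2, 'key3': 0xC3}

# After deleting the YZ AAE data, the YZ side is empty, so the whole
# keyspace is repaired (i.e. fully re-indexed):
assert aae_exchange(kv, {}) == {'key1', 'key2', 'key3'}

# In the steady state, only stale or missing entries are repaired:
assert aae_exchange(kv, {'key1': 0xA1, 'key2': 0xFF}) == {'key2', 'key3'}
```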

On 11 March 2016 at 18:18, Oleksiy Krivoshey <oleksiyk at gmail.com> wrote:

> Hi Fred,
>
> This is production environment but I can delete the index. However this
> index covers ~3500 buckets and there are probably 10,000,000 keys.
>
> The index was created after the buckets. The schema for the index is just
> the basic required fields (_yz_*) and nothing else.
>
> Yes, I'm willing to resolve this. When you say to delete chunks_index, do
> you mean the simple RpbYokozunaIndexDeleteReq or something else is required?
>
> Thanks!
>
>
>
>
> On 11 March 2016 at 17:08, Fred Dushin <fdushin at basho.com> wrote:
>
>> Hi Oleksiy,
>>
>> This is definitely pointing to an issue either in the coverage plan
>> (which determines the distributed query you are seeing) or in the data you
>> have in Solr.  I am wondering if it is possible that you have some data in
>> Solr that is causing the rebuild of the YZ AAE tree to incorrectly
>> represent what is actually stored in Solr.
>>
>> What you did was to manually expire the YZ (Riak Search) AAE trees, which
>> caused them to rebuild from the entropy data stored in Solr.  Another thing
>> we could try (if you are willing) would be to delete the 'chunks_index'
>> data in Solr (as well as the Yokozuna AAE data), and then let AAE repair
>> the missing data.  What Riak will essentially do is compare the KV hash
>> trees with the YZ hash trees (which will be empty), determine what is
>> missing in Solr, and add it to Solr as a result.  This would effectively
>> re-index all of your data, but we are only talking about ~30k entries
>> (times 3, presumably, if your n_val is 3), so that shouldn't take too much
>> time, I wouldn't think.  There is even some configuration you can use to
>> accelerate this process, if necessary.
>>
>> Is that something you would be willing to try?  It would result in down
>> time on query.  Is this production data or a test environment?
>>
>> -Fred
>>
>> On Mar 11, 2016, at 7:38 AM, Oleksiy Krivoshey <oleksiyk at gmail.com>
>> wrote:
>>
>> Here are two consecutive requests; one returns 30118 keys, the other 37134
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <response>
>>   <lst name="responseHeader">
>>     <int name="status">0</int>
>>     <int name="QTime">6</int>
>>     <lst name="params">
>>       <str name="10.0.1.3:8093">_yz_pn:92 OR _yz_pn:83 OR _yz_pn:71 OR
>> _yz_pn:59 OR _yz_pn:50 OR _yz_pn:38 OR _yz_pn:17 OR _yz_pn:5</str>
>>       <str name="10.0.1.2:8093">_yz_pn:122 OR _yz_pn:110 OR _yz_pn:98 OR
>> _yz_pn:86 OR _yz_pn:74 OR _yz_pn:62 OR _yz_pn:26 OR _yz_pn:14 OR
>> _yz_pn:2</str>
>>       <str name="shards">
>> 10.0.1.1:8093/internal_solr/chunks_index,10.0.1.2:8093/internal_solr/chunks_index,10.0.1.3:8093/internal_solr/chunks_index,10.0.1.4:8093/internal_solr/chunks_index,10.0.1.5:8093/internal_solr/chunks_index
>> </str>
>>       <str name="q">_yz_rb:0dmid2ilpyrfiuaqtvnc482f1esdchb5.chunks</str>
>>       <str name="10.0.1.5:8093">(_yz_pn:124 AND (_yz_fpn:124 OR
>> _yz_fpn:123)) OR _yz_pn:116 OR _yz_pn:104 OR _yz_pn:80 OR _yz_pn:68 OR
>> _yz_pn:56 OR _yz_pn:44 OR _yz_pn:32 OR _yz_pn:20 OR _yz_pn:8</str>
>>       <str name="10.0.1.1:8093">_yz_pn:113 OR _yz_pn:101 OR _yz_pn:89 OR
>> _yz_pn:77 OR _yz_pn:65 OR _yz_pn:53 OR _yz_pn:41 OR _yz_pn:29</str>
>>       <str name="10.0.1.4:8093">_yz_pn:127 OR _yz_pn:119 OR _yz_pn:107
>> OR _yz_pn:95 OR _yz_pn:47 OR _yz_pn:35 OR _yz_pn:23 OR _yz_pn:11</str>
>>       <str name="rows">0</str>
>>     </lst>
>>   </lst>
>>   <result maxScore="6.364349" name="response" numFound="30118"
>> start="0"></result>
>> </response>
>>
>> ------
>>
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <response>
>>   <lst name="responseHeader">
>>     <int name="status">0</int>
>>     <int name="QTime">10</int>
>>     <lst name="params">
>>       <str name="10.0.1.3:8093">_yz_pn:100 OR _yz_pn:88 OR _yz_pn:79 OR
>> _yz_pn:67 OR _yz_pn:46 OR _yz_pn:34 OR _yz_pn:25 OR _yz_pn:13 OR
>> _yz_pn:1</str>
>>       <str name="10.0.1.2:8093">(_yz_pn:126 AND (_yz_fpn:126 OR
>> _yz_fpn:125)) OR _yz_pn:118 OR _yz_pn:106 OR _yz_pn:94 OR _yz_pn:82 OR
>> _yz_pn:70 OR _yz_pn:58 OR _yz_pn:22 OR _yz_pn:10</str>
>>       <str name="shards">
>> 10.0.1.1:8093/internal_solr/chunks_index,10.0.1.2:8093/internal_solr/chunks_index,10.0.1.3:8093/internal_solr/chunks_index,10.0.1.4:8093/internal_solr/chunks_index,10.0.1.5:8093/internal_solr/chunks_index
>> </str>
>>       <str name="q">_yz_rb:0dmid2ilpyrfiuaqtvnc482f1esdchb5.chunks</str>
>>       <str name="10.0.1.5:8093">_yz_pn:124 OR _yz_pn:112 OR _yz_pn:76 OR
>> _yz_pn:64 OR _yz_pn:52 OR _yz_pn:40 OR _yz_pn:28 OR _yz_pn:16 OR
>> _yz_pn:4</str>
>>       <str name="10.0.1.1:8093">_yz_pn:121 OR _yz_pn:109 OR _yz_pn:97 OR
>> _yz_pn:85 OR _yz_pn:73 OR _yz_pn:61 OR _yz_pn:49 OR _yz_pn:37</str>
>>       <str name="10.0.1.4:8093">_yz_pn:115 OR _yz_pn:103 OR _yz_pn:91 OR
>> _yz_pn:55 OR _yz_pn:43 OR _yz_pn:31 OR _yz_pn:19 OR _yz_pn:7</str>
>>       <str name="rows">0</str>
>>     </lst>
>>   </lst>
>>   <result maxScore="6.364349" name="response" numFound="37134"
>> start="0"></result>
>> </response>
>>
>> On 11 March 2016 at 12:05, Oleksiy Krivoshey <oleksiyk at gmail.com> wrote:
>>
>>> So even after I fixed the 3 documents which caused AAE errors,
>>> restarted AAE with riak_core_util:rpc_every_member_ann(yz_entropy_mgr,
>>> expire_trees, [], 5000),
>>> and waited 5 days (all AAE trees have now rebuilt in the last 5 days with
>>> no AAE or Solr errors), I still get inconsistent num_found.
>>>
>>> For a bucket with 30,000 keys each new search request can result in
>>> difference in num_found for over 5,000.
>>>
>>> What else can I do to get a consistent index, or at least something
>>> better than a 15% difference?
>>>
>>> I even tried walking through all the bucket keys and modifying them in
>>> the hope that all Yokozuna instances in the cluster would pick them up,
>>> but no luck.
>>>
>>> Thanks!
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>
>