Yokozuna inconsistent search results

Oleksiy Krivoshey oleksiyk at gmail.com
Thu Apr 7 11:08:48 EDT 2016


After 2 more days I can still see AAE trees that haven't been rebuilt
for 30 days. I can also see that some trees haven't had any exchanges for the
same period.
I still get inconsistent search results from Yokozuna. To summarise this
long discussion:

- I have fixed all Solr indexing issues, though none of them were related to
the search index in question
- there have been no Solr indexing issues for 30 days
- the search schema for this index doesn't have any fields besides the required _yz_*
- I have dropped and re-created this search index twice
- I have tried expiring the AAE trees 3 times
- some (quite a lot) of the AAE trees have no exchanges and are not rebuilt

The latest output of `riak-admin search aae-status` from all nodes is attached.


On 5 April 2016 at 22:54, Oleksiy Krivoshey <oleksiyk at gmail.com> wrote:

> Hi Fred,
>
> Thanks for internal call tips, I'll dig deeper!
>
> I've attached recent results of `riak-admin search aae-status` from all
> nodes.
>
>
> On 5 April 2016 at 22:41, Fred Dushin <fdushin at basho.com> wrote:
>
>> Hi Oleksiy,
>>
>> I assume you are getting this information through riak-admin.  Can you
>> post the results here?
>>
>> If you want to dig deeper, you can probe the individual hash trees for
>> their build time.  I will paste a few snippets of erlang here, which I am
>> hoping you can extend to use with list comprehensions and rpc:multicalls.
>> If that's too much to ask, let us know and I can try to put something
>> together that is more "big easy button".
>>
>> First, on any individual node, you can get the Riak partitions on that
>> node, via
>>
>> (dev1 at 127.0.0.1)1> Partitions = [P || {_, P, _} <-
>> riak_core_vnode_manager:all_vnodes(riak_kv_vnode)].
>> [913438523331814323877303020447676887284957839360,
>>  182687704666362864775460604089535377456991567872,
>>  1187470080331358621040493926581979953470445191168,
>>  730750818665451459101842416358141509827966271488,
>>  1370157784997721485815954530671515330927436759040,
>>  1004782375664995756265033322492444576013453623296,
>>  822094670998632891489572718402909198556462055424,
>>  456719261665907161938651510223838443642478919680,
>>  274031556999544297163190906134303066185487351808,
>>  1096126227998177188652763624537212264741949407232,
>>  365375409332725729550921208179070754913983135744,
>>  91343852333181432387730302044767688728495783936,
>>  639406966332270026714112114313373821099470487552,0,
>>  1278813932664540053428224228626747642198940975104,
>>  548063113999088594326381812268606132370974703616]
>>
>> For any one partition, you can get to the Pid associated with the
>> yz_index_hashtree associated with that partition, e.g.,
>>
>> (dev1 at 127.0.0.1)2> {ok, Pid} =
>> yz_entropy_mgr:get_tree(913438523331814323877303020447676887284957839360).
>> {ok,<0.2872.0>}
>>
>> and from there you can get the state information about the hashtree,
>> which includes its build time.  You can read the record definitions
>> associated with the yz_index_hashtree state by calling rr() on the
>> yz_index_hashtree module first, if you want to make the state slightly more
>> readable:
>>
>> (dev1 at 127.0.0.1)3> rr(yz_index_hashtree).
>> [entropy_data,state,xmerl_event,xmerl_fun_states,
>>  xmerl_scanner,xmlAttribute,xmlComment,xmlContext,xmlDecl,
>>  xmlDocument,xmlElement,xmlNamespace,xmlNode,xmlNsNode,
>>  xmlObj,xmlPI,xmlText]
>> (dev1 at 127.0.0.1)5> sys:get_state(Pid).
>> #state{index = 913438523331814323877303020447676887284957839360,
>>        built = true,expired = false,lock = undefined,
>>        path = "./data/yz_anti_entropy/913438523331814323877303020447676887284957839360",
>>        build_time = {1459,801655,506719},
>>        trees = [{{867766597165223607683437869425293042920709947392,3},
>>                  {state,<<152,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...>>,
>>                         913438523331814323877303020447676887284957839360,3,1048576,
>>                         1024,0,
>>                         {dict,0,16,16,8,80,48,{[],[],...},{{...}}},
>>                         <<>>,
>>                         "./data/yz_anti_entropy/913438523331814323877303020447676887284957839360",
>>                         <<>>,incremental,[],0,
>>                         {array,38837,0,...}}},
>>                 {{890602560248518965780370444936484965102833893376,3},
>>                  {state,<<156,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...>>,
>>                         913438523331814323877303020447676887284957839360,3,1048576,
>>                         1024,0,
>>                         {dict,0,16,16,8,80,48,{[],...},{...}},
>>                         <<>>,
>>                         "./data/yz_anti_entropy/913438523331814323877303020447676887284957839360",
>>                         <<>>,incremental,[],0,
>>                         {array,38837,...}}},
>>                 {{913438523331814323877303020447676887284957839360,3},
>>                  {state,<<160,0,0,0,0,0,0,0,0,0,0,0,0,0,...>>,
>>                         913438523331814323877303020447676887284957839360,3,1048576,
>>                         1024,0,
>>                         {dict,0,16,16,8,80,48,{...},...},
>>                         <<>>,
>>                         "./data/yz_anti_entropy/913438523331814323877303020447676887284957839360",
>>                         <<>>,incremental,[],0,
>>                         {array,...}}}],
>>        closed = false}
>>
>> You can convert the timestamp to local time via:
>>
>> (dev1 at 127.0.0.1)8> calendar:now_to_local_time({1459,801655,506719}).
>> {{2016,4,4},{16,27,35}}
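For anyone post-processing collected build_time values outside the Erlang shell, the same conversion can be sketched in Python. This is only a minimal sketch, assuming the value is an Erlang now()-style tuple of {MegaSecs, Secs, MicroSecs}; the helper names `now_to_datetime` and `age_in_days` are made up for illustration and are not part of Riak.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def now_to_datetime(now_tuple):
    """Convert an Erlang {MegaSecs, Secs, MicroSecs} tuple to a UTC datetime."""
    mega, secs, micro = now_tuple
    return EPOCH + timedelta(seconds=mega * 1_000_000 + secs, microseconds=micro)


def age_in_days(now_tuple, reference):
    """Days between the tuple's timestamp and `reference` (an aware datetime)."""
    return (reference - now_to_datetime(now_tuple)).total_seconds() / 86400.0


# build_time taken from the sys:get_state/1 output quoted above.
built = now_to_datetime((1459, 801655, 506719))
print(built.isoformat())
print(age_in_days((1459, 801655, 506719), datetime.now(timezone.utc)))
```

The result is in UTC (20:27:35 on 2016-04-04 for the tuple above), which lines up with the 16:27:35 local time shown by calendar:now_to_local_time/1 on a machine four hours behind UTC.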
>>
>>
>> Again, this is just an example, but with the right erlang incantations,
>> you should be able to iterate over all the timestamps across all the nodes.
>>
>> Let us know if that is helpful, or if you need more examples so you can
>> do it in one swipe.
>>
>> -Fred
>>
>> On Apr 5, 2016, at 9:29 AM, Oleksiy Krivoshey <oleksiyk at gmail.com> wrote:
>>
>> How can I check that AAE trees have expired? Yesterday I ran
>> `riak_core_util:rpc_every_member_ann(yz_entropy_mgr, expire_trees, [],
>> 5000).` on each node (just to be sure). Still, today I see that on 3 nodes
>> (of 5) all entropy trees and all last AAE exchanges are older than 20 days.
>>
>> On 4 April 2016 at 17:15, Oleksiy Krivoshey <oleksiyk at gmail.com> wrote:
>>
>>> Continuation...
>>>
>>> The new index has the same inconsistent search results problem.
>>> I was taking a snapshot of the `search aae-status` output almost every day.
>>> There were absolutely no Yokozuna errors in logs.
>>>
>>> I can see that some AAE trees were not expired (built > 20 days ago). I
>>> can also see that on two nodes (of 5) last AAE exchanges happened > 20 days
>>> ago.
>>>
>>> For now I have issued
>>> ` riak_core_util:rpc_every_member_ann(yz_entropy_mgr, expire_trees, [],
>>> 5000).` on each node again. I will wait 10 days more but I don't think that
>>> will fix anything.
>>>
>>>
>>> On 25 March 2016 at 09:28, Oleksiy Krivoshey <oleksiyk at gmail.com> wrote:
>>>
>>>> One interesting thing happened when I tried removing the index:
>>>>
>>>> - this index was associated with a bucket type, called fs_chunks
>>>> - so I first called RpbSetBucketTypeReq to set search_index:
>>>> _dont_index_
>>>> - I then tried to remove the index with RpbYokozunaIndexDeleteReq, which
>>>> failed with "index is in use" and a list of all buckets of the fs_chunks type
>>>> - for some reason all these buckets had their own search_index property
>>>> set to that same index
>>>>
>>>> How can this happen if I definitely never set the search_index property
>>>> per bucket?
>>>>
>>>> On 24 March 2016 at 22:41, Oleksiy Krivoshey <oleksiyk at gmail.com>
>>>> wrote:
>>>>
>>>>> OK!
>>>>>
>>>>> On 24 March 2016 at 21:11, Magnus Kessler <mkessler at basho.com> wrote:
>>>>>
>>>>>> Hi Oleksiy,
>>>>>>
>>>>>> On 24 March 2016 at 14:55, Oleksiy Krivoshey <oleksiyk at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Magnus,
>>>>>>>
>>>>>>> Thanks! I guess I will go with index deletion because I've already
>>>>>>> tried expiring the trees before.
>>>>>>>
>>>>>>> Do I need to delete AAE data somehow or removing the index is enough?
>>>>>>>
>>>>>>
>>>>>> If you expire the AAE trees with the commands I posted earlier, there
>>>>>> should be no need to remove the AAE data directories manually.
>>>>>>
>>>>>> I hope this works for you. Please monitor the tree rebuild and
>>>>>> exchanges with `riak-admin search aae-status` for the next few days. In
>>>>>> particular the exchanges should be ongoing on a continuous basis once all
>>>>>> trees have been rebuilt. If they don't, please let me know. At that point
>>>>>> you should also gather `riak-debug` output from all nodes before it gets
>>>>>> rotated out after 5 days by default.
>>>>>>
>>>>>> Kind Regards,
>>>>>>
>>>>>> Magnus
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On 24 March 2016 at 13:28, Magnus Kessler <mkessler at basho.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Oleksiy,
>>>>>>>>
>>>>>>>> As a first step, I suggest simply expiring the Yokozuna AAE trees
>>>>>>>> again if the output of `riak-admin search aae-status` still suggests that
>>>>>>>> no recent exchanges have taken place. To do this, run `riak attach` on one
>>>>>>>> node and then
>>>>>>>>
>>>>>>>> riak_core_util:rpc_every_member_ann(yz_entropy_mgr, expire_trees, [], 5000).
>>>>>>>>
>>>>>>>>
>>>>>>>> Exit from the riak console with `Ctrl+G q`.
>>>>>>>>
>>>>>>>> Depending on your settings and amount of data the full index should
>>>>>>>> be rebuilt within the next 2.5 days (for a cluster with ring size 128 and
>>>>>>>> default settings). You can monitor the progress with `riak-admin search
>>>>>>>> aae-status` and also in the logs, which should have messages along the
>>>>>>>> lines of
>>>>>>>>
>>>>>>>> 2016-03-24 10:28:25.372 [info]
>>>>>>>> <0.4647.6477>@yz_exchange_fsm:key_exchange:179 Repaired 83055 keys during
>>>>>>>> active anti-entropy exchange of partition
>>>>>>>> 1210306043414653979137426502093171875652569137152 for preflist
>>>>>>>> {1164634117248063262943561351070788031288321245184,3}
>>>>>>>>
>>>>>>>>
>>>>>>>> Re-indexing can put additional strain on the cluster and may cause
>>>>>>>> elevated latency on a cluster already under heavy load. Please monitor the
>>>>>>>> response times while the cluster is re-indexing data.
>>>>>>>>
>>>>>>>> If the cluster load allows it, you can force more rapid re-indexing
>>>>>>>> by changing a few parameters. Again at the `riak attach` console, run
>>>>>>>>
>>>>>>>> riak_core_util:rpc_every_member_ann(application, set_env, [yokozuna, anti_entropy_build_limit, {4, 3600000}], 5000).
>>>>>>>> riak_core_util:rpc_every_member_ann(application, set_env, [yokozuna, anti_entropy_concurrency, 5], 5000).
>>>>>>>>
>>>>>>>> This will allow up to 4 trees per node to be built/exchanged per
>>>>>>>> hour, with up to 5 concurrent exchanges throughout the cluster. To return
>>>>>>>> back to the default settings, use
>>>>>>>>
>>>>>>>> riak_core_util:rpc_every_member_ann(application, set_env, [yokozuna, anti_entropy_build_limit, {1, 3600000}], 5000).
>>>>>>>> riak_core_util:rpc_every_member_ann(application, set_env, [yokozuna, anti_entropy_concurrency, 2], 5000).
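To make the build limit easier to reason about, here is a small Python sketch. It assumes the anti_entropy_build_limit tuple is read as {number of builds, window in milliseconds}; the function names `builds_per_hour` and `hours_to_build` are made up for illustration, and the estimate is only a rough lower bound that ignores locks, concurrency limits and cluster load.

```python
MS_PER_HOUR = 3_600_000


def builds_per_hour(build_limit):
    """Translate a (count, window_ms) build limit into tree builds per hour."""
    count, window_ms = build_limit
    if window_ms <= 0:
        raise ValueError("window must be a positive number of milliseconds")
    return count * MS_PER_HOUR / window_ms


def hours_to_build(trees_on_node, build_limit):
    """Rough lower bound on the hours one node needs to rebuild its trees."""
    return trees_on_node / builds_per_hour(build_limit)


# Illustrative values only: one build per hour versus four builds per hour.
for limit in [(1, 3_600_000), (4, 3_600_000)]:
    print(limit, builds_per_hour(limit), "builds/hour")
```

Checking a tuple this way before applying it with set_env makes it obvious when the window is off by a factor of ten or sixty.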
>>>>>>>>
>>>>>>>>
>>>>>>>> If the cluster still doesn't make any progress with automatically
>>>>>>>> re-indexing data, the next steps are pretty much what you already
>>>>>>>> suggested, to drop the existing index and re-index from scratch. I'm
>>>>>>>> assuming that losing the indexes temporarily is acceptable to you at this
>>>>>>>> point.
>>>>>>>>
>>>>>>>> Using any client API that supports RpbYokozunaIndexDeleteReq, you
>>>>>>>> can drop the index from all Solr instances, losing any data stored there
>>>>>>>> immediately. Next, you'll have to re-create the index. I have tried this
>>>>>>>> with the python API, where I deleted the index and re-created it with the
>>>>>>>> same already uploaded schema:
>>>>>>>>
>>>>>>>> from riak import RiakClient
>>>>>>>>
>>>>>>>> c = RiakClient()
>>>>>>>> c.delete_search_index('my_index')
>>>>>>>> c.create_search_index('my_index', 'my_schema')
>>>>>>>>
>>>>>>>> Note that simply deleting the index does not remove its existing
>>>>>>>> association with any bucket or bucket type. Any PUT operations on these
>>>>>>>> buckets will lead to indexing failures being logged until the index has
>>>>>>>> been recreated. However, this also means that no separate operation in
>>>>>>>> `riak-admin` is required to associate the newly recreated index with the
>>>>>>>> buckets again.
>>>>>>>>
>>>>>>>> After recreating the index, expire the trees as explained previously.
>>>>>>>>
>>>>>>>> Let us know if this solves your issue.
>>>>>>>>
>>>>>>>> Kind Regards,
>>>>>>>>
>>>>>>>> Magnus
>>>>>>>>
>>>>>>>>
>>>>>>>> On 24 March 2016 at 08:44, Oleksiy Krivoshey <oleksiyk at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> This is how things are looking after two weeks:
>>>>>>>>>
>>>>>>>>> - there have been no solr indexing issues for a long period (2 weeks)
>>>>>>>>> - there have been no yokozuna errors at all for 2 weeks
>>>>>>>>> - there is an index with an empty schema, just _yz_* fields;
>>>>>>>>> objects stored in the bucket(s) are binary and so are not analysed by yokozuna
>>>>>>>>> - the same yokozuna query, repeated, gives a different number
>>>>>>>>> for num_found; typically the difference between the real number of keys in a
>>>>>>>>> bucket and num_found is about 25%
>>>>>>>>> - the number of keys repaired by AAE (according to logs) is about 1-2
>>>>>>>>> every few hours (the number of keys "missing" from the index is close to 1,000,000)
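One way to put numbers on the num_found drift described in the list above is to repeat the same query and compare the results. The Python sketch below is a hedged illustration, not a tested diagnostic: it assumes a `search` callable that takes an index name and a query and returns a dict containing `num_found` (the Riak Python client's `fulltext_search` is assumed to have this shape; check your client version). The names `num_found_spread`, `missing_fraction` and `fake_search_factory` are made up, and the counts are invented.

```python
def num_found_spread(search, index, query, attempts=10):
    """Run the same query several times and summarise the num_found values."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    counts = [search(index, query)["num_found"] for _ in range(attempts)]
    return {"min": min(counts), "max": max(counts), "distinct": sorted(set(counts))}


def missing_fraction(num_found, real_key_count):
    """Fraction of keys absent from the index for a single query result."""
    if real_key_count <= 0:
        raise ValueError("real_key_count must be positive")
    return 1.0 - num_found / real_key_count


def fake_search_factory(values):
    """Stand-in for a real client call; yields the given made-up counts in order."""
    remaining = iter(values)
    return lambda index, query: {"num_found": next(remaining)}


# Invented counts, used only to show the shape of the result.
search = fake_search_factory([3_000_000, 3_100_000, 3_000_000])
print(num_found_spread(search, "my_index", "*:*", attempts=3))
print(missing_fraction(3_000_000, 4_000_000))
```

Against a live cluster the fake would be replaced by the real client method, for example passing `client.fulltext_search` as `search`, assuming that call returns `num_found` as described.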
>>>>>>>>>
>>>>>>>>> Should I now try to delete the index and yokozuna AAE data and
>>>>>>>>> wait another 2 weeks? If yes - how should I delete the index and AAE data?
>>>>>>>>> Will RpbYokozunaIndexDeleteReq be enough?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> --
>>>>>>>> Magnus Kessler
>>>>>>>> Client Services Engineer
>>>>>>>> Basho Technologies Limited
>>>>>>>>
>>>>>>>> Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg
>>>>>>>> 07970431
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: search-aae.tar.gz
Type: application/x-gzip
Size: 5904 bytes
Desc: not available
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20160407/b4ea0e8b/attachment-0002.gz>