Yokozuna inconsistent search results

Oleksiy Krivoshey oleksiyk at gmail.com
Tue Apr 5 15:54:24 EDT 2016


Hi Fred,

Thanks for the internal call tips, I'll dig deeper!

I've attached the latest output of `riak-admin search aae-status` from all
nodes.


On 5 April 2016 at 22:41, Fred Dushin <fdushin at basho.com> wrote:

> Hi Oleksiy,
>
> I assume you are getting this information through riak-admin.  Can you
> post the results here?
>
> If you want to dig deeper, you can probe the individual hash trees for
> their build time.  I will paste a few snippets of Erlang here, which I am
> hoping you can extend to use with list comprehensions and rpc:multicalls.
> If that's too much to ask, let us know and I can try to put something
> together that is more "big easy button".
>
> First, on any individual node, you can get the Riak partitions on that
> node, via
>
> (dev1@127.0.0.1)1> Partitions = [P || {_, P, _} <- riak_core_vnode_manager:all_vnodes(riak_kv_vnode)].
> [913438523331814323877303020447676887284957839360,
>  182687704666362864775460604089535377456991567872,
>  1187470080331358621040493926581979953470445191168,
>  730750818665451459101842416358141509827966271488,
>  1370157784997721485815954530671515330927436759040,
>  1004782375664995756265033322492444576013453623296,
>  822094670998632891489572718402909198556462055424,
>  456719261665907161938651510223838443642478919680,
>  274031556999544297163190906134303066185487351808,
>  1096126227998177188652763624537212264741949407232,
>  365375409332725729550921208179070754913983135744,
>  91343852333181432387730302044767688728495783936,
>  639406966332270026714112114313373821099470487552,0,
>  1278813932664540053428224228626747642198940975104,
>  548063113999088594326381812268606132370974703616]
>
> For any one partition, you can get to the Pid associated with the
> yz_index_hashtree associated with that partition, e.g.,
>
> (dev1@127.0.0.1)2> {ok, Pid} = yz_entropy_mgr:get_tree(913438523331814323877303020447676887284957839360).
> {ok,<0.2872.0>}
>
> and from there you can get the state information about the hashtree, which
> includes its build time.  You can read the record definitions associated
> with the yz_index_hashtree state by calling rr() on the yz_index_hashtree
> module first, if you want to make the state slightly more readable:
>
> (dev1@127.0.0.1)3> rr(yz_index_hashtree).
> [entropy_data,state,xmerl_event,xmerl_fun_states,
>  xmerl_scanner,xmlAttribute,xmlComment,xmlContext,xmlDecl,
>  xmlDocument,xmlElement,xmlNamespace,xmlNode,xmlNsNode,
>  xmlObj,xmlPI,xmlText]
> (dev1@127.0.0.1)5> sys:get_state(Pid).
> #state{index = 913438523331814323877303020447676887284957839360,
>        built = true,expired = false,lock = undefined,
>        path = "./data/yz_anti_entropy/913438523331814323877303020447676887284957839360",
>        build_time = {1459,801655,506719},
>        trees = [{{867766597165223607683437869425293042920709947392,
>                   3},
>                  {state,<<152,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...>>,
>                         913438523331814323877303020447676887284957839360,3,1048576,
>                         1024,0,
>                         {dict,0,16,16,8,80,48,{[],[],...},{{...}}},
>                         <<>>,
>                         "./data/yz_anti_entropy/913438523331814323877303020447676887284957839360",
>                         <<>>,incremental,[],0,
>                         {array,38837,0,...}}},
>                 {{890602560248518965780370444936484965102833893376,3},
>                  {state,<<156,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...>>,
>                         913438523331814323877303020447676887284957839360,3,1048576,
>                         1024,0,
>                         {dict,0,16,16,8,80,48,{[],...},{...}},
>                         <<>>,
>                         "./data/yz_anti_entropy/913438523331814323877303020447676887284957839360",
>                         <<>>,incremental,[],0,
>                         {array,38837,...}}},
>                 {{913438523331814323877303020447676887284957839360,3},
>                  {state,<<160,0,0,0,0,0,0,0,0,0,0,0,0,0,...>>,
>                         913438523331814323877303020447676887284957839360,3,1048576,
>                         1024,0,
>                         {dict,0,16,16,8,80,48,{...},...},
>                         <<>>,
>                         "./data/yz_anti_entropy/913438523331814323877303020447676887284957839360",
>                         <<>>,incremental,[],0,
>                         {array,...}}}],
>        closed = false}
>
> You can convert the timestamp to local time via:
>
> (dev1@127.0.0.1)8> calendar:now_to_local_time({1459,801655,506719}).
> {{2016,4,4},{16,27,35}}
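
When build_time values have been collected from several nodes, it may be easier to convert and compare them outside the Erlang shell. Below is a minimal Python sketch, assuming the value is copied out by hand as a (megaseconds, seconds, microseconds) tuple. The function names erlang_now_to_utc and tree_age_days are illustrative only and are not part of Riak or Yokozuna.

```python
from datetime import datetime, timedelta, timezone


def erlang_now_to_utc(now):
    """Convert an Erlang {MegaSecs, Secs, MicroSecs} tuple to an aware UTC datetime."""
    mega, secs, micro = now
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # MegaSecs counts millions of seconds since the Unix epoch.
    return epoch + timedelta(seconds=mega * 1_000_000 + secs, microseconds=micro)


def tree_age_days(now, reference):
    """Age of a hashtree build in days, relative to an aware reference datetime."""
    return (reference - erlang_now_to_utc(now)).total_seconds() / 86400


if __name__ == "__main__":
    built = erlang_now_to_utc((1459, 801655, 506719))
    print(built.isoformat())  # 2016-04-04T20:27:35.506719+00:00
```

The example timestamp comes out as 20:27:35 UTC, which agrees with the 16:27:35 local time shown above if that shell was running in a UTC-4 time zone. Passing datetime.now(timezone.utc) as the reference to tree_age_days gives a quick way to spot trees older than a chosen number of days.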
>
>
> Again, this is just an example, but with the right Erlang incantations,
> you should be able to iterate over all the timestamps across all the nodes.
>
> Let us know if that is helpful, or if you need more examples so you can do
> it in one swipe.
>
> -Fred
>
> On Apr 5, 2016, at 9:29 AM, Oleksiy Krivoshey <oleksiyk at gmail.com> wrote:
>
> How can I check that AAE trees have expired? Yesterday I ran
> `riak_core_util:rpc_every_member_ann(yz_entropy_mgr, expire_trees, [], 5000).`
> on each node (just to be sure). Still, today I see that on 3 nodes
> (of 5) all entropy trees and all last AAE exchanges are older than 20 days.
>
> On 4 April 2016 at 17:15, Oleksiy Krivoshey <oleksiyk at gmail.com> wrote:
>
>> Continuation...
>>
>> The new index has the same inconsistent search results problem.
>> I took a snapshot of the `riak-admin search aae-status` output almost every day.
>> There were absolutely no Yokozuna errors in logs.
>>
>> I can see that some AAE trees were not expired (built > 20 days ago). I
>> can also see that on two nodes (of 5) last AAE exchanges happened > 20 days
>> ago.
>>
>> For now I have issued
>> `riak_core_util:rpc_every_member_ann(yz_entropy_mgr, expire_trees, [], 5000).`
>> on each node again. I will wait 10 more days, but I don't think that
>> will fix anything.
>>
>>
>> On 25 March 2016 at 09:28, Oleksiy Krivoshey <oleksiyk at gmail.com> wrote:
>>
>>> One interesting thing happened when I tried removing the index:
>>>
>>> - this index was associated with a bucket type called fs_chunks
>>> - so I first called RpbSetBucketTypeReq to set search_index: _dont_index_
>>> - I then tried to remove the index with RpbYokozunaIndexDeleteReq, which
>>> failed with "index is in use" and a list of all buckets of the fs_chunks type
>>> - for some reason all these buckets had their own search_index property
>>> set to that same index
>>>
>>> How can this happen if I definitely never set the search_index property
>>> per bucket?
>>>
>>> On 24 March 2016 at 22:41, Oleksiy Krivoshey <oleksiyk at gmail.com> wrote:
>>>
>>>> OK!
>>>>
>>>> On 24 March 2016 at 21:11, Magnus Kessler <mkessler at basho.com> wrote:
>>>>
>>>>> Hi Oleksiy,
>>>>>
>>>>> On 24 March 2016 at 14:55, Oleksiy Krivoshey <oleksiyk at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Magnus,
>>>>>>
>>>>>> Thanks! I guess I will go with index deletion because I've already
>>>>>> tried expiring the trees before.
>>>>>>
>>>>>> Do I need to delete the AAE data somehow, or is removing the index enough?
>>>>>>
>>>>>
>>>>> If you expire the AAE trees with the commands I posted earlier, there
>>>>> should be no need to remove the AAE data directories manually.
>>>>>
>>>>> I hope this works for you. Please monitor the tree rebuild and
>>>>> exchanges with `riak-admin search aae-status` for the next few days. In
>>>>> particular the exchanges should be ongoing on a continuous basis once all
>>>>> trees have been rebuilt. If they don't, please let me know. At that point
>>>>> you should also gather `riak-debug` output from all nodes before it gets
>>>>> rotated out after 5 days by default.
>>>>>
>>>>> Kind Regards,
>>>>>
>>>>> Magnus
>>>>>
>>>>>
>>>>>>
>>>>>> On 24 March 2016 at 13:28, Magnus Kessler <mkessler at basho.com> wrote:
>>>>>>
>>>>>>> Hi Oleksiy,
>>>>>>>
>>>>>>> As a first step, I suggest simply expiring the Yokozuna AAE trees
>>>>>>> again if the output of `riak-admin search aae-status` still suggests that
>>>>>>> no recent exchanges have taken place. To do this, run `riak attach` on one
>>>>>>> node and then
>>>>>>>
>>>>>>> riak_core_util:rpc_every_member_ann(yz_entropy_mgr, expire_trees, [], 5000).
>>>>>>>
>>>>>>>
>>>>>>> Exit from the riak console with `Ctrl+G q`.
>>>>>>>
>>>>>>> Depending on your settings and amount of data the full index should
>>>>>>> be rebuilt within the next 2.5 days (for a cluster with ring size 128 and
>>>>>>> default settings). You can monitor the progress with `riak-admin search
>>>>>>> aae-status` and also in the logs, which should have messages along the
>>>>>>> lines of
>>>>>>>
>>>>>>> 2016-03-24 10:28:25.372 [info]
>>>>>>> <0.4647.6477>@yz_exchange_fsm:key_exchange:179 Repaired 83055 keys during
>>>>>>> active anti-entropy exchange of partition
>>>>>>> 1210306043414653979137426502093171875652569137152 for preflist
>>>>>>> {1164634117248063262943561351070788031288321245184,3}
>>>>>>>
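
To follow repair progress over several days, the "Repaired N keys" messages can be totalled per partition. Here is a rough Python sketch, assuming each message sits on a single line in the log file (it is only wrapped in this mail) and has exactly the shape quoted above. The regular expression and the function name repaired_per_partition are illustrative and not part of any Riak tooling.

```python
import re
from collections import Counter

# Matches the key count and partition id in a yz_exchange_fsm repair message.
REPAIR_RE = re.compile(
    r"Repaired (?P<keys>\d+) keys during active anti-entropy exchange "
    r"of partition (?P<partition>\d+)"
)


def repaired_per_partition(lines):
    """Sum the repaired key counts per partition over an iterable of log lines."""
    totals = Counter()
    for line in lines:
        match = REPAIR_RE.search(line)
        if match:
            totals[int(match.group("partition"))] += int(match.group("keys"))
    return totals


if __name__ == "__main__":
    sample = (
        "2016-03-24 10:28:25.372 [info] <0.4647.6477>@yz_exchange_fsm:key_exchange:179 "
        "Repaired 83055 keys during active anti-entropy exchange of partition "
        "1210306043414653979137426502093171875652569137152 for preflist "
        "{1164634117248063262943561351070788031288321245184,3}"
    )
    for partition, keys in repaired_per_partition([sample]).items():
        print(partition, keys)
```

In practice the function would be fed an open log file object instead of the sample list. Lines that do not match the pattern are skipped.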
>>>>>>>
>>>>>>> Re-indexing can put additional strain on the cluster and may cause
>>>>>>> elevated latency on a cluster already under heavy load. Please monitor the
>>>>>>> response times while the cluster is re-indexing data.
>>>>>>>
>>>>>>> If the cluster load allows it, you can force more rapid re-indexing
>>>>>>> by changing a few parameters. Again at the `riak attach` console, run
>>>>>>>
>>>>>>> riak_core_util:rpc_every_member_ann(application, set_env, [yokozuna, anti_entropy_build_limit, {4, 3600000}], 5000).
>>>>>>> riak_core_util:rpc_every_member_ann(application, set_env, [yokozuna, anti_entropy_concurrency, 5], 5000).
>>>>>>>
>>>>>>> This will allow up to 4 trees per node to be built/exchanged per
>>>>>>> hour, with up to 5 concurrent exchanges throughout the cluster. To return
>>>>>>> to the default settings, use
>>>>>>>
>>>>>>> riak_core_util:rpc_every_member_ann(application, set_env, [yokozuna, anti_entropy_build_limit, {1, 3600000}], 5000).
>>>>>>> riak_core_util:rpc_every_member_ann(application, set_env, [yokozuna, anti_entropy_concurrency, 2], 5000).
>>>>>>>
>>>>>>>
>>>>>>> If the cluster still doesn't make any progress with automatically
>>>>>>> re-indexing data, the next steps are pretty much what you already
>>>>>>> suggested, to drop the existing index and re-index from scratch. I'm
>>>>>>> assuming that losing the indexes temporarily is acceptable to you at this
>>>>>>> point.
>>>>>>>
>>>>>>> Using any client API that supports RpbYokozunaIndexDeleteReq, you
>>>>>>> can drop the index from all Solr instances, losing any data stored there
>>>>>>> immediately. Next, you'll have to re-create the index. I have tried this
>>>>>>> with the python API, where I deleted the index and re-created it with the
>>>>>>> same already uploaded schema:
>>>>>>>
>>>>>>> from riak import RiakClient
>>>>>>>
>>>>>>> c = RiakClient()
>>>>>>> c.delete_search_index('my_index')
>>>>>>> c.create_search_index('my_index', 'my_schema')
>>>>>>>
>>>>>>> Note that simply deleting the index does not remove its existing
>>>>>>> association with any bucket or bucket type. Any PUT operations on these
>>>>>>> buckets will lead to indexing failures being logged until the index has
>>>>>>> been recreated. However, this also means that no separate operation in
>>>>>>> `riak-admin` is required to associate the newly recreated index with the
>>>>>>> buckets again.
>>>>>>>
>>>>>>> After recreating the index, expire the trees as explained previously.
>>>>>>>
>>>>>>> Let us know if this solves your issue.
>>>>>>>
>>>>>>> Kind Regards,
>>>>>>>
>>>>>>> Magnus
>>>>>>>
>>>>>>>
>>>>>>> On 24 March 2016 at 08:44, Oleksiy Krivoshey <oleksiyk at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> This is how things look after two weeks:
>>>>>>>>
>>>>>>>> - there have been no Solr indexing issues for a long period (2 weeks)
>>>>>>>> - there have been no Yokozuna errors at all for 2 weeks
>>>>>>>> - the index has an empty schema with just the _yz_* fields; objects
>>>>>>>> stored in the bucket(s) are binary and so are not analysed by Yokozuna
>>>>>>>> - the same Yokozuna query, repeated, gives a different number for
>>>>>>>> num_found; typically the difference between the real number of keys
>>>>>>>> in a bucket and num_found is about 25%
>>>>>>>> - the number of keys repaired by AAE (according to the logs) is about
>>>>>>>> 1-2 every few hours (the number of keys "missing" from the index is
>>>>>>>> close to 1,000,000)
>>>>>>>>
>>>>>>>> Should I now try to delete the index and yokozuna AAE data and wait
>>>>>>>> another 2 weeks? If yes - how should I delete the index and AAE data?
>>>>>>>> Will RpbYokozunaIndexDeleteReq be enough?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> --
>>>>>>> Magnus Kessler
>>>>>>> Client Services Engineer
>>>>>>> Basho Technologies Limited
>>>>>>>
>>>>>>> Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg
>>>>>>> 07970431
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: search-aae.tar.gz
Type: application/x-gzip
Size: 5936 bytes
Desc: not available
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20160405/57bbd06a/attachment-0002.gz>