Yokozuna inconsistent search results

Fred Dushin fdushin at basho.com
Tue Apr 5 15:41:54 EDT 2016


Hi Oleksiy,

I assume you are getting this information through riak-admin.  Can you post the results here?

If you want to dig deeper, you can probe the individual hash trees for their build time. I will paste a few snippets of Erlang here, which I hope you can extend with list comprehensions and rpc:multicall. If that's too much to ask, let us know and I can try to put together something closer to a "big easy button".

First, on any individual node, you can get the Riak partitions on that node, via

(dev1@127.0.0.1)1> Partitions = [P || {_, P, _} <- riak_core_vnode_manager:all_vnodes(riak_kv_vnode)].
[913438523331814323877303020447676887284957839360,
 182687704666362864775460604089535377456991567872,
 1187470080331358621040493926581979953470445191168,
 730750818665451459101842416358141509827966271488,
 1370157784997721485815954530671515330927436759040,
 1004782375664995756265033322492444576013453623296,
 822094670998632891489572718402909198556462055424,
 456719261665907161938651510223838443642478919680,
 274031556999544297163190906134303066185487351808,
 1096126227998177188652763624537212264741949407232,
 365375409332725729550921208179070754913983135744,
 91343852333181432387730302044767688728495783936,
 639406966332270026714112114313373821099470487552,0,
 1278813932664540053428224228626747642198940975104,
 548063113999088594326381812268606132370974703616]

For any one partition, you can get the Pid of the yz_index_hashtree process for that partition, e.g.,

(dev1@127.0.0.1)2> {ok, Pid} = yz_entropy_mgr:get_tree(913438523331814323877303020447676887284957839360).
{ok,<0.2872.0>}

and from there you can get the state of the hashtree, which includes its build time. To make the state slightly more readable, first load the record definitions for the yz_index_hashtree state by calling rr() on the yz_index_hashtree module:

(dev1@127.0.0.1)3> rr(yz_index_hashtree).
[entropy_data,state,xmerl_event,xmerl_fun_states,
 xmerl_scanner,xmlAttribute,xmlComment,xmlContext,xmlDecl,
 xmlDocument,xmlElement,xmlNamespace,xmlNode,xmlNsNode,
 xmlObj,xmlPI,xmlText]
(dev1@127.0.0.1)5> sys:get_state(Pid).
#state{index = 913438523331814323877303020447676887284957839360,
       built = true,expired = false,lock = undefined,
       path = "./data/yz_anti_entropy/913438523331814323877303020447676887284957839360",
       build_time = {1459,801655,506719},
       trees = [{{867766597165223607683437869425293042920709947392,
                  3},
                 {state,<<152,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...>>,
                        913438523331814323877303020447676887284957839360,3,1048576,
                        1024,0,
                        {dict,0,16,16,8,80,48,{[],[],...},{{...}}},
                        <<>>,
                        "./data/yz_anti_entropy/913438523331814323877303020447676887284957839360",
                        <<>>,incremental,[],0,
                        {array,38837,0,...}}},
                {{890602560248518965780370444936484965102833893376,3},
                 {state,<<156,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...>>,
                        913438523331814323877303020447676887284957839360,3,1048576,
                        1024,0,
                        {dict,0,16,16,8,80,48,{[],...},{...}},
                        <<>>,
                        "./data/yz_anti_entropy/913438523331814323877303020447676887284957839360",
                        <<>>,incremental,[],0,
                        {array,38837,...}}},
                {{913438523331814323877303020447676887284957839360,3},
                 {state,<<160,0,0,0,0,0,0,0,0,0,0,0,0,0,...>>,
                        913438523331814323877303020447676887284957839360,3,1048576,
                        1024,0,
                        {dict,0,16,16,8,80,48,{...},...},
                        <<>>,
                        "./data/yz_anti_entropy/913438523331814323877303020447676887284957839360",
                        <<>>,incremental,[],0,
                        {array,...}}}],
       closed = false}

You can convert the timestamp to local time via:

(dev1@127.0.0.1)8> calendar:now_to_local_time({1459,801655,506719}).
{{2016,4,4},{16,27,35}}

Again, this is just an example, but with the right Erlang incantations, you should be able to iterate over all the timestamps across all the nodes.
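
Once the build_time values have been collected, it may be easier to compare them outside the Erlang shell. The following is only a rough Python sketch under stated assumptions: the helper names (now_to_datetime, stale_trees) are made up for illustration, the input is assumed to be a dict mapping partition ids to {MegaSecs, Secs, MicroSecs} tuples gathered by hand, and the 20-day threshold is simply the figure mentioned in this thread. Note that it works in UTC, whereas calendar:now_to_local_time/1 reports local time.

```python
from datetime import datetime, timedelta, timezone


def now_to_datetime(now_tuple):
    """Convert an Erlang {MegaSecs, Secs, MicroSecs} tuple to an aware UTC datetime."""
    mega, secs, micro = now_tuple
    whole_seconds = mega * 1_000_000 + secs
    return (datetime.fromtimestamp(whole_seconds, tz=timezone.utc)
            + timedelta(microseconds=micro))


def stale_trees(build_times, as_of, max_age_days=20):
    """Return the partitions whose tree was built more than max_age_days
    before as_of, oldest first.

    build_times is a hypothetical dict: partition id -> Erlang timestamp tuple.
    """
    cutoff = as_of - timedelta(days=max_age_days)
    stale = [(now_to_datetime(ts), partition)
             for partition, ts in build_times.items()
             if now_to_datetime(ts) < cutoff]
    return [partition for _, partition in sorted(stale)]


if __name__ == "__main__":
    build_times = {
        # build_time taken from the sys:get_state/1 output above
        913438523331814323877303020447676887284957839360: (1459, 801655, 506719),
        # made-up value, only here to show what a stale tree looks like
        0: (1457, 0, 0),
    }
    as_of = datetime(2016, 4, 5, 19, 41, 54, tzinfo=timezone.utc)
    for partition, ts in build_times.items():
        print(partition, now_to_datetime(ts).isoformat())
    print("stale:", stale_trees(build_times, as_of))
```

With the sample data, only the made-up entry for partition 0 is reported as stale; the tree from the session above was built the day before.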

Let us know if that is helpful, or if you need more examples so you can do it in one swipe.

-Fred

> On Apr 5, 2016, at 9:29 AM, Oleksiy Krivoshey <oleksiyk at gmail.com> wrote:
> 
> How can I check that AAE trees have expired? Yesterday I ran "riak_core_util:rpc_every_member_ann(yz_entropy_mgr, expire_trees, [], 5000)." on each node (just to be sure). Still, today I see that on 3 nodes (of 5) all entropy trees and all last AAE exchanges are older than 20 days.
> 
> On 4 April 2016 at 17:15, Oleksiy Krivoshey <oleksiyk at gmail.com> wrote:
> Continuation...
> 
> The new index has the same inconsistent search results problem. 
> I was taking a snapshot of the `search aae-status` output almost every day. There were absolutely no Yokozuna errors in the logs.
> 
> I can see that some AAE trees were not expired (built > 20 days ago). I can also see that on two nodes (of 5) last AAE exchanges happened > 20 days ago.
> 
> For now I have issued `riak_core_util:rpc_every_member_ann(yz_entropy_mgr, expire_trees, [], 5000).` on each node again. I will wait 10 more days, but I don't think that will fix anything.
> 
> 
> On 25 March 2016 at 09:28, Oleksiy Krivoshey <oleksiyk at gmail.com> wrote:
> One interesting thing happened when I tried removing the index:
> 
> - this index was associated with a bucket type, called fs_chunks
> - so I first called RpbSetBucketTypeReq to set search_index: _dont_index_
> - I then tried to remove the index with RpbYokozunaIndexDeleteReq, which failed with "index is in use" and a list of all buckets of the fs_chunks type
> - for some reason all these buckets had their own search_index property set to that same index
> 
> How can this happen if I definitely never set the search_index property per bucket?
> 
> On 24 March 2016 at 22:41, Oleksiy Krivoshey <oleksiyk at gmail.com> wrote:
> OK!
> 
> On 24 March 2016 at 21:11, Magnus Kessler <mkessler at basho.com> wrote:
> Hi Oleksiy,
> 
> On 24 March 2016 at 14:55, Oleksiy Krivoshey <oleksiyk at gmail.com> wrote:
> Hi Magnus,
> 
> Thanks! I guess I will go with index deletion because I've already tried expiring the trees before.
> 
> Do I need to delete AAE data somehow or removing the index is enough?
> 
> If you expire the AAE trees with the commands I posted earlier, there should be no need to remove the AAE data directories manually.
> 
> I hope this works for you. Please monitor the tree rebuild and exchanges with `riak-admin search aae-status` for the next few days. In particular, the exchanges should be ongoing on a continuous basis once all trees have been rebuilt. If they aren't, please let me know. At that point you should also gather `riak-debug` output from all nodes before the logs get rotated out, which happens after 5 days by default.
> 
> Kind Regards,
> 
> Magnus
>  
> 
> On 24 March 2016 at 13:28, Magnus Kessler <mkessler at basho.com> wrote:
> Hi Oleksiy,
> 
> As a first step, I suggest simply expiring the Yokozuna AAE trees again if the output of `riak-admin search aae-status` still suggests that no recent exchanges have taken place. To do this, run `riak attach` on one node and then
> 
> riak_core_util:rpc_every_member_ann(yz_entropy_mgr, expire_trees, [], 5000).
> 
> Exit from the riak console with `Ctrl+G q`.
> 
> Depending on your settings and amount of data the full index should be rebuilt within the next 2.5 days (for a cluster with ring size 128 and default settings). You can monitor the progress with `riak-admin search aae-status` and also in the logs, which should have messages along the lines of
> 
> 2016-03-24 10:28:25.372 [info] <0.4647.6477>@yz_exchange_fsm:key_exchange:179 Repaired 83055 keys during active anti-entropy exchange of partition 1210306043414653979137426502093171875652569137152 for preflist {1164634117248063262943561351070788031288321245184,3}
> 
> 
> Re-indexing can put additional strain on the cluster and may cause elevated latency on a cluster already under heavy load. Please monitor the response times while the cluster is re-indexing data.
> 
> If the cluster load allows it, you can force more rapid re-indexing by changing a few parameters. Again at the `riak attach` console, run
> 
> riak_core_util:rpc_every_member_ann(application, set_env, [yokozuna, anti_entropy_build_limit, {4, 3600000}], 5000).
> riak_core_util:rpc_every_member_ann(application, set_env, [yokozuna, anti_entropy_concurrency, 5], 5000).
> 
> This will allow up to 4 trees per node to be built/exchanged per hour (the second element of the build limit is an interval in milliseconds), with up to 5 concurrent exchanges throughout the cluster. To return to the default settings, use
> 
> riak_core_util:rpc_every_member_ann(application, set_env, [yokozuna, anti_entropy_build_limit, {1, 3600000}], 5000).
> riak_core_util:rpc_every_member_ann(application, set_env, [yokozuna, anti_entropy_concurrency, 2], 5000).
> 
> If the cluster still doesn't make any progress with automatically re-indexing data, the next steps are pretty much what you already suggested, to drop the existing index and re-index from scratch. I'm assuming that losing the indexes temporarily is acceptable to you at this point.
> 
> Using any client API that supports RpbYokozunaIndexDeleteReq, you can drop the index from all Solr instances, losing any data stored there immediately. Next, you'll have to re-create the index. I have tried this with the python API, where I deleted the index and re-created it with the same already uploaded schema:
> 
> from riak import RiakClient
> 
> c = RiakClient()
> c.delete_search_index('my_index')
> c.create_search_index('my_index', 'my_schema')
> 
> Note that simply deleting the index does not remove its existing association with any bucket or bucket type. Any PUT operations on these buckets will lead to indexing failures being logged until the index has been recreated. However, this also means that no separate operation in `riak-admin` is required to associate the newly recreated index with the buckets again.
> 
> After recreating the index, expire the trees as explained previously.
> 
> Let us know if this solves your issue.
> 
> Kind Regards,
> 
> Magnus
> 
> 
> On 24 March 2016 at 08:44, Oleksiy Krivoshey <oleksiyk at gmail.com> wrote:
> This is how things are looking after two weeks:
> 
> - there are no solr indexing issues for a long period (2 weeks)
> - there are no yokozuna errors at all for 2 weeks
> - there is an index with all empty schema, just _yz_* fields, objects stored in a bucket(s) are binary and so are not analysed by yokozuna
> - same yokozuna query repeated gives different number for num_found, typically the difference between real number of keys in a bucket and num_found is about 25%
> - number of keys repaired by AAE (according to logs) is about 1-2 per few hours (number of keys "missing" in index is close to 1,000,000)
> 
> Should I now try to delete the index and yokozuna AAE data and wait another 2 weeks? If yes - how should I delete the index and AAE data? Will RpbYokozunaIndexDeleteReq be enough?
> 
> 
>  
> -- 
> Magnus Kessler
> Client Services Engineer
> Basho Technologies Limited
> 
> Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431
> 
> 
> 
> 
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
