Yokozuna inconsistent search results

Magnus Kessler mkessler at basho.com
Thu Mar 24 15:11:15 EDT 2016


Hi Oleksiy,

On 24 March 2016 at 14:55, Oleksiy Krivoshey <oleksiyk at gmail.com> wrote:

> Hi Magnus,
>
> Thanks! I guess I will go with index deletion because I've already tried
> expiring the trees before.
>
> Do I need to delete the AAE data somehow, or is removing the index enough?
>

If you expire the AAE trees with the commands I posted earlier, there
should be no need to remove the AAE data directories manually.

I hope this works for you. Please monitor the tree rebuilds and exchanges
with `riak-admin search aae-status` over the next few days. In particular,
exchanges should take place continuously once all trees have been rebuilt.
If they don't, please let me know. At that point you should also gather
`riak-debug` output from all nodes before the logs it collects are rotated out
after 5 days by default.
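To keep a record of `riak-admin search aae-status` over several days, a small script can snapshot its output periodically. This is a minimal sketch, assuming Python 3.7+ and that it runs on the node being monitored; the names `snapshot`, `monitor` and the log file name are hypothetical. It does not parse the status output, it only appends timestamped copies so they can be compared later.

```python
import datetime
import subprocess
import time

# Command named in the mail; run the script on every node you want to watch.
AAE_STATUS_CMD = ["riak-admin", "search", "aae-status"]


def snapshot(cmd, log_path):
    """Run cmd once and append its output, with a UTC timestamp, to log_path."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(f"=== {stamp} (exit {result.returncode}) ===\n")
        f.write(result.stdout)
        f.write(result.stderr)
    return result.stdout


def monitor(cmd, log_path, interval_s=3600, rounds=None):
    """Call snapshot() every interval_s seconds; loop forever if rounds is None."""
    n = 0
    while rounds is None or n < rounds:
        snapshot(cmd, log_path)
        n += 1
        # Only sleep if another snapshot will follow.
        if rounds is None or n < rounds:
            time.sleep(interval_s)
```

For example, `monitor(AAE_STATUS_CMD, "aae-status.log")` takes an hourly snapshot until interrupted; pass `rounds=` to stop after a fixed number of snapshots.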

Kind Regards,

Magnus


>
> On 24 March 2016 at 13:28, Magnus Kessler <mkessler at basho.com> wrote:
>
>> Hi Oleksiy,
>>
>> As a first step, I suggest simply expiring the Yokozuna AAE trees again
>> if the output of `riak-admin search aae-status` still suggests that no
>> recent exchanges have taken place. To do this, run `riak attach` on one
>> node and then enter
>>
>> riak_core_util:rpc_every_member_ann(yz_entropy_mgr, expire_trees, [], 5000).
>>
>>
>> Exit the riak console with `Ctrl+G`, then `q` and Enter.
>>
>> Depending on your settings and the amount of data, the full index should be
>> rebuilt within the next 2.5 days (for a cluster with ring size 128 and
>> default settings). You can monitor the progress with `riak-admin search
>> aae-status` and in the logs, which should contain messages along the
>> lines of
>>
>> 2016-03-24 10:28:25.372 [info]
>> <0.4647.6477>@yz_exchange_fsm:key_exchange:179 Repaired 83055 keys during
>> active anti-entropy exchange of partition
>> 1210306043414653979137426502093171875652569137152 for preflist
>> {1164634117248063262943561351070788031288321245184,3}
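To follow these repair messages over time, the Python sketch below totals the repaired keys per partition from a log excerpt. It assumes the message text matches the example above; the function name is hypothetical, and where your log files live depends on your installation.

```python
import re
from collections import Counter

# Matches the yz_exchange_fsm "Repaired N keys ..." message quoted above.
# \s+ between words also tolerates lines wrapped by a mail client.
REPAIR_RE = re.compile(
    r"Repaired\s+(\d+)\s+keys\s+during\s+active\s+anti-entropy\s+exchange"
    r"\s+of\s+partition\s+(\d+)"
)


def repaired_keys_by_partition(log_text):
    """Return a Counter mapping partition index (as a string) -> keys repaired."""
    totals = Counter()
    for match in REPAIR_RE.finditer(log_text):
        keys, partition = match.groups()
        totals[partition] += int(keys)
    return totals
```

For instance, read a node's console log into a string and call `repaired_keys_by_partition(text)`; partitions that do not appear simply have no matching repair messages in that excerpt.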
>>
>>
>> Re-indexing can put additional strain on the cluster and may cause
>> elevated latency on a cluster already under heavy load. Please monitor the
>> response times while the cluster is re-indexing data.
>>
>> If the cluster load allows it, you can force more rapid re-indexing by
>> changing a few parameters. Again at the `riak attach` console, run
>>
>> riak_core_util:rpc_every_member_ann(application, set_env, [yokozuna, anti_entropy_build_limit, {4, 60000}], 5000).
>> riak_core_util:rpc_every_member_ann(application, set_env, [yokozuna, anti_entropy_concurrency, 5], 5000).
>>
>> This will allow up to 4 trees per node to be built/exchanged per minute
>> (the period is given in milliseconds), with up to 5 concurrent exchanges
>> throughout the cluster. To restore the default settings, use
>>
>> riak_core_util:rpc_every_member_ann(application, set_env, [yokozuna, anti_entropy_build_limit, {1, 3600000}], 5000).
>> riak_core_util:rpc_every_member_ann(application, set_env, [yokozuna, anti_entropy_concurrency, 2], 5000).
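As a rough illustration of what the build limit means, the sketch below estimates how long the rate limit alone takes to allow every tree build on a node to start. It is a hypothetical model, not how Yokozuna schedules builds in detail: it assumes partitions are spread evenly across nodes, that each node may start at most `Tokens` builds per `Period` milliseconds as given by the `{Tokens, Period}` tuple, and that builds take no time. In practice each build scans a whole partition, so real rebuilds take longer.

```python
import math


def rate_limited_build_hours(ring_size, nodes, tokens, period_ms):
    """Rough time (hours) that anti_entropy_build_limit alone imposes on one node.

    Hypothetical model: partitions spread evenly, at most `tokens` tree builds
    started per `period_ms` milliseconds per node, builds take no time.
    """
    partitions_per_node = math.ceil(ring_size / nodes)
    # Number of rate-limit periods needed to hand out one token per partition.
    periods = math.ceil(partitions_per_node / tokens)
    return periods * period_ms / 3_600_000  # milliseconds -> hours
```

For a hypothetical 5-node cluster with ring size 128 (26 partitions per node), a limit of one build per hour, `{1, 3600000}`, gives 26.0 hours, while `{4, 60000}` gives 7 one-minute periods, about 0.12 hours. With the faster setting, the scan time of each build and the concurrency limit are likely to dominate rather than the token rate.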
>>
>>
>> If the cluster still doesn't make any progress re-indexing data
>> automatically, the next step is essentially what you already suggested:
>> drop the existing index and re-index from scratch. I'm assuming that
>> losing the indexes temporarily is acceptable to you at this point.
>>
>> Using any client API that supports RpbYokozunaIndexDeleteReq, you can
>> drop the index from all Solr instances, immediately losing any data stored
>> there. Next, you'll have to re-create the index. I have tried this with
>> the Python client, deleting the index and re-creating it with the same,
>> already uploaded schema:
>>
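>> from riak import RiakClient
>>
>> c = RiakClient()  # connects using the client's default settings (local node)
>> # Drop the index from all Solr instances; its indexed data is lost.
>> c.delete_search_index('my_index')
>> # Re-create the index with the schema that was already uploaded.
>> c.create_search_index('my_index', 'my_schema')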
>>
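Since the trees should only be expired once the recreated index exists, it may help to wait for the index before continuing. Below is a minimal, generic polling helper, assuming that index creation may not be visible across the cluster immediately; the name `wait_until` is hypothetical.

```python
import time


def wait_until(predicate, timeout_s=60.0, interval_s=1.0):
    """Poll predicate() until it returns a truthy value or timeout_s elapses.

    Returns True on success, False on timeout. Any exception raised by
    predicate() is treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if predicate():
                return True
        except Exception:
            pass  # e.g. the index does not exist yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

With the Python client, the predicate could be `lambda: c.get_search_index('my_index')`, assuming `get_search_index` raises an error while the index does not exist yet; the helper treats that error as not ready.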
>> Note that simply deleting the index does not remove its existing
>> association with any bucket or bucket type. Any PUT operations on these
>> buckets will cause indexing failures to be logged until the index has
>> been recreated. However, this also means that no separate operation in
>> `riak-admin` is required to associate the recreated index with the
>> buckets again.
>>
>> After recreating the index, expire the trees as explained above.
>>
>> Let us know if this solves your issue.
>>
>> Kind Regards,
>>
>> Magnus
>>
>>
>> On 24 March 2016 at 08:44, Oleksiy Krivoshey <oleksiyk at gmail.com> wrote:
>>
>>> This is how things look after two weeks:
>>>
>>> - there have been no Solr indexing issues for a long period (2 weeks)
>>> - there have been no Yokozuna errors at all for 2 weeks
>>> - there is an index with an empty schema (just the _yz_* fields); the
>>> objects stored in the bucket(s) are binary and so are not analysed by
>>> Yokozuna
>>> - repeating the same Yokozuna query gives different values for
>>> num_found; typically the difference between the real number of keys in
>>> the bucket and num_found is about 25%
>>> - the number of keys repaired by AAE (according to the logs) is about 1-2
>>> every few hours (the number of keys "missing" from the index is close to
>>> 1,000,000)
>>>
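To quantify how much num_found varies between identical queries, a small helper can run the same query repeatedly and compare the results with the expected key count. This is a minimal sketch; `num_found_spread` is a hypothetical name, and the query is passed in as a callable so the helper does not depend on any particular client.

```python
import statistics


def num_found_spread(run_query, expected, runs=10):
    """Run the same query `runs` times and summarise the num_found values.

    run_query: zero-argument callable returning num_found for one execution.
    expected:  the number of keys you believe the bucket holds.
    """
    counts = [run_query() for _ in range(runs)]
    return {
        "min": min(counts),
        "max": max(counts),
        "median": statistics.median(counts),
        # Fraction of keys missing in the worst run.
        "worst_missing_fraction": (expected - min(counts)) / expected,
    }
```

With the Python client, `run_query` could be `lambda: c.fulltext_search('my_index', '*:*', rows=0)['num_found']`, assuming `fulltext_search` passes `rows` through to Solr; `expected` is the key count obtained independently.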
>>> Should I now try to delete the index and the Yokozuna AAE data and wait
>>> another 2 weeks? If so, how should I delete the index and the AAE data?
>>> Will RpbYokozunaIndexDeleteReq be enough?
>>>
>>>
>>>
>> --
>> Magnus Kessler
>> Client Services Engineer
>> Basho Technologies Limited
>>
>> Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431
>>
>
>

