Riak search, post schema change reindexation
guillaume at lighthouse-analytics.co
Mon Aug 29 11:27:47 EDT 2016
Hi Fred, thanks for your answer.
I'm using Riak 2.1; see the attached status export.
I'm working on a single cluster, and from time to time I need to update
some of the search indexes on all nodes.
As a cloud user, I can consider paying for a spare host for a few days
in order to complete a full rollout.
I understand your plan to remove a host from production while it
rebuilds its index. From my point of view, though, your solution only
applies to a broken Solr index that needs to be rebuilt from scratch on
a single host.
In my case, I need to reindex my documents because I updated my Solr
schema, which requires wiping the existing index beforehand (create a
new index, change the bucket type's search index property, drop the old
index) on all hosts, since the property I need to update lives on the
bucket type. Roughly, the swap went like the sketch below.
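Here is roughly the sequence I ran, via the official Python client; the
index, schema, and bucket type names below are mine, so adjust them to
your own setup:

    import riak

    client = riak.RiakClient(pb_port=8087)

    # 1. Upload the revised schema and create the new index against it.
    with open("my_schema_v2.xml") as f:
        client.create_search_schema("my_schema_v2", f.read())
    client.create_search_index("tweets_v2", schema="my_schema_v2")

    # 2. Point the bucket type at the new index (this can also be done
    #    with riak-admin bucket-type update).
    client.bucket_type("tweets").set_property("search_index", "tweets_v2")

    # 3. Drop the old index once nothing references it anymore.
    client.delete_search_index("tweets_v1")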
Fred, can your plan really be applied to an « I want to update my
search schema on my full cluster » scenario?
At the moment, I have already created the new index and destroyed the
old one, but I am unable to use a slow Python script to force all items
to be written again (and subsequently pushed to Solr), since I get
regular timeouts on the key stream API (both protobuf and HTTP).
Is there a way to run a program inside the Riak nodes themselves (not
HTTP, not protobuf) to achieve this simple algorithm:

    for key in bucket.stream_keys():
        obj = bucket.get(key)
        obj.store()  # unchanged rewrite, so the object is re-indexed
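Concretely, the failing loop looks like this with the official Python
client (bucket and type names are mine, and raising the stream timeout
only delays the failure for me):

    import riak

    client = riak.RiakClient(pb_port=8087)
    bucket = client.bucket_type("tweets").bucket("tweets")

    count = 0
    # stream_keys() yields batches of keys; the timeout is in
    # milliseconds, raised well above the default here.
    for keys in bucket.stream_keys(timeout=600000):
        for key in keys:
            obj = bucket.get(key)
            obj.store()  # unchanged rewrite, re-triggers Solr indexing
            count += 1
    print("rewrote %d objects" % count)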
I really fear that I will not be able to restore my index any time
soon. I am not stressed out, because we are not in production yet and I
still have plenty of time to fix this while new data keeps coming in.
But this kind of complex operation required by an index update really
freaks me out.
On 29/08/2016 14:41, Fred Dushin wrote:
> Hi Guillaume,
> A few questions.
> What version of Riak?
> Does the reindexing need to occur across the entire cluster, or just
> on one node?
> What are the expectations about query-ability while re-indexing is
> going on?
> If you can afford to take a node out of commission for query, then one
> approach would be to delete your YZ data and YZ AAE trees, and let AAE
> sync your 30 million documents from Riak. You can increase AAE tree
> rebuild and exchange concurrency to make that occur more quickly than
> it does by default, but that will put a fairly significant load on
> that node. Moreover, because you have deleted indexed data on one
> node, you will get inconsistent search results from Yokozuna, as the
> node being reindexed will still show up as part of a coverage plan.
> Depending on the version of Riak, however, you may be able to
> manually remove that node from coverage plans through the Riak console
> while re-indexing is going on. The node is still available for Riak
> get/put operations (including indexing new entries into Solr), but it
> will be excluded from any cover set when a query plan is generated. I
> can't guarantee that this would take less than 5 days, however.
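For my own notes, the per-node wipe you describe would look roughly
like this, assuming a default package install (the data paths and the
index name below are assumptions on my side):

    import os
    import shutil
    import subprocess

    DATA_ROOT = "/var/lib/riak"   # assumed platform_data_dir
    OLD_INDEX = "tweets_v1"       # hypothetical index name

    subprocess.check_call(["riak", "stop"])

    # Drop the Solr index data and the YZ AAE trees; once the node is
    # back up, AAE rebuilds both from the KV data.
    shutil.rmtree(os.path.join(DATA_ROOT, "yz", OLD_INDEX),
                  ignore_errors=True)
    shutil.rmtree(os.path.join(DATA_ROOT, "yz_anti_entropy"),
                  ignore_errors=True)

    subprocess.check_call(["riak", "start"])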
>> On Aug 29, 2016, at 3:56 AM, Guillaume Boddaert
>> <guillaume at lighthouse-analytics.co> wrote:
>> I recently needed to alter my Riak Search schema for a bucket type
>> that contains ~30 million rows. As a result, my index was wiped,
>> since we are still waiting for a Riak Search 2.2 feature that will
>> sync Riak storage with the Solr index on such an occasion.
>> I adapted a script suggested by Evren Esat Özkan in this thread
>> (https://github.com/basho/yokozuna/issues/130#issuecomment-196189344). It
>> is a simple Python script that streams keys and triggers a store
>> action for every item. Unfortunately it failed past 178k items due to
>> a timeout on the key stream. I calculated that this kind of
>> reindexation mechanism would take up to 5 days to complete without a
>> crash.
>> I was wondering if there would be a pure Erlang means to achieve a
>> complete forced rewrite of every single element in my bucket type,
>> rather than an error-prone and very long Python process.
>> How would you guys reindex a 30 million item bucket type in a fast
>> and reliable way?
>> Thanks, Guillaume
-------------- next part --------------
riak_auth_mods_version : <<"2.1.0-0-g31b8b30">>
erlydtl_version : <<"0.7.0">>
riak_control_version : <<"2.1.2-0-gab3f924">>
cluster_info_version : <<"2.0.3-0-g76c73fc">>
yokozuna_version : <<"2.1.2-0-g3520d11">>
ibrowse_version : <<"4.0.2">>
riak_search_version : <<"2.1.1-0-gffe2113">>
merge_index_version : <<"2.0.1-0-g0c8f77c">>
riak_kv_version : <<"2.1.2-0-gf969bba">>
riak_api_version : <<"2.1.2-0-gd8d510f">>
riak_pb_version : <<"22.214.171.124-0-g620bc70">>
protobuffs_version : <<"0.8.1p5-0-gf88fc3c">>
riak_dt_version : <<"2.1.1-0-ga2986bc">>
sidejob_version : <<"2.0.0-0-gc5aabba">>
riak_pipe_version : <<"2.1.1-0-gb1ac2cf">>
riak_core_version : <<"2.1.5-0-gb02ab53">>
exometer_core_version : <<"1.0.0-basho2-0-gb47a5d6">>
poolboy_version : <<"0.8.1p3-0-g8bb45fb">>
pbkdf2_version : <<"2.0.0-0-g7076584">>
eleveldb_version : <<"2.0.17-0-g973fc92">>
clique_version : <<"0.3.2-0-ge332c8f">>
bitcask_version : <<"1.7.2">>
basho_stats_version : <<"1.0.3">>
webmachine_version : <<"1.10.8-0-g7677c24">>
mochiweb_version : <<"2.9.0">>
inets_version : <<"5.9.6">>
xmerl_version : <<"1.3.4">>
erlang_js_version : <<"1.3.0-0-g07467d8">>
runtime_tools_version : <<"1.8.12">>
os_mon_version : <<"2.2.13">>
riak_sysmon_version : <<"2.0.0">>
ssl_version : <<"5.3.1">>
public_key_version : <<"0.20">>
crypto_version : <<"3.1">>
asn1_version : <<"2.0.3">>
sasl_version : <<"2.3.3">>
lager_version : <<"2.1.1">>
goldrush_version : <<"0.1.7">>
compiler_version : <<"4.9.3">>
syntax_tools_version : <<"1.6.11">>
stdlib_version : <<"1.19.3">>
kernel_version : <<"2.16.3">>