1.4.2: 'riak-admin reip' no longer works?
Rune Skou Larsen
rsl at trifork.com
Tue Oct 22 07:09:20 EDT 2013
On 22-10-2013 09:53, Shane McEwan wrote:
> On 21/10/13 17:57, Joe Caswell wrote:
>> The only use case left for reip is when you have simultaneously changed the
>> node name for every node in the cluster, such as when loading an entire
>> cluster's worth of backups to new machines.
> When I need to do this I just create a new, empty cluster with the new
> names. Then shut down the cluster and restore only the data directories
> (leveldb, for example) from the backup, leaving the ring directory
> alone. Then I start up the cluster and it finds the restored data.
Thanks for the tip. Beware that this will not restore bucket props,
because they are stored in the ring dir and not the data dir.
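For anyone trying Shane's approach, here is a rough dry-run sketch of the restore-data-only procedure. Note that RIAK_DIR, BACKUP_DIR and the file names are placeholders I've invented, the contents are fake scaffolding so the sketch runs stand-alone, and the real riak stop/start steps are left commented out:

```shell
#!/bin/sh
# Sketch: restore only the data dir (e.g. leveldb) into a new, empty cluster,
# leaving the ring dir alone. All paths and file contents are hypothetical.
set -e
RIAK_DIR=${RIAK_DIR:-/tmp/riak-restore-data-only}
BACKUP_DIR=${BACKUP_DIR:-/tmp/riak-backup-a}

# Demo scaffolding: fake a backup and a freshly created (empty) cluster node.
rm -rf "$RIAK_DIR" "$BACKUP_DIR"
mkdir -p "$BACKUP_DIR/leveldb" "$RIAK_DIR/data/leveldb" "$RIAK_DIR/data/ring"
echo "backup-sstable"   > "$BACKUP_DIR/leveldb/000001.sst"
echo "new-cluster-ring" > "$RIAK_DIR/data/ring/riak_core_ring.default"

# riak stop                      # on a real node: stop it before restoring
rm -rf "$RIAK_DIR/data/leveldb"                        # drop the empty data dir
cp -a  "$BACKUP_DIR/leveldb" "$RIAK_DIR/data/leveldb"  # restore backed-up data
# data/ring is deliberately left untouched, so the new node names survive
# riak start
```

As noted above, bucket props live in the ring dir, so this sketch would not carry them over from the backup.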
> You need to be careful about restoring the old node's data to the
> corresponding new node otherwise you'll get hinted handoffs flying
> between all your nodes but after a bit of trial and error you can figure
> out which node is which.
When you create the new, empty cluster, Riak distributes the partitions
between the nodes using the claim function. I believe claim_v2
(riak_core_claim.erl) is still the default claim function, and it will
produce different partition distributions in different runs when joining
nodes to form a cluster, including the dreaded 12,12,12,12,16 split on a
default config with 5 nodes.
I guess sometimes you'll be lucky and the new shiny cluster will have
the same partition distribution as the backup, but in my experience
this is not always the case, which means the new cluster will need to
hand off data between nodes to bring the data in line with the ring
files' partition assignments.
The good(tm) way to restore a complete backup to a new environment is to
restore data and partition distribution together - i.e. both the data
dirs and the ring files.
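The steps above can be sketched as follows. Again, RIAK_DIR, BACKUP_DIR and the file names are placeholders of my own invention, the file contents are fake scaffolding so it runs stand-alone, and the real riak stop/start steps are commented out:

```shell
#!/bin/sh
# Sketch: restore the data dir AND the ring dir together, so the partition
# distribution matches the data. All paths and contents are hypothetical.
set -e
RIAK_DIR=${RIAK_DIR:-/tmp/riak-restore-data-and-ring}
BACKUP_DIR=${BACKUP_DIR:-/tmp/riak-backup-b}

# Demo scaffolding: fake a complete backup (data dir + ring dir).
rm -rf "$RIAK_DIR" "$BACKUP_DIR"
mkdir -p "$BACKUP_DIR/leveldb" "$BACKUP_DIR/ring" "$RIAK_DIR/data"
echo "backup-sstable"  > "$BACKUP_DIR/leveldb/000001.sst"
echo "production-ring" > "$BACKUP_DIR/ring/riak_core_ring.default"

# riak stop                           # on a real node: stop it first
cp -a "$BACKUP_DIR/leveldb" "$RIAK_DIR/data/leveldb"  # the data partitions
cp -a "$BACKUP_DIR/ring"    "$RIAK_DIR/data/ring"     # the matching ring file
# The restored ring still names the OLD nodes; fixing that on a stopped
# node is exactly what reip used to do.
# riak start
```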
For this purpose, reip was very useful in v1.3.x, where the node it was
called on did not have to be running. Unfortunately, in 1.4.2 the
reip'ed node must be running (which rather defeats the purpose of reip):
riak-1.3.2/rel/riak/bin> ./riak-admin reip bla bla2
Backed up existing ring file to
New ring file written to "./data/ring/riak_core_ring.default.20131022105325

riak-1.4.2/rel/riak/bin> ./riak-admin reip bla bla2
Node is not running!
A common case where you need to reip non-running nodes is when you
copy production data to a staging environment and need to ensure that
your new staging cluster doesn't reference production nodes before
firing it up. Does anyone know a good solution to this in 1.4? The only
two ways I see are: 1) edit the ring files by hand, or 2) restore to a
new cluster with a potentially mismatched partition distribution between
data and ring files, and wait for handoffs to complete.
- Rune, Trifork