standby cluster experiment

Jeremy Raymond jeraymond@gmail.com
Mon Dec 19 15:04:47 EST 2011


Hi John,

I'm curious whether you ever figured out what was going on?

--
Jeremy


On Fri, Dec 9, 2011 at 2:53 PM, John Loehrer <jloehrer@gaiaonline.com> wrote:

> I am currently evaluating riak. I'd like to be able to do periodic
> snapshots of /var/lib/riak using LVM without stopping the node. According
> to a response on this ML, you should be able to copy the data directory
> for the eleveldb backend.
>
> http://comments.gmane.org/gmane.comp.db.riak.user/5202
>
>
> If I cycle through each node and do `riak stop` before taking a
> snapshot, everything works fine. But if I don't shut down the node
> before copying, I run into problems.
> Since I access the HTTP interface of the cluster through an haproxy
> load balancer, once a node goes down it is taken out of the pool almost
> immediately. But for the millisecond or two before haproxy detects that
> the node is down, there might be some bad responses. I can live with that
> and build better retries into my client, but I'd rather avoid it if I can.
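>
> For reference, the health check is along these lines. This is a sketch
> rather than my exact config (the backend name, server names, and
> addresses are placeholders); it polls Riak's /ping endpoint so a stopped
> node is marked down after a single failed check:
>
> ----
> backend riak_http
>     mode http
>     # Riak answers GET /ping with 200 OK while the node is up
>     option httpchk GET /ping
>     server riak1 192.168.3.91:8098 check inter 500 fall 1 rise 2
>     server riak2 192.168.3.92:8098 check inter 500 fall 1 rise 2
>     server riak3 192.168.3.93:8098 check inter 500 fall 1 rise 2
>     server riak4 192.168.3.94:8098 check inter 500 fall 1 rise 2
> ----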
>
> More details below.
>
> Thanks for any help!
>
> ~ John Loehrer
> Gaia Interactive INC
>
>
> DETAILS
> ------------------
> I am playing with the idea of being able to bring up a standby cluster
> on an alternate port on the same servers, pointing at an hourly snapshot
> of my choosing, so that I can go back in time and review the data for
> recovery and repair purposes.
>
> Here's what I have so far.
>
> I have a small cluster of 4 nodes on CentOS 5.4 using the eleveldb
> backend so I can take advantage of 2i (very cool feature, btw).
>
> Steps for installation:
>
>
> ----
> # install the riak rpm ...
> yum install riak-1.0.2-1.el5.x86_64.rpm
>
> # get the ip address out of ifconfig
> IPADDR=`ifconfig eth0 | awk -F'[: ]+' '/inet addr/ { print $4 }'`
>
> # replace the loopback ip address in app.config and vm.args with the
> # machine's ip (double quotes so $IPADDR expands; dots escaped)
> perl -pi -e "s/127\.0\.0\.1/$IPADDR/g" /etc/riak/*
>
> # change the storage backend to eleveldb
> perl -pi -e 's/riak_kv_bitcask_backend/riak_kv_eleveldb_backend/g'
> /etc/riak/app.config
> ----
>
> We also mount an LVM partition at /var/lib/riak so we can snapshot the
> data directory and back it up using rsnapshot once per hour. rsnapshot
> hard-links all the files carried over from the initial snapshot of the
> data, making for very efficient storage. The append-only storage
> approach of the leveldb and bitcask backends means that once a file is
> closed it is immutable, so rsnapshot only has to rsync over files that
> have changed since the previous snapshot. Hourly snapshots take up only
> a little more storage space than the original, even if I populate the
> cluster with hundreds of millions of keys over the course of a 24-hour
> period. The backup operation takes only a few seconds, even for 50 GB of
> data. Now I can copy the data in an hourly snapshot directory to my
> standby riak node, reip, and start up a standby cluster on the same
> machines. Pointing at an hourly snapshot and starting up the node takes
> only a second or two as well.
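>
> In case it helps anyone reproduce this, the snapshot-plus-rsnapshot side
> looks roughly like the sketch below. The volume group, LV names, mount
> point, and snapshot size are placeholders, not our exact layout:
>
> ----
> # take a point-in-time LVM snapshot of the volume backing /var/lib/riak
> lvcreate --snapshot --size 5G --name riak-snap /dev/vg0/riak
> mount -o ro /dev/vg0/riak-snap /mnt/riak-snap
>
> # rsnapshot.conf excerpt (fields must be TAB separated):
> #   snapshot_root   /.snapshots/
> #   interval        hourly  24
> #   backup          /mnt/riak-snap/ riak/
> rsnapshot hourly
>
> # drop the LVM snapshot once rsnapshot has hard-linked/rsynced from it
> umount /mnt/riak-snap
> lvremove -f /dev/vg0/riak-snap
> ----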
>
> Steps for creating the standby node on the same machine:
>
> ----
>
> # make the root directory of the standby node in the snapshots directory
> # so that we can hard-link to the hourly snapshots dir for a quick restore.
> mkdir /.snapshots/riak-standby
>
> # create a handy symlink for the standby node root dir ...
> # we'll use /riak-standby from now on.
> ln -s /.snapshots/riak-standby /riak-standby
>
> # create the default directory structure
> mkdir -p /riak-standby/bin/
> mkdir -p /riak-standby/etc/
> mkdir -p /riak-standby/data
>
> # we are going to use the same libraries, so symlink that in place.
> ln -s /usr/lib64/riak/* /riak-standby/
>
> # copy the app.config and vm.args files from the live node
> cp /etc/riak/app.config /riak-standby/etc/app.config
> cp /etc/riak/vm.args /riak-standby/etc/vm.args
>
> # now, we need to make the app.config file work for the standby node.
> # change /var/lib/riak to ./data
> perl -pi -e 's/\/var\/lib\/riak/.\/data/g' /riak-standby/etc/app.config
>
> # change /usr/sbin to ./bin
> perl -pi -e 's/\/usr\/sbin/.\/bin/g' /riak-standby/etc/app.config
>
> # change /usr/lib64/riak to ./lib
> perl -pi -e 's/\/usr\/lib64\/riak/.\/lib/g' /riak-standby/etc/app.config
>
> # change /var/log/riak to ./log
> perl -pi -e 's/\/var\/log\/riak/.\/log/g' /riak-standby/etc/app.config
>
> # change all the ports from 80** to 81** (anchored so other numbers
> # that merely contain "80" are left alone)
> perl -pi -e 's/\b80(\d\d)\b/81$1/g' /riak-standby/etc/app.config
>
> # change the cookie and node names in vm.args
> perl -pi -e 's/riak@/stby@/g' /riak-standby/etc/vm.args
> perl -pi -e 's/setcookie riak/setcookie stby/g' /riak-standby/etc/vm.args
>
> # fix any permission issues.
> chown -R riak:riak /.snapshots/riak-standby
> ----
>
> The riak script in /riak-standby/bin/riak is almost the same as the
> default one installed in /usr/sbin/riak:
>
> diff /usr/sbin/riak /riak-standby/bin/riak
> 3a4
> > ## MANAGED BY PUPPET.
> 5c6
> < RUNNER_SCRIPT_DIR=/usr/sbin
> ---
> > RUNNER_SCRIPT_DIR=$(cd ${0%/*} && pwd)
> 8,11c9,12
> < RUNNER_BASE_DIR=/usr/lib64/riak
> < RUNNER_ETC_DIR=/etc/riak
> < RUNNER_LOG_DIR=/var/log/riak
> < PIPE_DIR=/var/run/riak/
> ---
> > RUNNER_BASE_DIR=${RUNNER_SCRIPT_DIR%/*}
> > RUNNER_ETC_DIR=$RUNNER_BASE_DIR/etc
> > RUNNER_LOG_DIR=$RUNNER_BASE_DIR/log
> > PIPE_DIR=/tmp/$RUNNER_BASE_DIR/
> 13c14
> < PLATFORM_DATA_DIR=/var/lib/riak
> ---
> > PLATFORM_DATA_DIR=./data
>
>
> The same is true of the riak-admin script for the standby node:
>
> diff /usr/sbin/riak-admin /riak-standby/bin/riak-admin
> 1a2
> > ## MANAGED BY PUPPET.
> 3c4
> < RUNNER_SCRIPT_DIR=/usr/sbin
> ---
> > RUNNER_SCRIPT_DIR=$(cd ${0%/*} && pwd)
> 6,8c7,9
> < RUNNER_BASE_DIR=/usr/lib64/riak
> < RUNNER_ETC_DIR=/etc/riak
> < RUNNER_LOG_DIR=/var/log/riak
> ---
> > RUNNER_BASE_DIR=${RUNNER_SCRIPT_DIR%/*}
> > RUNNER_ETC_DIR=$RUNNER_BASE_DIR/etc
> > RUNNER_LOG_DIR=$RUNNER_BASE_DIR/log
>
>
> After that, I expected to be able to just copy the data from the
> snapshots, reip, and start up my standby cluster.
>
>
> rm -rf /riak-standby/data && cp -al /.snapshots/hourly.0/riak/
> /riak-standby/data
> /riak-standby/bin/riak-admin reip riak@<ip1> stby@<ip1>
> /riak-standby/bin/riak-admin reip riak@<ip2> stby@<ip2>
> /riak-standby/bin/riak-admin reip riak@<ip3> stby@<ip3>
> /riak-standby/bin/riak-admin reip riak@<ip4> stby@<ip4>
> /riak-standby/bin/riak start
>
>
> But when I did, /riak-standby/bin/riak-admin ring_status showed the
> claimant as riak@<ip1>, not stby@<ip1> as I expected.
>
> Instead of doing reip, I did a binary-safe replacement of riak@ with
> stby@ (this works only because "riak" and "stby" are the same length,
> so the sizes encoded in the binary ring file stay valid):
>
> perl -pi -e 's/riak@/stby@/g'
> /riak-standby/data/ring/riak_core_ring.default.*
>
> When the nodes start up, the claimant looks correct and all the nodes join
> together just fine.
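>
> A quick sanity check before starting the nodes, to make sure no riak@
> names survived the replacement (the ring file is an Erlang binary term,
> but strings(1) is enough to spot leftover node names):
>
> ----
> strings /riak-standby/data/ring/riak_core_ring.default.* | grep 'riak@' \
>     || echo 'no riak@ entries remain'
> ----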
>
> But I still have the problem where the data directory fills up even
> though nothing is being actively written to the standby cluster. I left
> it alone for 5 or 6 hours and it eventually filled up an entire TB of
> storage.
>
> I noticed that `riak-admin transfers` starts off showing one partition
> waiting to hand off:
>
>  /riak-standby/bin/riak-admin transfers
> 'stby@<ip1>' waiting to handoff 1 partitions
>
> This usually clears up after a minute or so. Not sure if it is related.
>
>
> No clues in the console log. The entries all look something like:
>
> 2011-12-09 19:10:33.371 [info] <0.7.0> Application bitcask started on node 'stby@192.168.3.94'
> 2011-12-09 19:10:33.388 [info] <0.7.0> Application riak_kv started on node 'stby@192.168.3.94'
> 2011-12-09 19:10:33.388 [info] <0.7.0> Application skerl started on node 'stby@192.168.3.94'
> 2011-12-09 19:10:33.391 [info] <0.7.0> Application luwak started on node 'stby@192.168.3.94'
> 2011-12-09 19:10:33.402 [info] <0.7.0> Application merge_index started on node 'stby@192.168.3.94'
> 2011-12-09 19:10:33.405 [info] <0.7.0> Application riak_search started on node 'stby@192.168.3.94'
> 2011-12-09 19:10:33.405 [info] <0.7.0> Application basho_stats started on node 'stby@192.168.3.94'
> 2011-12-09 19:10:33.419 [info] <0.7.0> Application runtime_tools started on node 'stby@192.168.3.94'
> 2011-12-09 19:10:33.419 [info] <0.7.0> Application public_key started on node 'stby@192.168.3.94'
> 2011-12-09 19:10:33.447 [info] <0.7.0> Application ssl started on node 'stby@192.168.3.94'
>
>
>
> If I turn off the node before taking the snapshot, everything works fine.
>
> /etc/init.d/riak stop
>  .... do backup here
> /etc/init.d/riak start
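>
> If stopping the node turns out to be unavoidable, the downtime can at
> least be kept to the moment the LVM snapshot is created, with the slow
> rsync running against the mounted snapshot afterwards. A sketch, using
> the same placeholder volume names as above:
>
> ----
> /etc/init.d/riak stop
> # creating the snapshot takes only a moment, so the node is down briefly
> lvcreate --snapshot --size 5G --name riak-snap /dev/vg0/riak
> /etc/init.d/riak start
>
> # the slow copy now reads from the frozen snapshot, not the live node
> mount -o ro /dev/vg0/riak-snap /mnt/riak-snap
> rsnapshot hourly
> umount /mnt/riak-snap
> lvremove -f /dev/vg0/riak-snap
> ----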
>
>
> But the standby data directory starts filling up at the rate of about 500
> MB a second on some of the nodes if I do a copy without first stopping
> riak. I know this is not a supported approach, but I was curious if someone
> might be able to shed some light on what might be happening.
>
>
> Ideas?
>
>
> Thanks for any insight.
>