Backing up Riak

Swinney, Austin Austin at vimeo.com
Wed Apr 25 14:21:32 EDT 2012


I'd like to preface my comments by saying we love your product!  We've really got high hopes for Riak at Vimeo!

Also, my interest in backups is mainly about data corruption or loss caused by developers doing something bad. Losing a node is one scenario; having to restore all data from backup is another.



On Apr 25, 2012, at 1:31 PM, Mark Phillips wrote:

2) Is it a consistent backup, or is consistency old-fashioned thinking?

The backup will be complete up to the point at which it was taken. You'll get a dump of all the keys in the order in which they were listed. Updates that happened during the backup may or may not be captured. (I would have to verify exactly how you would know which updates made it and which didn't.)


It would be nice to know for the record which is the case, so a follow-up would be appreciated.

With riak-admin backup, would it be possible to add a --stdout option so that the output could be piped to gzip/bzip2? The resulting backup file from multiple nodes could easily be larger than any one node's storage. We have 1.8GB on five nodes that ended up being 15GB in the dump file, and it compressed down very well. We are only using a small dataset project now, but there is talk of moving 2TB of compressed MySQL data over.

[root@ip-10-0-0-231 mnt]# time riak-admin backup riak@10.0.0.231 riak /mnt/backup/test.dump all
Attempting to restart script through sudo -u riak
Backing up (all nodes) to '/mnt/backup/test.dump'.
...from ['riak@10.0.0.231','riak@10.0.0.232','riak@10.0.0.233',
         'riak@10.0.0.234','riak@10.0.0.235']
Backup of 'riak@10.0.0.231' complete
Backup of 'riak@10.0.0.232' complete
Backup of 'riak@10.0.0.233' complete
Backup of 'riak@10.0.0.234' complete
Backup of 'riak@10.0.0.235' complete
syncing and closing log

real    19m40.045s
user    6m19.976s
sys     4m12.192s

-rw-r--r-- 1 riak riak  15G Apr 24 15:56 test.dump
-rw-r--r-- 1 root root 1.9G Apr 24 16:13 test.dump.tgz
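Until something like --stdout exists, one possible workaround is to back up one node at a time and compress each dump before starting the next, so that at most one uncompressed per-node dump is on disk at once. The transcript suggests `all` walks the nodes one after another anyway. This is a sketch under assumptions: it relies on the `[node|all]` argument of `riak-admin backup` meaning "only this node's data" when given `node`, and the function name `backup_nodes_compressed` and the `RIAK_ADMIN` override are hypothetical.

```shell
# Assumption: `riak-admin backup <node> <cookie> <file> node` dumps only the
# data held on <node>; node names, cookie and directory are placeholders.
RIAK_ADMIN=${RIAK_ADMIN:-riak-admin}   # overridable for testing or custom paths

# Back up each node into its own file and gzip it before moving on, so at
# most one uncompressed per-node dump sits on disk at any time.
backup_nodes_compressed() {
    cookie=$1
    dest_dir=$2
    shift 2                            # remaining arguments: Erlang node names
    for node in "$@"; do
        dump="$dest_dir/$node.dump"
        "$RIAK_ADMIN" backup "$node" "$cookie" "$dump" node || return 1
        # gzip replaces $dump with $dump.gz once compression succeeds.
        gzip -f "$dump" || return 1
    done
}

# Example, using the node names and cookie from the transcript above:
# backup_nodes_compressed riak /mnt/backup \
#     riak@10.0.0.231 riak@10.0.0.232 riak@10.0.0.233 \
#     riak@10.0.0.234 riak@10.0.0.235
```

The dumps would need to be decompressed again before being handed to `riak-admin restore`, since that command reads a plain file path.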




3) If you back up each node by tarring up its LevelDB and ring files, then lose a node and restore it from a tarball that is hours or days old, how does Riak bring that node's data up to date?



After you restore the node, it would gradually sync its replicas with those on the other nodes via read repair. That said, a complete restore of the node would probably not be needed. When the node disappears, Riak compensates by sending its writes and reads to fallback nodes. When it comes back online, hinted handoff and read repair make sure it gets all the replicas it was supposed to have and that those replicas are up to date. (You will have to force read repair on the replicas on that node, which can be done via a list keys operation or using an existing snippet of code [1], but be warned that it will put some load on that node. We're working on making the read repair process less reactive in future releases, but this is the best way to do it right now.) To be clear, I'm in no way advocating not backing up your data. You just might not need your backups in this situation.
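A rough sketch of forcing read repair by hand: list a bucket's keys, then GET every key so that Riak compares the replicas and repairs stale or missing copies. Assumptions: the HTTP interface listens on 127.0.0.1:8098, the old-style `/riak/<bucket>?keys=true` listing is available, `jq` is installed, and keys are already URL-safe. The function names and the `CURL`/`RIAK_URL` overrides are hypothetical. As noted above, listing keys is expensive and puts load on the cluster.

```shell
RIAK_URL=${RIAK_URL:-http://127.0.0.1:8098}   # assumed HTTP listener
CURL=${CURL:-curl}                            # overridable for testing

# Print every key in bucket $1, one per line (walks all keys: expensive).
list_keys() {
    "$CURL" -s "$RIAK_URL/riak/$1?keys=true&props=false" | jq -r '.keys[]'
}

# Read each key from stdin out of bucket $1 and discard the body; the GET
# itself is what gives Riak the chance to read-repair divergent replicas.
read_all_keys() {
    bucket=$1
    count=0
    while IFS= read -r key; do
        # </dev/null keeps curl from touching the key list on stdin.
        "$CURL" -s -o /dev/null "$RIAK_URL/riak/$bucket/$key" </dev/null || return 1
        count=$((count + 1))
    done
    echo "read $count keys from $bucket"
}

# Usage (hypothetical bucket name):
# list_keys users | read_all_keys users
```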


Understood. It is also worth noting that `riak-admin backup [all]` made full use of the meager CPU resources of my EC2 m1.large; I loved the way it scaled across all 2 cores. ( http://cl.ly/2Q1i2l0n2X2o2W1J0X27 ;-) I didn't expect that, coming from a MySQL background, where I'm so used to single-threaded processes.



Another thing worth noting: the 'riak-admin backup' command is not known to be the speediest. If you have any non-trivial amount of data to back up, you're probably better off taking a filesystem snapshot of LevelDB on each node. Unfortunately, live snapshots of LevelDB are less than bulletproof at the moment, so you're advised to stop the node, snapshot LevelDB, and restart it. You'll have to take each node offline for this, but with five Riak nodes your cluster should Just Keep Cranking™.
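The stop, snapshot, restart procedure for a single node might look like the sketch below, run on each node in turn. Assumptions: the node's LevelDB and ring directories live under one data root such as /var/lib/riak (a common package default, not confirmed here), a tarball is an acceptable "snapshot", and `cold_backup_node` and the `RIAK` override are hypothetical names.

```shell
RIAK=${RIAK:-riak}   # overridable for testing or custom install paths

# Cold backup of one node: stop it, tar its LevelDB and ring directories,
# start it again and wait until it answers `riak ping`.
cold_backup_node() {
    data_root=$1     # assumed to contain leveldb/ and ring/
    out=$2           # tarball to write
    "$RIAK" stop || return 1
    # Archive relative to data_root so a restore is `tar -xzf` in that dir.
    tar -czf "$out" -C "$data_root" leveldb ring
    status=$?
    # Restart even if tar failed, so the node is not left offline.
    "$RIAK" start || return 1
    tries=0
    until "$RIAK" ping >/dev/null 2>&1; do
        tries=$((tries + 1))
        [ "$tries" -ge 60 ] && return 1   # give up after roughly a minute
        sleep 1
    done
    return "$status"
}

# Usage (hypothetical paths):
# cold_backup_node /var/lib/riak /mnt/backup/$(hostname)-$(date +%F).tgz
```

Waiting for `riak ping` before moving to the next node keeps only one node down at a time, which is what lets the rest of the cluster keep serving.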



Is snapshotting known to be bulletproof with storage engines other than LevelDB? I was thinking LVM snapshots would be a decent solution once the data grows larger.
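This sketch does not answer whether a live LVM snapshot is safe. It takes the conservative route of stopping the node only while the snapshot is created, then copying from the read-only snapshot after the node is back up, which should keep downtime short. Assumptions: the Riak data directory sits on its own logical volume (for example /dev/vg0/riak), 5G of copy-on-write space is enough for writes made while the snapshot exists, and `lvm_snapshot_backup` and the command overrides are hypothetical names.

```shell
RIAK=${RIAK:-riak}
LVCREATE=${LVCREATE:-lvcreate}
LVREMOVE=${LVREMOVE:-lvremove}
MOUNT=${MOUNT:-mount}
UMOUNT=${UMOUNT:-umount}

# Stop the node just long enough to create an LVM snapshot, restart it,
# then archive the snapshot contents and drop the snapshot.
lvm_snapshot_backup() {
    lv=$1          # origin volume, e.g. /dev/vg0/riak (assumed)
    snap_name=$2   # name for the temporary snapshot volume
    mnt=$3         # where to mount the snapshot read-only
    out=$4         # tarball to write
    vg_dir=$(dirname "$lv")
    "$RIAK" stop || return 1
    "$LVCREATE" --snapshot --size 5G --name "$snap_name" "$lv"
    status=$?
    "$RIAK" start || return 1
    [ "$status" -eq 0 ] || return "$status"
    mkdir -p "$mnt"
    "$MOUNT" -o ro "$vg_dir/$snap_name" "$mnt" || return 1
    tar -czf "$out" -C "$mnt" .
    status=$?
    "$UMOUNT" "$mnt"
    "$LVREMOVE" -f "$vg_dir/$snap_name"
    return "$status"
}

# Usage (hypothetical names):
# lvm_snapshot_backup /dev/vg0/riak riak-snap /mnt/riak-snap /mnt/backup/node.tgz
```

If the volume is XFS, mounting the snapshot alongside its origin may additionally need the `nouuid` mount option.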

That does help, thanks!

Austin

Hope that helps.

Mark

[1] Fair warning: I'm not sure when this was last tested - http://contrib.basho.com/bucket_inspector.html

Thanks Riakers!

Austin

_______________________________________________
riak-users mailing list
riak-users at lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

