Riak Search and Yokozuna Backup Strategy

Ryan Zezeski rzezeski at basho.com
Mon Jan 27 15:29:06 EST 2014


Hi Elias,


On Mon, Jan 27, 2014 at 2:40 PM, Elias Levy <fearsome.lucidity at gmail.com> wrote:

>
>
> Any comments on the backup strategy for Yokozuna?  Will it make use of
> Solr's Replication Handler, or something lower level?  Will the node
> need to be offline to back it up?
>

There is no use of any Solr replication code--at all. Yokozuna (new Riak
Search, yes I know the naming is confusing) can be thought of as secondary
data to KV. It is a collection of index postings based on the canonical and
authoritative KV data. Therefore, the postings can always be rebuilt from
the KV data. AAE (active anti-entropy) provides an automatic integrity
check between each KV object and its postings, and it runs continuously in
the background.
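
To make the "postings can always be rebuilt" point concrete, here is a toy
sketch in Python. It is not how AAE is implemented; it only shows the idea
of comparing each KV object against the index entry derived from it and
rebuilding whatever is missing or stale. Every name in it (digest,
index_entry, repair_index, the dict-based "index") is made up for
illustration.

```python
import hashlib


def digest(value: bytes) -> str:
    """Hash of a KV object's value, used to detect divergence."""
    return hashlib.sha256(value).hexdigest()


def index_entry(value: bytes) -> dict:
    """Hypothetical posting: the terms plus the hash of the source object."""
    terms = sorted(set(value.decode().split()))
    return {"terms": terms, "source_hash": digest(value)}


def repair_index(kv: dict, index: dict) -> list:
    """Make the index agree with the KV data; return the keys repaired."""
    repaired = []
    for key, value in kv.items():
        entry = index.get(key)
        # Missing or stale posting: rebuild it from the authoritative KV data.
        if entry is None or entry["source_hash"] != digest(value):
            index[key] = index_entry(value)
            repaired.append(key)
    for key in list(index):
        # Posting with no KV object behind it: drop it.
        if key not in kv:
            del index[key]
            repaired.append(key)
    return sorted(repaired)


kv = {"k1": b"riak search", "k2": b"solr lucene"}
index = {}                              # as if the index dir had been wiped
first_pass = repair_index(kv, index)    # rebuilds every entry
second_pass = repair_index(kv, index)   # nothing left to repair
```

Starting from an empty index, the first pass rebuilds every entry and the
second finds nothing to do.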

Given that, there are two ways I see backup/restore working.

1. From a local, file-level perspective. You take a snapshot of your node's
local filesystem and use it as a save point in case of future corruption.
In this case you don't worry about cluster-wide consistency; it's just a
local backup. If you ever have to restore this data, AAE and read-repair
can deal with any divergence the restore causes. You could, however, end up
with resurrected data, depending on your delete policy and the age of the
backup. Another issue is that various parts of Riak that write to disk may
not be snapshot safe. It has already been discussed that leveldb isn't, and
I'm willing to bet Lucene isn't either. Whenever a logical operation
requires multiple filesystem writes, you have to worry about the snapshot
occurring in the middle of that operation. I have no idea how Lucene would
deal with a snapshot taken at the wrong time. I'm unsure how good it is at
detecting and, more importantly, recovering from corruption. This is one
reason why AAE is so important. I do demos at my talks where I literally
rm -rf the entire index dir and AAE rebuilds it from scratch. This will not
necessarily be a fast operation in a real production database, but it's
good to know that the indexes can always be rebuilt from the KV data. If
you can cover the KV data then you can always rebuild the indexes.

2. Backup/restore as a logical operation in Riak itself. We currently have
a backup/restore command, but from what I hear it has various issues and
needs to be fixed or replaced. But assuming there were a backup command
that worked, I suppose you could try playing games with Yokozuna. Perhaps
Yokozuna could freeze an index from merging segments and back up the
important files. Perhaps there are replication hooks built into Solr/Lucene
that could be used. I'm not sure. I'm handwaving on purpose because I'm
sure there are multiple avenues to explore. However, another option is to
punt. As I said above, the indexes can be rebuilt from the KV data. So if
you have a backup that only works for KV, the restore operation would
simply re-index the data as it is written. Yokozuna currently uses a
low-level hook inside the KV vnode that notices any time KV data is
written, so it should "just work", assuming the restore goes through the KV
code path and doesn't build files directly.
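
A rough Python model of that last point, for illustration only. ToyNode,
restore_via_put and restore_direct are invented names, and the real hook
lives inside the KV vnode, not in anything like this. The sketch just
contrasts a restore that replays writes through the normal put path (so the
indexing hook fires) with one that writes the stored data directly (so it
doesn't).

```python
class ToyNode:
    """Toy stand-in for a node: KV data plus an index derived from it."""

    def __init__(self):
        self.kv = {}
        self.index = {}  # term -> set of keys containing that term

    def _index_hook(self, key, value):
        # Drop old postings for this key, then add postings for the new value.
        for keys in self.index.values():
            keys.discard(key)
        for term in value.split():
            self.index.setdefault(term, set()).add(key)

    def put(self, key, value):
        # Normal write path: store the object, then let the hook index it.
        self.kv[key] = value
        self._index_hook(key, value)

    def search(self, term):
        return sorted(self.index.get(term, set()))


def restore_via_put(node, backup):
    """Restore by replaying every object through the write path."""
    for key, value in backup.items():
        node.put(key, value)


def restore_direct(node, backup):
    """Restore by writing storage directly, bypassing the hook."""
    node.kv.update(backup)


backup = {"a": "riak search", "b": "riak kv"}

good = ToyNode()
restore_via_put(good, backup)

bad = ToyNode()
restore_direct(bad, backup)
```

After this, good.search("riak") finds both keys, while bad has the KV data
but an empty index until something like AAE rebuilds it.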

-Z