Riak Search and Yokozuna Backup Strategy

Sargun Dhillon sargun at sargun.me
Thu Jan 23 16:09:39 EST 2014


Not to fork the thread too far from the topic being discussed, but is
there any possibility of opening up the API used for multidatacenter
replication? Specifically, the fullsync API? I imagine the code inside
riak_repl can also be used for an external node to connect and get a
full dump of a node's content using fullsync. Incremental backups
could potentially be taken by using the AAE strategy, by the backup
sink building a merkle tree of the data it has, and using that to
generate the keylist of deltas.
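
To make the incremental idea concrete, here is a rough sketch in Python of
how a backup sink could compare hash trees to get the keylist of deltas.
This is only an illustration under my own assumptions: it is not the
riak_repl or riak_kv AAE implementation, the two-level tree, SEGMENTS,
build_tree and delta_keys are all hypothetical names, and both trees are
held in memory rather than exchanged over a fullsync connection.

```python
import hashlib

SEGMENTS = 16  # hypothetical segment count, chosen only for illustration


def _h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def segment_of(key: str) -> int:
    # Assign each key to a fixed segment so both sides bucket keys identically.
    return int(_h(key.encode()), 16) % SEGMENTS


def build_tree(store: dict) -> dict:
    """Build a two-level hash tree over a mapping of key (str) -> value (bytes)."""
    leaves = [dict() for _ in range(SEGMENTS)]
    for key, value in store.items():
        leaves[segment_of(key)][key] = _h(value)
    segments = [
        _h("".join(f"{k}:{v};" for k, v in sorted(leaf.items())).encode())
        for leaf in leaves
    ]
    root = _h("".join(segments).encode())
    return {"root": root, "segments": segments, "leaves": leaves}


def delta_keys(source: dict, sink: dict) -> list:
    """Return the sorted keys whose hashes differ between the two trees."""
    if source["root"] == sink["root"]:
        return []  # identical roots: nothing to back up
    changed = []
    for i in range(SEGMENTS):
        if source["segments"][i] == sink["segments"][i]:
            continue  # whole segment matches, skip its keys
        src_leaf, dst_leaf = source["leaves"][i], sink["leaves"][i]
        for key in src_leaf.keys() | dst_leaf.keys():
            if src_leaf.get(key) != dst_leaf.get(key):
                changed.append(key)
    return sorted(changed)
```

The sink would call build_tree on the data it already holds, compare that
against the tree for the node, and fetch only the keys delta_keys returns.
Keys present only on the sink show up in the list as well, so deletes would
need separate handling.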

Unfortunately, riak_repl is not open source, so this is something
Basho would have to build.

On Thu, Jan 23, 2014 at 12:57 PM, Dave Martorana <dave at flyclops.com> wrote:
> I like that HyperDex provides direct backup support instead of simply
> suggesting a stop-filecopy-start-catchup scenario. Are there any plans at
> Basho to make backups a core function of Riak (or as a separate but included
> utility) - it would certainly be nice to have something Basho provides to help
> ensure things are done properly each time, all the time.
>
> Cheers,
>
> Dave
>
>
> On Thu, Jan 23, 2014 at 1:42 PM, Joe Caswell <jcaswell at basho.com> wrote:
>>
>> Apologies, clicked send in the middle of an incomplete thought.  It should
>> have read:
>>
>> Backing up the LevelDB data files while the node is stopped would remove
>> the necessity of using the LevelDB repair process upon restoring to make the
>> vnode self-consistent.
>>
>> From: Joe Caswell <jcaswell at basho.com>
>> Date: Thursday, January 23, 2014 1:25 PM
>> To: Sean McKibben <graphex at graphex.com>, Elias Levy
>> <fearsome.lucidity at gmail.com>
>>
>> Cc: "riak-users at lists.basho.com" <riak-users at lists.basho.com>
>> Subject: Re: Riak Search and Yokozuna Backup Strategy
>>
>> Backing up LevelDB data files can be accomplished while the node is
>> running if the sst_x directories are backed up in numerical order.  The
>> undesirable side effects of that could be duplicated data, inconsistent
>> manifest, or incomplete writes, which necessitates running the leveldb
>> repair process upon restoration for any vnode backed up while the node was
>> running.  Since the data is initially written to the recovery log before
>> being appended to level 0, and any compaction operation fully writes the
>> data to its new location before removing it from its old location, if any of
>> these operations are interrupted, the data can be completely recovered by
>> leveldb repair.
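
As a sketch of the "numerical order" part described above, something like
the following Python could enumerate and copy the sst_x directories of one
vnode. The directory layout beyond the sst_x names, the function names, and
the paths are my assumptions, not anything Basho ships. It deliberately
copies only the sst_x directories, since the text above does not say when
the manifest or recovery log should be copied, and it does not handle files
that compaction removes mid-copy.

```python
import re
import shutil
from pathlib import Path

SST_DIR = re.compile(r"^sst_(\d+)$")


def sst_dirs_in_order(vnode_dir):
    """Return the sst_N subdirectories of vnode_dir sorted by N (numeric, not lexical)."""
    found = []
    for entry in Path(vnode_dir).iterdir():
        match = SST_DIR.match(entry.name)
        if entry.is_dir() and match:
            found.append((int(match.group(1)), entry))
    return [path for _, path in sorted(found, key=lambda pair: pair[0])]


def copy_sst_dirs(vnode_dir, backup_dir):
    """Copy each sst_N directory in ascending order; return the names copied."""
    target = Path(backup_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sst_dirs_in_order(vnode_dir):
        # On a live node a file may vanish during the copy; a real tool
        # would have to catch shutil.Error and decide how to proceed.
        shutil.copytree(src, target / src.name)
        copied.append(src.name)
    return copied
```

Sorting on the parsed integer matters because a plain string sort would put
sst_10 before sst_2. Any vnode copied this way while the node was running
would still need the leveldb repair step on restore.
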
>>
>> The only incomplete write that won't be recovered by the LevelDB repair
>> process is the initial write to the recovery log, limiting exposure to the
>> key being actively written at the time of the snapshot/backup.  As long as 2
>> vnodes in the same preflist are not backed up while simultaneously writing
>> the same key to the recovery log (i.e. rolling backups are good), this key
>> will be recovered by AAE/read repair after restoration.
>>
>> Backing up the LevelDB data files while the node is stopped would remove
>> the necessity of repairing the
>>
>> Backing up Riak Search data, on the other hand, is a dicey proposition.
>> There are 3 bits to riak search data: the document you store, the output of
>> the extractor, and the merge index.
>>
>> When you put a document in <<"key">> in a <<"bucket">> with search
>> enabled, Riak uses the pre-defined extractor to parse the document into
>> terms, possibly flattening the structure, and stores the result in
>> <<"_rsid_bucket">>/<<"key">>, which is used during update operations to
>> remove stale entries before adding new ones, and would most likely be stored
>> in a different vnode, possibly on a different node entirely.  The document
>> id/link is inserted into the merge index entry for each term identified by
>> the extractor, any or all of which may reside on different nodes.  Since the
>> document, its index document, and the term indexes could not be guaranteed
>> to be captured in any single backup operation, it is a very real possibility
>> that these would be out of sync in the event that a restore is required.
>>
>> If restore is only required for a single node, consistency could be
>> restored by running a repair operation for each riak_kv vnode and
>> riak_search vnode stored on the node, which would repair the data from other
>> nodes in the cluster.  If more than one node is restored, it is quite likely
>> that they both stored replicas of the same data, for some subset of the full
>> data set.  The only way to ensure consistency is fully restored in the
>> latter case is to reindex the data set.  This can be accomplished by reading
>> and rewriting all of the data, or by reindexing via MapReduce as suggested
>> in this earlier mailing list post:
>> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-October/009861.html
>>
>> In either restore case, having a backup of the merge_index data files is
>> not helpful, so there does not appear to be any point in backing them up.
>>
>> Joe Caswell
>> From: Sean McKibben <graphex at graphex.com>
>> Date: Tuesday, January 21, 2014 1:04 PM
>> To: Elias Levy <fearsome.lucidity at gmail.com>
>> Cc: "riak-users at lists.basho.com" <riak-users at lists.basho.com>
>> Subject: Re: Riak Search and Yokozuna Backup Strategy
>>
>> +1 LevelDB backup information is important to us
>>
>>
>> On Jan 20, 2014, at 4:38 PM, Elias Levy <fearsome.lucidity at gmail.com>
>> wrote:
>>
>> Anyone from Basho care to comment?
>>
>>
>> On Thu, Jan 16, 2014 at 10:19 AM, Elias Levy <fearsome.lucidity at gmail.com>
>> wrote:
>>>
>>>
>>> Also, while LevelDB appears to be largely an append only format, the
>>> documentation currently does not recommend live backups, presumably because
>>> there are some issues that can crop up when restoring a DB that was not
>>> shut down cleanly.
>>>
>>> I am guessing those issues are the ones documented as edge cases here:
>>> https://github.com/basho/leveldb/wiki/repair-notes
>>>
>>> That said, it looks like as of 1.4 those are largely cleared up, at least
>>> from what I gather from that page, and that one must only ensure that data
>>> is copied in a certain order and that you run the LevelDB repair algorithm
>>> when restoring the files.
>>>
>>> So is the backup documentation on LevelDB still correct?  Will Basho
>>> enable hot backups on LevelDB backends any time soon?
>>>
>>>
>>>
>>> On Thu, Jan 16, 2014 at 10:05 AM, Elias Levy
>>> <fearsome.lucidity at gmail.com> wrote:
>>>>
>>>> How well does Riak Search play with backups?  Can you back up the Riak
>>>> Search data without bringing the node down?
>>>>
>>>> The Riak documentation backup page is completely silent on Riak Search
>>>> and its merge_index backend.
>>>>
>>>> And looking forward, what is the backup strategy for Yokozuna?  Will it
>>>> make use of Solr's Replication Handler, or something lower level?  Will
>>>> the node need to be offline to back it up?
>>>>
>>>
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>
>



