Resolving list conflicts with possible deletions

Paul Jones pauljones23 at gmail.com
Tue Oct 20 04:00:10 EDT 2009


Hi Bryan,

Thank you for the very detailed reply - definitely a lot of things to think
about in there.

I think in my applications instance, I'd like to avoid reverts being undone.
However, your most recent set of operations suggestion is certainly a nice
simplification on top of the various complicated scenarios I was dreaming
up, and may just do the trick.

If I cook up something sufficiently general, would there be any scope to
consider including libraries for things like this in Riak? I'd be more than
happy to contribute an initial library for doing some of these algorithms,
in  the hope that it might inspire others to add to it so there is a nice
toolbox of ways to manage conflicts.

Thanks,
Paul.

On Mon, Oct 19, 2009 at 7:34 PM, Bryan Fink <bryan at basho.com> wrote:

> On Sat, Oct 17, 2009 at 11:54 AM, Paul Jones <pauljones23 at gmail.com>
> wrote:
>
> Hi, Paul.  I've written a few different revision merge algorithms on
> Riak, and you're right: list-delete is one of the trickiest.
>
> The option that's the most outlandish, and difficult to consider is,
> can you just live with a delete being reverted?  This option makes a
> little more sense when you're thinking about a list of links to other
> objects.  Since it's possible that fetching those other links may fail
> anyway, it may be the case that you already have code in place to deal
> with broken links in a list.  Logging the inconsistency and continuing
> may be fine.
>
> If deletes need to stay deleted, then you get to play the game of
> estimating how likely it is that a delete would ever be accidentally
> reverted.
>
> For data that isn't likely to see lots of conflicting access, I've had
> success with a simple "most recent set of operations" meta-list.
> Simply a second list like, [{add, X}, {add, Y}, {del, Z}], meaning,
> "The last modification added X and Y while removing Z from this list."
>  If there are ever siblings, it's possible to "replay" the
> modifications to resolve any conflicts.
>
> For data that is somewhat more likely to exhibit conflicting access,
> but is unlikely to endure long-term, deeply-overlapping conflicts,
> I've used a tombstone-with-counter system.  Put simply, you tag
> deleted items with a tombstone plus a counter.  On every subsequent
> update, increment each of the tombstone counters.  When a tombstone
> counter reaches some threshold, prune the item from the list.  This
> system gives some guarantee that a partial log will still be around
> if, for example, there happen to be many modifications between the
> time one client retrieves an object and the time that same client
> writes its own modification.
>
> Finally, when real, long-term, unpredictable conflict has been a
> necessary problem to consider, I've looked to distributed version
> control systems for ideas.  Reverse patches, commit lattices, etc.
> It's possible to store it all for later reference.  In fact, I have
> code for such a system hiding in a fork of a very old Riak version.  I
> never released it because I didn't get a chance to put the amount of
> testing into it I wanted.  At some point I'll probably dig that out
> again, but for now I'll give you the assurance that I've seen it work.
>  ;)
>
> Something you may notice about these three options is that none of
> them fully guarantees that every conflict will be correctly-decidable.
>  Without specific shapes of lists, including domain knowledge about
> their data, I'm pretty sure it's not possible to create a 100%
> solution.  So, one of the things that I've been explicit about in each
> implementation I've done is having the code yell loudly when there's
> an undecidable conflict.  If there's any choice that would be
> completely arbitrary, yell and refuse to do it, or at least log
> somewhere what options there were and what choice was made.  That way
> corrections can be made by hand without confusion over, "How did X
> ever get there in the first place?"
>
> And, of course, likely amount of conflict isn't the only decider among
> these solutions.  Size of list, uniqueness of list entries, etc. all
> play a big factor in choosing an appropriate solution here.
>
> I've been on the hook to a few people for a longer explanation of this
> for a while now.  Perhaps sometime in the next few of weeks I'll
> document a few implementation details here and post them.
>
> -Bryan
>
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20091020/042c946e/attachment-0002.html>


More information about the riak-users mailing list