Reconciling Concurrent Writes
justin at basho.com
Tue Feb 23 19:21:54 EST 2010
Hi, J T.
On Tue, Feb 23, 2010 at 7:04 PM, J T <jt4websites at googlemail.com> wrote:
> My question is how you might go about reconciling.
> It could be that both versions have valid changes and the version C needs to
> work with would in fact have both terms removed from the list.
There are a few ways to deal with this kind of thing, and the best
choice will depend on the application in question. That is, the
semantics of that list will determine whether the right choice is
really to delete both items, keep both items, delete one item, or ask
a user to choose.
There are a number of "quick and dirty" things that work fine for many
applications, especially in light of the assumption that a
well-written application will produce very few conflicts even under
heavy concurrent client load. For instance, some applications treat
accidental deletions as very bad but can tolerate occasionally keeping
something that was supposed to be deleted. Such an application would
just perform set union on the lists for resolution. Other applications
just keep whichever conflicting update was received last and discard
the others. These approaches can throw away data, but they are simple
and fairly close to what most programmers do when using an RDBMS in
the typical way (i.e. without putting everything in a transaction).
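
As a rough illustration of those two quick approaches, here is a
minimal Python sketch. It is not Riak client code: the function names,
the representation of a sibling as a plain list of terms, and the
receive timestamps are all assumptions made up for the example.

```python
from typing import Iterable, List, Tuple


def resolve_by_union(siblings: Iterable[List[str]]) -> List[str]:
    """Keep every term that appears in any sibling.

    Never drops a term by accident, but a term that one client
    deleted comes back if any other sibling still contains it.
    """
    merged = set()
    for terms in siblings:
        merged.update(terms)
    return sorted(merged)


def resolve_by_last_received(
    siblings: Iterable[Tuple[float, List[str]]]
) -> List[str]:
    """Keep only the sibling with the latest receive time.

    Each sibling is a (received_at, terms) pair; the timestamp is a
    hypothetical value recorded by the application.
    """
    _, terms = max(siblings, key=lambda sibling: sibling[0])
    return list(terms)


# Starting list was a, b, c, d. Client A deleted "b", client B deleted "c".
from_a = ["a", "c", "d"]
from_b = ["a", "b", "d"]

print(resolve_by_union([from_a, from_b]))
print(resolve_by_last_received([(1.0, from_a), (2.0, from_b)]))
```

With these inputs the union brings back both deleted terms, and
last-received keeps only B's deletion. Neither gives the list with
both terms removed, which is the data loss mentioned above.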
For the best possible resolution story in cases like the one you
describe, you would want to store not just the data structure in
question but also the operations on that data. You would also want to
make sure that the way you implement the operations (e.g. "delete
item", "add item") is both commutative (can be safely re-ordered) and
idempotent (can be safely replayed). If you do this, then you can
merge conflicting data simply by replaying all of the operations in
both siblings. The only remaining difficulty is deciding what to do
with conflicting operations -- if A had added an item and B had
deleted the same item, for instance. This detail must be decided by
you, the application author, as there is no generic right answer. But
for something like what you described, this wouldn't come into play as
both "delete" operations would be processed, and C would end up with a
correct single list of terms.
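
To make the operation-replay idea concrete, here is a minimal Python
sketch under some assumptions of mine: each sibling stores its
operations as a set of ("add" | "delete", term) pairs, and an
add/delete conflict on the same term is settled by letting the delete
win. That policy is just one possible application choice, and it means
a deleted term cannot be re-added later. The names Op, merge and replay
are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Op = Tuple[str, str]  # ("add" or "delete", term)


def merge(ops_a: FrozenSet[Op], ops_b: FrozenSet[Op]) -> FrozenSet[Op]:
    """Combine the operations of two siblings.

    Set union is commutative (order of siblings does not matter) and
    idempotent (merging the same operations twice changes nothing).
    """
    return ops_a | ops_b


def replay(ops: FrozenSet[Op]) -> List[str]:
    """Rebuild the list of terms from the stored operations.

    Assumed conflict policy: a delete always beats an add of the
    same term, regardless of the order the operations arrived in.
    """
    added = {term for kind, term in ops if kind == "add"}
    deleted = {term for kind, term in ops if kind == "delete"}
    return sorted(added - deleted)


# Common history: four terms were added.
base = frozenset({("add", "a"), ("add", "b"), ("add", "c"), ("add", "d")})

# Client A deletes "b"; client B concurrently deletes "c".
sibling_a = base | {("delete", "b")}
sibling_b = base | {("delete", "c")}

print(replay(merge(sibling_a, sibling_b)))  # both deletions are kept
```

Because the merge is a plain set union, it does not matter which
sibling is seen first or whether the same sibling is merged more than
once, which is the point of making the operations commutative and
idempotent.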
> I also spotted the 'read your writes' comment in the riak paper but it
> didn't make any sense to me.
Read-your-writes consistency is just about ensuring that a single
client has a locally consistent timeline -- that anything it reads
reflects all of its past writes. That notion is important, but
doesn't help with multi-client conflict situations.
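
A toy Python sketch of that idea, under heavy assumptions: replicas are
plain dicts, versions are simple integers rather than vector clocks,
and the Session class is a hypothetical name, not part of any Riak
client. The session remembers the highest version it wrote per key and
refuses to accept a read from a replica that is older than that.

```python
from typing import Dict, List, Optional, Tuple

Replica = Dict[str, Tuple[int, str]]  # key -> (version, value)


class Session:
    """One client's view; tracks what this client itself has written."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas
        self.last_written: Dict[str, int] = {}

    def write(self, key: str, value: str, replica_index: int) -> None:
        # Write to a single replica; the others stay stale in this toy.
        replica = self.replicas[replica_index]
        current = replica.get(key, (0, ""))[0]
        version = max(current, self.last_written.get(key, 0)) + 1
        replica[key] = (version, value)
        self.last_written[key] = version

    def read(self, key: str) -> Optional[str]:
        # Skip replicas that have not yet seen this session's own writes.
        needed = self.last_written.get(key, 0)
        for replica in self.replicas:
            if key in replica:
                version, value = replica[key]
                if version >= needed:
                    return value
        return None


replicas: List[Replica] = [{"k": (1, "old")}, {"k": (1, "old")}]
writer = Session(replicas)
other = Session(replicas)

writer.write("k", "new", replica_index=1)
print(writer.read("k"))  # the writer sees its own write
print(other.read("k"))   # another client may still see the stale value
```

The second client has no such guarantee about the first client's
write, which is why this property says nothing about conflicts between
clients.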
> I'd appreciate some pointers on best practices for dealing with this.
I hope this gets you started down a useful direction.