Reconciling Concurrent Writes

Jason Tanner jt4websites at googlemail.com
Tue Feb 23 21:38:18 EST 2010


Hi Justin,

Thanks for the info.

One approach I was considering, since I may need to cache things
anyway, is to implement a distributed caching layer, whereby each
object retrieved from a bucket is cached but as a process with a TTL.
Any operations occur via an API on the cached process that corresponds
with that bucket/key.

I think, this way I could if i wanted to, introduce transaction
semantics at least on a per document basis.

I'm not sure yet though what the impact of that might be.

Jason

On 24 February 2010 00:21, Justin Sheehy <justin at basho.com> wrote:
> Hi, J T.
>
> On Tue, Feb 23, 2010 at 7:04 PM, J T <jt4websites at googlemail.com> wrote:
>
>> My question is how you might go about reconciling.
>> It could be that both versions have valid changes and the version C needs to
>> work with would in fact have both terms removed from the list.
>
> There are a few ways to deal with this kind of thing, and the best
> choice will depend on the application in question.  That is, the
> semantics of that list will determine whether the right choice is
> really to delete both items, keep both items, delete one item, or ask
> a user to choose.
>
> There are a number of "quick and dirty" things that work fine for many
> applications, especially in light of the assumption that a
> well-written application will produce very few conflicts even under
> heavy concurrent client load.  For instance, some applications might
> find accidental deletions to be very bad, but occasionally keeping
> something that was supposed to be deleted to be acceptable.  Such an
> application would just perform set union on the lists for resolution.
> Other applications just pick the last of those conflicting updates to
> have been received, and discard the others.  These approaches can
> throw away data, but are simple and are fairly close to what most
> programmers do when using an RDBMS in a typical (i.e. not everything
> is in a transaction) kind of way.
>
> To have the best possible story for resolution in such cases as you
> describe, you would want to not just store the data structure in
> question, but the operations on that data.  You would also want to
> make sure that the way you implement the operations (e.g. "delete
> item", "add item") is both commutative (can be safely re-ordered) and
> idempotent (can be safely replayed).  If you do this, then you can
> merge conflicting data simply by replaying all of the operations in
> both siblings.  The only remaining difficulty is deciding what to do
> with conflicting operations -- if A had added an item and B had
> deleted the same item, for instance.  This detail must be decided by
> you, the application author, as there is no generic right answer.  But
> for something like what you described, this wouldn't come into play as
> both "delete" operations would be processed, and C would end up with a
> correct single list of terms.
>
>> I also spotted the 'read your writes' comment in the riak paper but it
>> didn't make any sense to me.
>
> Read-your-writes consistency is just about ensuring that a single
> client has a locally consistent timeline -- that anything it reads
> reflects all of its past writes.  That notion is important, but
> doesn't help with multi-client conflict situations.
>
>> I'd appreciate some pointers on best practices for dealing with this.
>
> I hope this gets you started down a useful direction.
>
> Best,
>
> -Justin
>




More information about the riak-users mailing list