Reconciling Concurrent Writes

Jason Tanner jt4websites at googlemail.com
Wed Feb 24 14:02:00 EST 2010


Justin,

One thing that occurred to me over night though is to do with the
links property in the meta data for a document.

Lets say I had a Band bucket and a Fan bucket.

I could, in the Band documents meta-data, store { links, [ {{
FanBucket, FanKey#1 }, a_fan }, ..... ] }

My understanding is that the links structure is defined.

In which case I have less scope for adding housekeeping details to
indicate how to reconcile - although I suppose I could add meta-data
of my own.

I was also wondering what limitations there are for document meta data
with respect to size ?

e.g. If a Band had a million fans, how well would riak handle having a
million item long list of links ?

Instead, in the case of potentially large relationship sets, would it
be better to have the links held in a separate bucket, as a document
per band/fan relationship ?


Jason

On 24 February 2010 00:21, Justin Sheehy <justin at basho.com> wrote:
> Hi, J T.
>
> On Tue, Feb 23, 2010 at 7:04 PM, J T <jt4websites at googlemail.com> wrote:
>
>> My question is how you might go about reconciling.
>> It could be that both versions have valid changes and the version C needs to
>> work with would in fact have both terms removed from the list.
>
> There are a few ways to deal with this kind of thing, and the best
> choice will depend on the application in question.  That is, the
> semantics of that list will determine whether the right choice is
> really to delete both items, keep both items, delete one item, or ask
> a user to choose.
>
> There are a number of "quick and dirty" things that work fine for many
> applications, especially in light of the assumption that a
> well-written application will produce very few conflicts even under
> heavy concurrent client load.  For instance, some applications might
> find accidental deletions to be very bad, but occasionally keeping
> something that was supposed to be deleted to be acceptable.  Such an
> application would just perform set union on the lists for resolution.
> Other applications just pick the last of those conflicting updates to
> have been received, and discard the others.  These approaches can
> throw away data, but are simple and are fairly close to what most
> programmers do when using an RDBMS in a typical (i.e. not everything
> is in a transaction) kind of way.
>
> To have the best possible story for resolution in such cases as you
> describe, you would want to not just store the data structure in
> question, but the operations on that data.  You would also want to
> make sure that the way you implement the operations (e.g. "delete
> item", "add item") is both commutative (can be safely re-ordered) and
> idempotent (can be safely replayed).  If you do this, then you can
> merge conflicting data simply by replaying all of the operations in
> both siblings.  The only remaining difficulty is deciding what to do
> with conflicting operations -- if A had added an item and B had
> deleted the same item, for instance.  This detail must be decided by
> you, the application author, as there is no generic right answer.  But
> for something like what you described, this wouldn't come into play as
> both "delete" operations would be processed, and C would end up with a
> correct single list of terms.
>
>> I also spotted the 'read your writes' comment in the riak paper but it
>> didn't make any sense to me.
>
> Read-your-writes consistency is just about ensuring that a single
> client has a locally consistent timeline -- that anything it reads
> reflects all of its past writes.  That notion is important, but
> doesn't help with multi-client conflict situations.
>
>> I'd appreciate some pointers on best practices for dealing with this.
>
> I hope this gets you started down a useful direction.
>
> Best,
>
> -Justin
>



More information about the riak-users mailing list