This sure looks like a bug...?

Sean Cribbs sean at basho.com
Mon Apr 18 22:01:15 EDT 2011


I think you're missing a key point here, and that is that the vector clock doesn't store copies of the *values*, only the individual "touches" of identified clients. I'm not sure what computing the common ancestor is going to give you if you don't have the value.  Vector clocks are essentially opaque to clients.
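
To make the "touches only" point concrete, here is a minimal sketch in Python. It assumes a simplified model in which a vector clock is just a map from client id to update count; the names `descends`, `compare`, and the client ids are hypothetical and are not Riak's API or its internal representation. Comparing two clocks can tell you that two writes were concurrent, but nothing in the clocks lets you recover what the ancestor's value was.

```python
# Simplified model: a vector clock as {client_id: update_count}.
# Assumption: illustration only, not Riak's actual data structure.

def descends(a, b):
    """Return True if clock `a` has seen every touch recorded in clock `b`."""
    return all(a.get(client, 0) >= count for client, count in b.items())

def compare(a, b):
    """Classify the relationship between two clocks."""
    a_ge_b = descends(a, b)
    b_ge_a = descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "a is newer"
    if b_ge_a:
        return "b is newer"
    return "concurrent"  # neither side has seen the other's write

ancestor = {"client_x": 1}
clock_a = {"client_x": 1, "client_y": 1}  # client_y wrote on top of the ancestor
clock_b = {"client_x": 1, "client_z": 1}  # client_z wrote on top of the ancestor

print(compare(clock_a, ancestor))  # a is newer
print(compare(clock_a, clock_b))   # concurrent
```

Note that the clocks contain only client ids and counters. Even though both siblings share the `client_x` entry, there is no stored value to go with it.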

That said, I think the use-case you gave is one that can clearly bubble up to the user, e.g. "Someone else changed this record while you were editing it. Can you resolve the differences?" (Give the other person's name perhaps, highlight the fields that are different.)
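
As a rough sketch of the "highlight the fields that are different" idea, the following Python compares two sibling values field by field. It assumes the siblings have already been fetched and decoded into dictionaries; `differing_fields` is a hypothetical helper, not part of any Riak client library, and the sample data is the pair of records from the quoted message below.

```python
def differing_fields(a, b):
    """Map each field whose value differs to the (a, b) pair of values.

    A field missing from one sibling is reported as None on that side.
    """
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

sibling_a = {"name": "Jane Doe", "occupation": "n/a"}
sibling_b = {"name": "Jane Blow", "husband": "Joe Blow", "occupation": "secretary"}

# Show the user each conflicting field side by side.
for field, (left, right) in differing_fields(sibling_a, sibling_b).items():
    print(f"{field}: {left!r} vs {right!r}")
```

This only shows where the siblings disagree. Deciding which side wins is still left to the user or to application logic.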

Sean Cribbs <sean at basho.com>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On Apr 18, 2011, at 9:12 PM, Ben Tilly wrote:

> Riak's small_vclock, big_vclock, young_vclock, and old_vclock
> parameters already give control over pruning behavior.  If there isn't
> enough history to compute a common ancestor, then return nothing for
> the common ancestor.
> 
> The use case here really isn't an SCM.  The use case is when two
> clients get simultaneous (within, say, 50 ms) requests to write to the
> same object.  When a third one tries to read the data 5s later, it
> would be nice to have a way to figure out what to do.  For this use
> case you can limit the amount of history quite severely without loss.
> 
> Let's take a practical example of conflicting data structures:
> 
>  {
>    "name": "Jane Doe",
>    "occupation": "n/a"
>  },
>  {
>    "name": "Jane Blow",
>    "husband": "Joe Blow",
>    "occupation": "secretary"
>  }
> 
> What should it be resolved to?  Perhaps Jane just got divorced and
> went to work as a secretary.  Or she could have gotten married and
> left her job.  If you give me the common ancestor I can tell which
> scenario to believe.  Without it I can only guess badly.  I don't want
> to keep a history here.  I want to resolve the discrepancy the next
> time I see it (and log it somewhere important if I can't resolve it).
> 
> On Mon, Apr 18, 2011 at 5:38 PM, Sean Cribbs <sean at basho.com> wrote:
>> Yes, but vector clocks are for resolution of race conditions and network partitions, not to provide an SCM history.  Imagine how much space would be consumed by a history long enough to disambiguate an object that has been updated normally 1000 times, followed by one bad client that decides to write to it without fetching the vector clock first.
>> 
>> Coda Hale put it well in his talk at the recent Riak Meetup: your data needs to be logically monotonic so that writes (and reads) can be retried until resolution is reached.
>> 
>> Also, we've found that assigning the client id to something that is relevant to your domain, e.g. real people, will help reduce surprises (and degenerate cases like sibling explosion) when it comes to vector-clock resolution.
>> 
>> Sean Cribbs <sean at basho.com>
>> Developer Advocate
>> Basho Technologies, Inc.
>> http://basho.com/
>> 
>> On Apr 18, 2011, at 8:15 PM, Aphyr wrote:
>> 
>>>> I actually had a question about that page.  Why is it that when there
>>>> is a conflict we can only get the conflicting versions of the data?
>>>> If I'm going to try to resolve the conflict intelligently, I really
>>>> want the common ancestor as well so that I can try to do a 3-way
>>>> merge.
>>> 
>>> Good call. If an ancestor were available it would make counting and merging orthogonal changes *much* simpler.
>>> 
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> 
>> 