This sure looks like a bug...?

Ben Tilly btilly at gmail.com
Tue Apr 19 02:25:30 EDT 2011


The idea of keeping change events inside of an object was one I had
not fully thought through.  I think that is doable.

I'd thought about having independently written objects for a history,
but I was looking at how to make the update to the root of the history
as close to atomic as possible.  Likewise with any conflict
resolution.

As for what semantics Riak supports, I understand what it supports.
It just seems to me like it wouldn't take a lot to make it support a
lot more.
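
The three-way merge in the example quoted below can be sketched in a
few lines of Python. This is only an illustration, under the
assumptions that records are flat dicts and that the application
itself has kept the common ancestor somewhere (Riak does not hand one
back). The names three_way_merge and _MISSING are hypothetical and
not part of Riak or any client library.

```python
# Hypothetical sketch: field-by-field three-way merge of flat dicts.
_MISSING = object()  # sentinel meaning "this field is absent"


def three_way_merge(ancestor, a, b):
    """Merge siblings a and b against their common ancestor.

    Returns (merged, conflicts). A field lands in conflicts when both
    siblings changed it, and changed it to different values.
    """
    merged, conflicts = {}, []
    for key in sorted(set(ancestor) | set(a) | set(b)):
        base = ancestor.get(key, _MISSING)
        va = a.get(key, _MISSING)
        vb = b.get(key, _MISSING)
        if va == vb:
            value = va      # both siblings agree
        elif va == base:
            value = vb      # only b changed this field
        elif vb == base:
            value = va      # only a changed this field
        else:
            conflicts.append(key)  # both changed it, differently
            continue
        if value is not _MISSING:  # a surviving deletion drops the field
            merged[key] = value
    return merged, conflicts


sibling_1 = {"name": "Jane Doe", "occupation": "n/a"}
sibling_2 = {"name": "Jane Blow", "husband": "Joe Blow",
             "occupation": "secretary"}

# Same two siblings, two different ancestors, opposite outcomes.
single = {"name": "Jane Doe", "occupation": "secretary"}
married = {"name": "Jane Blow", "husband": "Joe Blow", "occupation": "n/a"}
print(three_way_merge(single, sibling_1, sibling_2))
print(three_way_merge(married, sibling_1, sibling_2))
```

With the same pair of siblings, the result depends entirely on which
ancestor is supplied, which is the reason for wanting it. Fields that
both writers changed in different ways are reported rather than
guessed at, so they can be logged.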

On Mon, Apr 18, 2011 at 8:05 PM, Sean Cribbs <sean at basho.com> wrote:
> Sorry for being dismissive, I do understand what you're after. I'm just saying that if your application needs those semantics, build them in -- don't expect Riak's vector clocks to do the work for you. Keep a list of the most recent "change" events either in that object or alongside, or keep a copy of the last-seen version in your object -- whatever works to make those kinds of merges possible.
>
> Interestingly, multiple people have explored the SCM-on-top-of-Riak thing, so I know it's doable; the key difference there is that multiple, independently written objects are used to represent the history of a single conceptual "object". Once written, nothing is overwritten, only new objects are created.
>
> Sean Cribbs <sean at basho.com>
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
> On Apr 18, 2011, at 10:46 PM, Ben Tilly wrote:
>
>> I'm not missing the point you think I am.  Riak already has the
>> ability to store more than one value for a key/value pair.  I'd like
>> an option - possibly named something new, that used this to store a
>> limited amount of history so that clients could be presented with a
>> common ancestor when that was required.
>>
>> In the case that I gave you, if the common ancestor is:
>>
>>  {
>>    "name": "Jane Doe",
>>    "occupation": "secretary"
>>  }
>>
>> then a standard three-way merge would say that she got married and the
>> correct result should be:
>>
>>  {
>>    "name": "Jane Blow",
>>    "husband": "Joe Blow",
>>    "occupation": "n/a"
>>  }
>>
>> while if the common ancestor is:
>>
>>  {
>>    "name": "Jane Blow",
>>    "husband": "Joe Blow",
>>    "occupation": "n/a"
>>  }
>>
>> then a standard 3-way merge would say that she dumped the jerk and got
>> a job resulting in:
>>
>>  {
>>    "name": "Jane Doe",
>>    "occupation": "secretary"
>>  }
>>
>> Without the common ancestor you know what changed, but not which
>> direction the changes are going, and so have no sane way to resolve
>> the conflict.
>>
>> Given the non-atomic nature of reads and writes in Riak, it is likely
>> that neither of the two clients that wrote that data was in any way
>> aware of the existence of the other write.  This makes your suggestion
>> of escalating to the user impossible.  And there is no particular
>> reason to believe that the third user to come along will necessarily
>> know anything either.
>>
>> (Besides, I spent enough years maintaining batch systems to be wary of
>> escalating to users at the drop of a hat.  The "user" may well be a
>> complete moron on autopilot.)
>>
>> On Mon, Apr 18, 2011 at 7:01 PM, Sean Cribbs <sean at basho.com> wrote:
>>> I think you're missing a key point here, and that is that the vector clock doesn't store copies of the *values*, only the individual "touches" of identified clients. I'm not sure what computing the common ancestor is going to give you if you don't have the value.  Vector clocks are essentially opaque to clients.
>>>
>>> That said, I think the use-case you gave is one that can clearly bubble up to the user, e.g. "Someone else changed this record while you were editing it. Can you resolve the differences?" (Give the other person's name perhaps, highlight the fields that are different.)
>>>
>>> Sean Cribbs <sean at basho.com>
>>> Developer Advocate
>>> Basho Technologies, Inc.
>>> http://basho.com/
>>>
>>> On Apr 18, 2011, at 9:12 PM, Ben Tilly wrote:
>>>
>>>> Riak's small_vclock, big_vclock, young_vclock, and old_vclock
>>>> parameters already give control over pruning behavior.  If there isn't
>>>> enough history to compute a common ancestor, then return nothing for
>>>> the common ancestor.
>>>>
>>>> The use case here really isn't an SCM.  The use case is when two
>>>> clients get simultaneous (within, say, 50 ms) requests to write to the
>>>> same object.  When a third one tries to read the data 5s later, it
>>>> would be nice to have a way to figure out what to do.  For this use
>>>> case you can limit the amount of history quite severely without loss.
>>>>
>>>> Let's take a practical example of conflicting data structures:
>>>>
>>>>  {
>>>>    "name": "Jane Doe",
>>>>    "occupation": "n/a"
>>>>  },
>>>>  {
>>>>    "name": "Jane Blow",
>>>>    "husband": "Joe Blow",
>>>>    "occupation": "secretary"
>>>>  }
>>>>
>>>> What should it be resolved to?  Perhaps Jane just got divorced and
>>>> went to work as a secretary.  Or she could have gotten married and
>>>> left her job.  If you give me the common ancestor I can tell which
>>>> scenario to believe.  Without it I can only guess badly.  I don't want
>>>> to keep a history here.  I want to resolve the discrepancy the next
>>>> time I see it (and log it somewhere important if I can't resolve it).
>>>>
>>>> On Mon, Apr 18, 2011 at 5:38 PM, Sean Cribbs <sean at basho.com> wrote:
>>>>> Yes, but vector clocks are for resolution of race-conditions and network partitions, not to provide an SCM history.  Imagine how much space would be consumed by the history long enough to disambiguate an object that has been updated normally 1000 times, followed by one bad client that decides to write to it without fetching the vector clock first.
>>>>>
>>>>> Coda Hale put it well in his talk at the recent Riak Meetup: your data needs to be logically monotonic so that writes (and reads) can be retried until resolution is reached.
>>>>>
>>>>> Also, we've found that assigning the client id to something that is relevant to your domain, e.g. real people, will help reduce surprises (and degenerate cases like sibling explosion) when it comes to vector-clock resolution.
>>>>>
>>>>> Sean Cribbs <sean at basho.com>
>>>>> Developer Advocate
>>>>> Basho Technologies, Inc.
>>>>> http://basho.com/
>>>>>
>>>>> On Apr 18, 2011, at 8:15 PM, Aphyr wrote:
>>>>>
>>>>>>> I actually had a question about that page.  Why is it that when there
>>>>>>> is a conflict we can only get the conflicting versions of the data?
>>>>>>> If I'm going to try to resolve the conflict intelligently, I really
>>>>>>> want the common ancestor as well so that I can try to do a 3-way
>>>>>>> merge.
>>>>>>
>>>>>> Good call. If an ancestor were available it would make counting and merging orthogonal changes *much* simpler.


