Possibility of a CAS API
reiddraper at gmail.com
Fri Feb 24 22:46:33 EST 2012
On Feb 24, 2012, at 10:21 PM, Armon Dadgar wrote:
> It sounds like the "If-Unmodified-Since" and "If-None-Match" flags could do what I
> need, but the docs specify "it is possible for the condition to evaluate to true for
> multiple requests if the requests occur at the same time."
> From my understanding, the KV vnodes process their requests in a serial fashion.
> I'm not sure I fully understand how it could be that the condition evaluates to true
> for multiple requests, if the PUTs are handled serially.
> If it is a matter of the vnodes being interleaved, would it be solvable by
> setting w = r = n?
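The problem is that amongst all replicas for a particular key,
operations are not serializable. Each vnode may well process its own
requests serially, but each replica can see concurrent writes in
a different order. Put another way, if there are concurrent writes
to three replicas, there is no way to figure out a total ordering
for the actions.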
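
To make that concrete, here is a toy sketch in Python. It is not Riak code:
the Replica class, put_if_absent, and the arrival orders are all made up
for illustration. Each replica handles its requests strictly serially, yet
the "only if absent" condition still evaluates to true for both clients,
because the two requests arrive in a different order at different replicas.

```python
# Toy model, not Riak internals: every name here is hypothetical.
class Replica:
    def __init__(self):
        self.value = None

    def put_if_absent(self, value):
        # Local, serial check-then-set: succeeds only if this
        # replica has not stored a value yet.
        if self.value is None:
            self.value = value
            return True
        return False


replicas = [Replica(), Replica(), Replica()]

# Assumed arrival orders for two concurrent clients, A and B.
# Each replica processes its own list one request at a time.
arrival_orders = [["A", "B"], ["B", "A"], ["A", "B"]]

wins = {"A": 0, "B": 0}
for replica, order in zip(replicas, arrival_orders):
    for client in order:
        if replica.put_if_absent(client):
            wins[client] += 1

final_values = [r.value for r in replicas]
print(wins)          # how many replicas said "yes" to each client
print(final_values)  # the replicas now disagree about the value
```

Both clients get at least one "condition was true" answer, and the replicas
end up holding different values, so there is no single order of the two
writes that all three replicas agree on.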
It's also important to note that even when you set W=N
for a write, it's possible that one replica's write could succeed
and the other two could fail. The succeeding write is _not_ "rolled back"
when this happens. The user will see an error message
that the write didn't succeed on all replicas, but the replica
that accepted the write keeps the new value.
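
A small sketch of the partial-failure case, again as a toy model in Python
rather than anything from the Riak code base (Replica, put, and the "up"
flag are assumptions for illustration). The write is reported as failed
because it did not reach W replicas, but the replica that accepted it
keeps the new value.

```python
# Toy model, not Riak internals: every name here is hypothetical.
class Replica:
    def __init__(self, up=True):
        self.up = up
        self.value = "old"

    def write(self, value):
        # A replica that is down (or times out) rejects the write.
        if not self.up:
            return False
        self.value = value
        return True


def put(replicas, value, w):
    # Send the write to every replica and count acknowledgements.
    acks = sum(1 for r in replicas if r.write(value))
    # Report success only if at least w replicas acknowledged.
    # Note there is no rollback step for the replicas that did accept.
    return acks >= w


replicas = [Replica(up=True), Replica(up=False), Replica(up=False)]
ok = put(replicas, "new", w=3)
values = [r.value for r in replicas]
print(ok)      # the client is told the write failed
print(values)  # one replica holds the new value anyway
```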
> I'm not convinced that a CAS operation is inevitably subject to data races.
> There are proven techniques for avoiding races at the cost of latency,
> which is acceptable in certain situations.
Correct, but as far as I know, there is no way to build a CAS system
on top of the primitives provided by the Riak public API. You need
a point of serialization amongst all of the replicas (for a particular key),
which Riak does not provide, for availability reasons.
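
For contrast, this is roughly what a point of serialization buys you. The
sketch below is a single-process Python toy, assuming one coordinator owns
the key and every CAS goes through its lock; SerializedStore and cas are
hypothetical names, not part of any Riak API. Using None as the expected
value plays the role of the "no value yet" sentinel.

```python
import threading


# Hypothetical single coordinator: all CAS calls for a key pass
# through one lock, which gives them a total order.
class SerializedStore:
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def cas(self, key, expected, new):
        # The compare and the set happen atomically under the lock.
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True

    def get(self, key):
        with self._lock:
            return self._data.get(key)


store = SerializedStore()
results = []


def claim(owner):
    # expected=None means "only succeed if the key is absent".
    results.append(store.cas("unique-index-key", None, owner))


threads = [threading.Thread(target=claim, args=(name,)) for name in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # exactly one claim succeeds
```

The cost is the one you would expect: if the coordinator is unreachable,
nobody can write that key, which is the availability trade-off mentioned
above.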
> I will take a look at Zab, thanks for the reference!
Zab and Paxos are going to be your best references.
It's also worth noting that if you don't need high availability,
there are other ways of gaining durability that will give
you strong consistency and the ability to do CAS operations.
> Best Regards,
> Armon Dadgar
> On Feb 24, 2012, at 6:09 PM, Dietrich Featherston wrote:
>> If you need CAS semantics, then coordinate that outside of riak. Any check-then-act type of operation where atomicity is important is going to leave some room for a data race in a system with the distribution semantics of riak. Would suggest thinking about the problem in such a way that handling of siblings is tolerant of duplicate writes and eventually the correct value bubbles up to the readers. That or do the coordination of unique indexes in something not dynamo shaped.
>> I can't say I'm intimately familiar with the work yet, but others have prototyped/postulated consistency layers on top of riak (a la zab) that might more closely match what you're trying to do. None of this is in a released / supported version of riak to my knowledge though.
>> On Fri, Feb 24, 2012 at 4:41 PM, Armon Dadgar <armon.dadgar at gmail.com> wrote:
>> As part of a new feature we are working on, we've run into
>> a situation where it would be incredibly convenient to have a
>> check-and-set (CAS) API for Riak KV. In short, we are trying to build
>> a unique index of a bucket, using a second bucket which acts as a
>> reverse index.
>> The CAS API would operate in the same manner as a PUT, except it
>> should take a "last vclock". The new value + last vclock are submitted
>> to the responsible vnodes. The vnodes respond if the last vclock
>> for the key matches the specified last value. If we get "r" nodes responding
>> that the last value matches, then we should commit the write. This method
>> is basically a two-phase commit.
>> It would also be great if no-value sentinel could be specified to indicate
>> the CAS should only succeed if there is not already a key. We need this
>> to make sure uniqueness constraints are not violated.
>> I wanted to gauge the interest from the community in something like this,
>> and see if I could get thoughts from the Basho team on if this could be
>> Best Regards,
>> Armon Dadgar
>> riak-users mailing list
>> riak-users at lists.basho.com