Riak and transaction

Justin Sheehy justin at basho.com
Mon Feb 15 20:47:49 EST 2010


On Fri, Feb 12, 2010 at 11:36 AM, wde <wde at free.fr> wrote:

> I wonder if it would make sense to build an overlay among all replicas of an object to make transactions, to ensure atomicity of a Fun function
> which use put/get requests. It would be an additional feature, to use only if needed.

It would be possible to implement something like this, but the impact
on system behavior might be greater than is immediately obvious.
Introducing global transactions brings with it a set of compromises
very different from the ones we tend to prefer.

> I'm discovering the world of distributed key value stores. I'm trying to understand the differences between solutions based on vclocks (like riak) and solutions based on Paxos, which ensure consistency in all cases.

This general question is directly related to your specific question
above.  The simple answer is that no distributed system can promise
such "pure" consistency while also tolerating arbitrary potential host
and network outages.

Paxos, for example, is a very useful family of protocols for reaching
consensus with strict consistency, and thus is useful for (e.g.)
replication of databases that have the same requirement.  However, in a
Paxos-based system a node can only function while it can communicate
with a majority (more than half) of the known nodes in the system.
Given our need for availability even in the face of arbitrary host
or network failures, this limitation is not an option for Riak.
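
As a rough illustration of that majority requirement (a minimal
sketch, not code from Riak or from any Paxos implementation; the
function name and parameters are hypothetical), a node in a
majority-based system could decide whether it may proceed like this:

```python
def can_make_progress(reachable_peers: int, cluster_size: int) -> bool:
    """Return True if this node plus the peers it can reach form a strict majority.

    Hypothetical helper for illustration only: `reachable_peers` excludes
    the node itself, `cluster_size` is the total number of known nodes.
    """
    members_in_view = reachable_peers + 1  # count the node itself
    return 2 * members_in_view > cluster_size


# A five-node cluster partitioned into a group of three and a group of two.
print(can_make_progress(reachable_peers=2, cluster_size=5))  # True
print(can_make_progress(reachable_peers=1, cluster_size=5))  # False
```

The two nodes on the minority side of the partition cannot serve
requests at all until the partition heals, which is the availability
cost described above.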

People quite often assume that a given deployment will not see network
partitions or more than a couple of host outages at a time.  Sadly,
over time they generally turn out to be wrong.

It's worth remembering here that Riak still values consistency, just
not at the expense of everything else.

The term "eventual consistency" sometimes causes confusion.
When we use that term we don't mean that data will
become consistent at some arbitrary eventual time in the future.
Rather, Riak is built such that if the only choice is to maintain
absolute consistency or to allow a request to succeed, Riak will allow
the request to succeed.

This means that (e.g.) while a given Riak node is unreachable by the
rest of the cluster, it is possible for that node to hold some data
that has not yet received an update. However, that will only occur
under such failure conditions (when a purely consistent system would
not be able to do anything at all), and Riak will automatically
repair it soon after the node becomes reachable again.

One thing that you can be assured of: if what you want is what is
generally referred to as "read-your-writes" consistency (that if a
given party writes something, their next read will reflect that
write) then this is easy to achieve. Just use R and W values that
sum to a value greater than N; if you simply use the defaults then
this will be done for you.
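
To see why R + W > N is enough, here is a small brute-force sketch
(illustrative only, not Riak code; the function name is hypothetical,
and it models an idealized cluster where a write lands on exactly W
of the N replicas and a read consults exactly R of them). It checks
whether every possible read set shares at least one replica with
every possible write set:

```python
from itertools import combinations


def read_always_overlaps_write(n: int, r: int, w: int) -> bool:
    """Return True if every R-sized replica set intersects every W-sized one.

    Idealized model for illustration: replicas are numbered 0..n-1 and
    no failures occur between the write and the read.
    """
    replicas = range(n)
    return all(
        bool(set(read_set) & set(write_set))
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# Assumed example values, not taken from this message: N=3 with R=W=2.
print(read_always_overlaps_write(n=3, r=2, w=2))  # True  (2 + 2 > 3)
print(read_always_overlaps_write(n=3, r=1, w=2))  # False (1 + 2 == 3)
```

When the sets always overlap, at least one replica consulted by the
read has seen the most recent write, so the read can reflect it.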

