last_write_wins

John Daily jdaily at basho.com
Thu Jan 30 22:18:49 EST 2014


Replies inline.

(Thanks to Russell for the link to my blog series, but honestly, as I now re-read the section on conflict resolution, I’m unhappy with it. It’s a very confusing topic and I regret not doing a better job of clarifying it. This answer will undoubtedly also be more confusing than I’d like.)

On Jan 30, 2014, at 9:53 AM, Guido Medina <guido.medina at temetra.com> wrote:

> All of our buckets have allow_mult=false except for the one bucket we have for CRDT counters. Our application requires a certain level of consistency, so we have full control of our reads/writes using a fine-grained locking mechanism combined with an in-memory cache. In our case, is LWW=true what we would want? We haven't touched this parameter, so it is at its default value.

It’s a bit confusing to refer to “LWW” because the “last write wins” strategy is often referred to as LWW, and separately we have a last_write_wins configuration parameter, and they’re not the same thing. I’m going to stick to last_write_wins to be explicit when I’m referring to the parameter, and “last write wins” when referring to the strategy. (Informally I often refer to LWW as the strategy and lww as the parameter, but I’ll spare you that casual pedantry here.)

The “last write wins” strategy comes into play whenever allow_mult is set to false, regardless of the value of last_write_wins.

Setting last_write_wins=true when allow_mult=false will optimize Bitcask[1] “put” requests so that they don’t bother reading any existing value to compare vector clocks. However, if servers are offline or there are network partitions during the put operation, then when read repair or active anti-entropy is invoked later, the vector clock (including server timestamp) will be used to guess[2] which version of the object is the “last.”


If you can truly guarantee serialization at the application layer and you can guarantee that no two updates to a single value will occur within the worst-case clock skew across your cluster, then the “last write wins” strategy is reasonable. If you have any doubt about either and data safety is important, you really should set allow_mult=true and deal with siblings.

Unfortunately, worst-case clock skew across a cluster can be pretty bad. NTP will typically keep it under control, but it’s all too easy for both NTP and your monitoring of NTP to be broken.
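To make the clock-skew risk concrete, here is a minimal Python sketch of resolving two writes purely by server timestamp. It is not Riak code: the Write type, the resolve_lww function, and the clock values are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Write:
    value: str
    true_time: float   # when the write really happened (seconds)
    clock_skew: float  # how far off the receiving server's clock is (hypothetical)

    @property
    def server_timestamp(self) -> float:
        # What the server records: real time plus its own clock error.
        return self.true_time + self.clock_skew


def resolve_lww(a: Write, b: Write) -> Write:
    """Keep whichever write carries the larger server timestamp."""
    return a if a.server_timestamp >= b.server_timestamp else b


# The application serializes its writes: "v1" first, "v2" half a second later.
first = Write("v1", true_time=100.0, clock_skew=2.0)   # this server's clock runs 2s fast
second = Write("v2", true_time=100.5, clock_skew=0.0)  # this server's clock is accurate

winner = resolve_lww(first, second)
print(winner.value)  # prints "v1": the older write wins because of the skew
```

In this toy model the newer value is silently discarded whenever two writes land closer together than the skew between the servers that timestamp them, which is the condition described above.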

> 
> I'm assuming it will improve performance for our case, but if we set LWW=true, will it affect the bucket(s) with allow_mult=true? Is it safe to assume that if allow_mult=true, LWW will be ignored? We only modify bucket properties using Riak Java client 1.4.x atm.

No, it’s definitely not safe. If you set your cluster default to last_write_wins=true, you should explicitly set your allow_mult=true buckets to last_write_wins=false using the Java client. As Russell indicated, the behavior if both allow_mult and last_write_wins are set to true is undefined and not guaranteed at all to be what you want, regardless of the current state of the code.
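As a rough sketch of setting both properties explicitly, the Python below builds (but does not send) a bucket-properties update against Riak's HTTP interface rather than the Java client mentioned above. The host, port, and bucket name are assumptions for illustration, and the bucket_props_request helper is hypothetical; it also refuses the undefined combination of both properties being true.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def bucket_props_request(base_url: str, bucket: str,
                         allow_mult: bool, last_write_wins: bool) -> Request:
    """Build a PUT request that sets both properties explicitly on one bucket."""
    if allow_mult and last_write_wins:
        # Behavior with both set to true is undefined, so refuse to build it.
        raise ValueError("allow_mult and last_write_wins must not both be true")
    body = json.dumps(
        {"props": {"allow_mult": allow_mult, "last_write_wins": last_write_wins}}
    ).encode("utf-8")
    return Request(
        url=f"{base_url}/buckets/{quote(bucket, safe='')}/props",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical node address and bucket name; adjust for a real cluster.
req = bucket_props_request("http://127.0.0.1:8098", "counters",
                           allow_mult=True, last_write_wins=False)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the result to urllib.request.urlopen would send it. Setting both values in a single request avoids relying on whatever the cluster default happens to be.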

> 
> Also, about safety: does LWW=true use the timestamp and LWW=false the vclock? What is the future of both? Should we leave it untouched? We don't really want to use something that could jeopardise our data consistency requirement, even if it means better performance.

Vector clocks (with embedded server timestamps) are used to help Riak decide what to do about data inconsistencies regardless of the configuration settings. Riak generates (or updates) vector clocks with each put.
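For readers unfamiliar with vector clocks, the following is a simplified Python model of how they separate "one version supersedes the other" from "the two versions conflict." It is a sketch only: real Riak vector clock entries also carry timestamps, and the actor names and the descends/compare functions here are made up for the example.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock `a` has seen every update that clock `b` has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: dict, b: dict) -> str:
    """Classify the relationship between two vector clocks."""
    a_over_b, b_over_a = descends(a, b), descends(b, a)
    if a_over_b and b_over_a:
        return "equal"
    if a_over_b:
        return "a_newer"
    if b_over_a:
        return "b_newer"
    return "concurrent"  # neither has seen the other's update: a real conflict


base = {"node1": 1}
update = {"node1": 2}                   # written after reading `base`
partitioned = {"node1": 1, "node2": 1}  # written elsewhere, also from `base`

print(compare(update, base))         # prints "a_newer"
print(compare(update, partitioned))  # prints "concurrent"
```

Only the "concurrent" case needs a tie-break. With allow_mult=true both versions would be kept as siblings; with allow_mult=false one of them is chosen by timestamp, which is where clock skew matters.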

-John


[1] Why does last_write_wins=true only really impact Bitcask writes? If the backend supports 2i (Memory or LevelDB currently) then we have to read the old value from disk to determine whether any indexes need to be updated when that value is replaced.

[2] You could use the word “determine” here, but given the inherent unreliability of server clocks, it’s just as accurate to say “guess.”

