Riak Adoption - What can we do better?
aphyr at aphyr.com
Sat Apr 21 12:28:57 EDT 2012
On 04/21/2012 09:07 AM, Les Mikesell wrote:
> On Fri, Apr 20, 2012 at 5:00 PM, Kyle Kingsbury<aphyr at aphyr.com> wrote:
>> OK, so how about Statebox? We use timestamps to ameliorate the GC problem so
>> long as a given time window. Our hosts are running NTP so it's all cool, ya?
>> Wrong. One of your hosts is not running NTP. Clock desync issues are fucking
>> *ubiquitous*, sadly, and you have to be willing to accept, say, losing all
>> conflicting writes from a client under some clock skew circumstances.
> How hard is it for a cluster-aware application to tell that the clocks
> are out of sync? You probably can't do better than NTP at fixing it,
> but why even continue to run in that state? If all it takes is a
> good clock for reliability, let's build good clocks.
You are a network packet. It is very dark. You are likely to be eaten by
Joking aside, many applications do try to do broken clock detection, but
correcting the error automatically depends on application semantics. On
top of that, it can be impossible to detect in partitioned situations.
There are also cases where you *want* to stay available even when time
sync is impossible; consider, for example, mobile clients which may
make requests long after the user interaction. There, *logical*
consistency is paramount... but I digress, haha.
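To make the detection idea concrete, here is a minimal sketch of how a node might estimate a peer's clock offset from one request/response exchange, using the same four-timestamp arithmetic NTP uses. The names (Probe, estimate_offset, is_skewed) and the 0.5 second threshold are assumptions for illustration, not part of Riak or any particular library. Note that it only works when the peer is reachable, which is exactly why detection fails under a partition.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """Four timestamps (seconds) from one request/response exchange."""
    t0: float  # local clock when the request was sent
    t1: float  # peer clock when the request arrived
    t2: float  # peer clock when the reply was sent
    t3: float  # local clock when the reply arrived


def estimate_offset(p: Probe) -> float:
    """Estimated (peer clock - local clock), assuming symmetric network delay."""
    return ((p.t1 - p.t0) + (p.t2 - p.t3)) / 2.0


def round_trip(p: Probe) -> float:
    """Time spent on the network, excluding the peer's processing time."""
    return (p.t3 - p.t0) - (p.t2 - p.t1)


def is_skewed(p: Probe, max_skew: float = 0.5) -> bool:
    """Flag the peer only if the offset exceeds max_skew even after
    allowing for the estimate's uncertainty (half the round trip)."""
    return abs(estimate_offset(p)) - round_trip(p) / 2.0 > max_skew


# A peer whose clock is roughly ten seconds ahead of ours.
bad = Probe(t0=100.0, t1=110.02, t2=110.03, t3=100.05)
# A peer whose clock agrees with ours to within network jitter.
good = Probe(t0=100.0, t1=100.01, t2=100.02, t3=100.03)
print(is_skewed(bad), is_skewed(good))
```

What to do once a peer is flagged (refuse writes, alert, fall back to logical ordering) is the application-specific part.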
It is *OK* to accept the clock issue sometimes, especially with the
understanding that it provides probabilistic constraints on conflict
resolution. Being right 80% of the time can be better than 50% of the
time. But you *have* to be willing to understand and accept the risks;
it's a problem most people have no idea exists.
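A toy simulation of that risk, assuming plain last-write-wins resolution on wall-clock timestamps. This is a sketch, not Statebox's or Riak's actual resolution code, and the Write, lww_resolve, and simulate names are made up for the example. Client B always writes one second after client A in real time, so B "should" win every conflict; once B's clock runs behind by more than that second, B loses every one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    client: str
    value: str
    timestamp: float  # wall-clock time as reported by the writer


def lww_resolve(siblings):
    """Last-write-wins: keep the sibling with the largest timestamp."""
    return max(siblings, key=lambda w: w.timestamp)


def simulate(skew_seconds):
    """Return the winning client for five conflicts in which B writes
    one real second after A, but B's clock is behind by skew_seconds."""
    winners = []
    for true_time in range(0, 50, 10):
        a = Write("A", f"a@{true_time}", float(true_time))
        b = Write("B", f"b@{true_time + 1}", float(true_time + 1) - skew_seconds)
        winners.append(lww_resolve([a, b]).client)
    return winners


print(simulate(0.0))  # clocks agree: the later writer (B) wins every time
print(simulate(5.0))  # B is 5s behind: every one of B's writes is discarded
```

Nothing in the resolved value tells you this happened, which is why it tends to go unnoticed.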
That's what I think Riak adoption is missing: a huge portion of devs
just don't understand the implications of Riak's HA approach: logical
clocks, causality bounds, and synchronization boundaries. I'm hoping to
change some of that with Meangirls, but I'm not convinced it's enough.
Reid Draper told me a little while ago that he wanted devs to extend
CRDTs and logical clocks all the way into their APIs and mobile clients,
and I think he might be right.
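For anyone who hasn't seen a logical clock, here is a minimal vector clock sketch, with clocks as plain dicts mapping an actor name to a counter. It is an illustration under my own naming (increment, descends, compare, merge), not Riak's vclock implementation. The point is that ordering comes from who has seen what, not from anyone's wall clock, so two clients that diverge from the same state are reported as concurrent instead of one silently winning.

```python
def increment(clock, actor):
    """Return a copy of clock with actor's counter advanced by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify a relative to b: equal, after, before, or concurrent."""
    a_sees_b, b_sees_a = descends(a, b), descends(b, a)
    if a_sees_b and b_sees_a:
        return "equal"
    if a_sees_b:
        return "after"
    if b_sees_a:
        return "before"
    return "concurrent"


def merge(a, b):
    """Pointwise maximum: the clock of a value that reconciles a and b."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0))
            for actor in a.keys() | b.keys()}


base = increment({}, "server")
phone = increment(base, "phone")    # phone edits the value offline
laptop = increment(base, "laptop")  # laptop edits the same value
print(compare(phone, base), compare(phone, laptop))
```

Extending this into a mobile client means the phone carries its clock with the data and sends both back, however long after the user interaction the request finally goes out.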