Riak Adoption - What can we do better?

Kyle Kingsbury aphyr at aphyr.com
Fri Apr 20 18:00:04 EDT 2012


On 04/20/2012 02:47 PM, Paul Gross wrote:
> Conceptually, a bucket is a list of documents. So you could handle lists
> and sets in a similar fashion during a network partition. Each side
> would see its own list, be able to add and remove. When the network was
> restored, it could merge the lists (with deletes). There are definitely
> intricacies to work through, but I think it's possible.

Any deletes issued during this time would be lost.

Case 1: Duplicate delivery/deletion on both sides of the partition.
Case 2: Deletion on one side only, restored by read repair after 
partition resolved, delivered *again*.
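
A minimal sketch of Case 2, assuming each side of the partition holds a plain set of items and that healing is a simple union with no tombstones. The names (`merge`, `side_a`, `side_b`) are made up for illustration and are not Riak's API:

```python
# Hypothetical model: each replica holds a plain set of queue items.
def merge(a, b):
    """Merge without tombstones: a union cannot tell 'deleted' from 'never seen'."""
    return a | b

side_a = {"job-1", "job-2"}
side_b = {"job-1", "job-2"}

# Partition: a consumer on side A takes job-1 and deletes it there.
delivered = ["job-1"]
side_a.discard("job-1")

# Partition heals; the merge restores job-1 from side B.
healed = merge(side_a, side_b)
resurrected = sorted(healed - side_a)
print(resurrected)  # ['job-1'], so job-1 is eligible for delivery again
```

The delete only existed on one side, so the merged state has no evidence it ever happened.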

It gets weirder when you start considering deleted versioned tombstones.

This is the classic shared-nothing queue conundrum: you can guarantee 
at-most-once delivery, or at-least-once delivery, but not both.

OK, you say: so we apply CRDTs, and use an observed-remove set across a 
bucket. That's doable, right? Both sides flag a given value as removed, 
and reconcile cleanly. Well yes, but now you require unbounded space, 
and surprise, we have a very disappointing proof that garbage-collecting 
this class of CRDT requires total coordination among participants. Well 
shit.
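
To make the space problem concrete, here is a rough sketch of an observed-remove set, assuming unique tags per add and a tombstone set that is never pruned. The `ORSet` class and its method names are illustrative, not taken from any particular library:

```python
import uuid

class ORSet:
    """Toy observed-remove set: state only ever grows."""

    def __init__(self):
        self.adds = set()      # (value, tag) pairs
        self.removes = set()   # tombstoned (value, tag) pairs; never shrinks

    def add(self, value):
        self.adds.add((value, uuid.uuid4().hex))

    def remove(self, value):
        # Tombstone only the tags this replica has observed.
        self.removes |= {(v, t) for (v, t) in self.adds if v == value}

    def merge(self, other):
        self.adds |= other.adds
        self.removes |= other.removes

    def values(self):
        return {v for (v, t) in self.adds - self.removes}

# Both sides start with job-1, then each removes it during the partition.
a = ORSet()
a.add("job-1")
b = ORSet()
b.merge(a)
a.remove("job-1")
b.remove("job-1")

# Heal the partition in both directions.
a.merge(b)
b.merge(a)
print(a.values())                   # set()
print(len(a.adds), len(a.removes))  # 1 1
```

The removes reconcile cleanly, but the add and its tombstone both stay in the state forever. Dropping them safely means knowing every participant has seen them, which is the coordination problem above.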

OK, so how about Statebox? We use timestamps to ameliorate the GC 
problem, so long as updates land within a given time window. Our hosts 
are running NTP so 
it's all cool, ya? Wrong. One of your hosts is not running NTP. Clock 
desync issues are fucking *ubiquitous*, sadly, and you have to be 
willing to accept, say, losing all conflicting writes from a client 
under some clock skew circumstances. Since you're talking about a queue, 
conflicts are *almost always guaranteed*.
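
A sketch of how a timestamp window can silently drop writes under clock skew. This is a hypothetical model, not Statebox's actual API: the `merge` function, the `WINDOW` value, and the op format are all assumptions made for illustration.

```python
WINDOW = 60  # seconds of history to retain (assumed value)

def merge(ops_a, ops_b, window=WINDOW):
    """Union two op logs, then drop ops older than the newest timestamp minus window."""
    ops = set(ops_a) | set(ops_b)
    newest = max(ts for ts, _ in ops)
    return sorted(op for op in ops if op[0] > newest - window)

now = 1_000_000
good_host = [(now, ("add", "job-1"))]
skewed_host = [(now - 300, ("add", "job-2"))]  # this host's clock is five minutes slow

merged = merge(good_host, skewed_host)
print(merged)  # [(1000000, ('add', 'job-1'))]
```

Both writes happened at the same real time, but the write from the skewed host looks five minutes old and gets pruned as garbage.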

Seriously, if you or anyone has a good distributed HA queue, please let 
me know. Better yet, write a paper, found a startup, and become insanely 
fucking rich. Everyone needs this, but it's a very hard problem. I mean, 
+1 for Basho solving it and all, but it's not exactly trivial.

>>> MongoDB also has capped collections which keep a fixed number of
>>> documents.
>>
>> Strictly speaking, they don't.
>>
> Can you please elaborate?

Capped collections will delete records when they are out of space. Don't 
get me started on Mongo's behavior around replicas during partitions.

--Kyle




More information about the riak-users mailing list