Riak Adoption - What can we do better?

Kyle Kingsbury aphyr at aphyr.com
Fri Apr 20 18:00:04 EDT 2012

On 04/20/2012 02:47 PM, Paul Gross wrote:
> Conceptually, a bucket is a list of documents. So you could handle lists
> and sets in a similar fashion during a network partition. Each side
> would see its own list, be able to add and remove. When the network was
> restored, it could merge the lists (with deletes). There are definitely
> intricacies to work through, but I think it's possible.

Any deletes issued during this time would be lost.

Case 1: Duplicate delivery/deletion on both sides of the partition.
Case 2: Deletion on one side only, restored by read repair after 
partition resolved, delivered *again*.
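
To make Case 2 concrete, here is a toy sketch. It assumes each side of the partition holds a plain set of job IDs and that healing is a naive set union with no tombstones. The names `naive_merge`, `side_a`, and `side_b` are made up for illustration; this is not Riak's read-repair code.

```python
def naive_merge(side_a, side_b):
    """Merge two replicas of a queue bucket by set union (no tombstones)."""
    return side_a | side_b


# Both sides hold the same queue contents when the partition starts.
initial = {"job-1", "job-2"}
side_a = set(initial)
side_b = set(initial)

delivered = []

# Case 2: side A delivers job-1 and deletes it; side B never hears about it.
delivered.append("job-1")
side_a.discard("job-1")

# The partition heals. Side B still has job-1, so the union brings it back.
merged = naive_merge(side_a, side_b)

# A consumer drains the merged queue, and job-1 goes out a second time.
for job in sorted(merged):
    delivered.append(job)

print(delivered)
```

With no record that job-1 was ever deleted, the merge cannot tell "deleted on A" apart from "never seen by A", which is why the delete is effectively lost.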

It gets weirder when you start considering deleted versioned tombstones.

This is the classic shared-nothing queue conundrum: you can guarantee 
at-most-once delivery, or at-least-once delivery, but not both.

OK, you say: so we apply CRDTs, and use an observed-remove set across a 
bucket. That's doable, right? Both sides flag a given value as removed, 
and reconcile cleanly. Well yes, but now you require unbounded space, 
and surprise, we have a very disappointing proof that garbage-collecting 
this class of CRDT requires total coordination among participants. Well 
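
A minimal sketch of the space problem, assuming a textbook observed-remove set in which every add gets a unique tag and a remove tombstones the tags it has seen. The `ORSet` class and its method names are illustrative only, not any Riak or library API.

```python
import uuid


class ORSet:
    """Toy observed-remove set: adds carry unique tags, removes are tombstones."""

    def __init__(self):
        self.adds = set()     # (value, tag) pairs ever added
        self.removes = set()  # (value, tag) pairs that have been tombstoned

    def add(self, value):
        self.adds.add((value, uuid.uuid4().hex))

    def remove(self, value):
        # Tombstone only the tags this replica has actually observed.
        self.removes |= {(v, t) for (v, t) in self.adds if v == value}

    def values(self):
        return {v for (v, _tag) in self.adds - self.removes}

    def merge(self, other):
        # Both components only ever grow, so merge is a plain union.
        self.adds |= other.adds
        self.removes |= other.removes


a = ORSet()
a.add("job-1")
b = ORSet()
b.merge(a)

# Both sides of the partition remove job-1, then reconcile cleanly.
a.remove("job-1")
b.remove("job-1")
a.merge(b)
b.merge(a)

# Keep using the set as a queue: every item is added and later removed.
for i in range(100):
    a.add(f"tmp-{i}")
    a.remove(f"tmp-{i}")

print(a.values(), len(a.adds), len(a.removes))
```

The visible set is empty, yet the replica still carries one add record and one tombstone for every item that ever passed through. Dropping a tombstone is only safe once every replica has seen it, which is where the coordination requirement comes from.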

OK, so how about Statebox? We use timestamps to ameliorate the GC 
problem within a given time window. Our hosts are running NTP so 
it's all cool, ya? Wrong. One of your hosts is not running NTP. Clock 
desync issues are fucking *ubiquitous*, sadly, and you have to be 
willing to accept, say, losing all conflicting writes from a client 
under some clock skew circumstances. Since you're talking about a queue, 
conflicts are *almost always guaranteed*.
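
A rough sketch of the clock-skew failure, assuming conflicts are resolved by keeping the sibling with the highest client-reported timestamp. The `Write` record, `resolve`, and the one-hour skew are hypothetical; this is not Statebox's actual API or resolution logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    client: str
    value: str
    timestamp: float  # as reported by the writer's own clock


def local_clock(true_time, skew=0.0):
    """What a host believes the time is, given its clock skew in seconds."""
    return true_time + skew


def resolve(siblings):
    """Timestamp-based resolution: keep the write with the highest timestamp."""
    return max(siblings, key=lambda w: w.timestamp)


SKEW = -3600.0  # the one host that is not running NTP is an hour behind

# In real time, the bad-clock client writes five seconds AFTER the other one.
w1 = Write("good-clock", "enqueue job-1", local_clock(1000.0))
w2 = Write("bad-clock", "enqueue job-2", local_clock(1005.0, SKEW))

winner = resolve([w1, w2])
print(winner.client, winner.value)
```

Until its clock is fixed, every conflicting write from the skewed client loses, even though it happened later in real time. On a queue, where nearly every write conflicts, that means silently dropped jobs.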

Seriously, if you or anyone has a good distributed HA queue, please let 
me know. Better yet, write a paper, found a startup, and become insanely 
fucking rich. Everyone needs this, but it's a very hard problem. I mean, 
+1 for Basho solving it and all, but it's not exactly trivial.

>>> MongoDB also has capped collections which keep a fixed number of
>>> documents.
>> Strictly speaking, they don't.
> Can you please elaborate?

Capped collections will delete records when they are out of space. Don't 
get me started on Mongo's behavior around replicas during partitions.

