Riak Adoption - What can we do better?

Zheng Zhibin witeman.g at gmail.com
Sat Apr 21 01:56:28 EDT 2012


Witeman

On 2012-4-21, at 6:00 AM, Kyle Kingsbury <aphyr at aphyr.com> wrote:

> On 04/20/2012 02:47 PM, Paul Gross wrote:
>> Conceptually, a bucket is a list of documents. So you could handle lists
>> and sets in a similar fashion during a network partition. Each side
>> would see its own list, be able to add and remove. When the network was
>> restored, it could merge the lists (with deletes). There are definitely
>> intricacies to work through, but I think it's possible.
> 
> Any deletes issued during this time would be lost.
> 
> Case 1: Duplicate delivery/deletion on both sides of the partition.
> Case 2: Deletion on one side only, restored by read repair after partition resolved, delivered *again*.
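Case 2 can be reproduced in a few lines. This is a minimal sketch that assumes, hypothetically, that each replica stores the queue bucket as a plain set of message ids and that reconciliation (e.g. read repair) takes the union of both sides. It is not Riak's actual merge logic.

```python
def naive_merge(a: set[str], b: set[str]) -> set[str]:
    """Union merge: keeps anything either replica still holds."""
    return a | b

# Before the partition both replicas hold the same messages.
side_a = {"msg-1", "msg-2"}
side_b = set(side_a)

# During the partition a consumer on side A takes msg-1 and deletes it.
side_a.discard("msg-1")

# After the partition heals, the union brings msg-1 back, so it is delivered again.
merged = naive_merge(side_a, side_b)
print(sorted(merged))  # ['msg-1', 'msg-2']
```

The union cannot tell "never had it" from "had it and deleted it", which is why the delete disappears.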
> 
> It gets weirder when you start considering deleted versioned tombstones.
> 
> This is the classic shared-nothing queue conundrum: you can guarantee at-most-once delivery, or at-least-once delivery, but not both.
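The trade-off comes down to when the consumer acknowledges (deletes) a message relative to processing it. Here is a sketch with a hypothetical single consumer whose crash is simulated by a flag:

```python
def consume(queue: list[str], processed: list[str], ack_first: bool, crash: bool) -> None:
    """Hypothetical consumer. ack_first=True deletes before processing
    (at-most-once); ack_first=False processes before deleting (at-least-once).
    crash=True stops the consumer between those two steps."""
    msg = queue[0]
    if ack_first:
        queue.pop(0)            # ack first
        if crash:
            return              # message is gone and was never processed
        processed.append(msg)
    else:
        processed.append(msg)   # process first
        if crash:
            return              # message stays queued and will be redelivered
        queue.pop(0)

# At-most-once: a crash loses the message.
q, done = ["m1"], []
consume(q, done, ack_first=True, crash=True)
print(q, done)  # [] []

# At-least-once: a crash followed by redelivery processes it twice.
q, done = ["m1"], []
consume(q, done, ack_first=False, crash=True)
consume(q, done, ack_first=False, crash=False)
print(q, done)  # [] ['m1', 'm1']
```

Whichever order you pick, one failure mode remains. Exactly-once needs something extra, such as idempotent processing.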
> 
> OK, you say: so we apply CRDTs, and use an observed-removed set across a bucket. That's doable, right? Both sides flag a given value as removed, and reconcile cleanly. Well yes, but now you require unbounded space, and surprise, we have a very disappointing proof that garbage-collecting this class of CRDT requires total coordination among participants. Well shit.
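A minimal observed-removed set (OR-Set) sketch shows why both sides reconcile cleanly and also why space grows without bound. Every add gets a unique tag, and a remove tombstones only the tags it has observed. The class and method names are illustrative, not any particular library's API.

```python
import uuid

class ORSet:
    """Observed-removed set. Tombstones are never garbage-collected here,
    which is the unbounded-space problem: safely dropping them would
    require every replica to agree they have seen the removal."""

    def __init__(self) -> None:
        self.adds: dict[str, set[str]] = {}  # element -> unique add tags
        self.removed: set[str] = set()       # tombstoned tags

    def add(self, element: str) -> None:
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element: str) -> None:
        # Only the tags this replica has observed are tombstoned.
        self.removed |= self.adds.get(element, set())

    def contains(self, element: str) -> bool:
        return bool(self.adds.get(element, set()) - self.removed)

    def merge(self, other: "ORSet") -> None:
        for element, tags in other.adds.items():
            self.adds.setdefault(element, set()).update(tags)
        self.removed |= other.removed

a = ORSet()
a.add("msg-1")
b = ORSet()
b.merge(a)              # replicated before the partition
a.remove("msg-1")       # consumed on side A during the partition
a.merge(b)
b.merge(a)              # partition heals
print(a.contains("msg-1"), b.contains("msg-1"))  # False False
print(len(a.removed))   # 1 (the tombstone is kept forever)
```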
> 
> OK, so how about Statebox? We use timestamps to ameliorate the GC problem within a given time window. Our hosts are running NTP so it's all cool, ya? Wrong. One of your hosts is not running NTP. Clock desync issues are fucking *ubiquitous*, sadly, and you have to be willing to accept, say, losing all conflicting writes from a client under some clock skew circumstances. Since you're talking about a queue, conflicts are *almost always guaranteed*.
How about using one box for one bucket, or a few buckets? Then the timestamps in those buckets are meaningful. If we want a backup box, we just keep the two boxes synchronized with NTP. When the main box goes down and the backup comes up, the two have always been synced via NTP, so any remaining clock discrepancy between them should be covered by the interval between the main box going down and the backup coming up.
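To make the clock-skew point and the failover condition concrete, here is a sketch. It uses a plain last-writer-wins register as a simpler stand-in for timestamp-based resolution, not Statebox's actual API. The skew values are hypothetical, and the primary's clock is assumed to be the reference.

```python
from dataclasses import dataclass

@dataclass
class Write:
    value: str
    timestamp: float  # wall-clock seconds as reported by the writing host

def lww_merge(a: Write, b: Write) -> Write:
    """Last-writer-wins: keep the write with the larger timestamp."""
    return a if a.timestamp >= b.timestamp else b

# Host B's clock runs 5 s behind (hypothetical skew); B really writes 2 s after A.
first = Write("enqueue:msg-1", 100.0)
second = Write("enqueue:msg-2", 102.0 - 5.0)  # stamped 97.0
winner = lww_merge(first, second)
print(winner.value)  # enqueue:msg-1 (the later write is silently lost)

def failover_preserves_order(last_primary_ts: float, gap: float, skew: float) -> bool:
    """Single-authority idea from the reply above: the backup's first write
    happens `gap` seconds after the primary's last one, on a clock offset by
    `skew` seconds (negative = behind). Ordering holds iff gap + skew > 0."""
    first_backup_ts = last_primary_ts + gap + skew
    return first_backup_ts > last_primary_ts

print(failover_preserves_order(100.0, 30.0, -0.05))  # True
print(failover_preserves_order(100.0, 0.01, -0.05))  # False
```

The failover argument holds only while the takeover gap exceeds the residual skew. A fast failover on a badly synced backup can still reorder writes.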
> 
> Seriously, if you or anyone has a good distributed HA queue, please let me know. Better yet, write a paper, found a startup, and become insanely fucking rich. Everyone needs this, but it's a very hard problem. I mean, +1 for Basho solving it and all, but it's not exactly trivial.
> 
>>>> MongoDB also has capped collections which keep a fixed number of
>>>> documents.
>>> 
>>> Strictly speaking, they don't.
>>> 
>> Can you please elaborate?
> 
> Capped collections will delete records when they are out of space. Don't get me started on Mongo's behavior around replicas during partitions.
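Why "a fixed number of documents" isn't strictly true: MongoDB capped collections are sized primarily in bytes (with an optional document limit), and the oldest documents are evicted first once the space is used up. The class below is a hypothetical analogy of that eviction policy, not MongoDB's implementation. It shows the retained document count varying with document size.

```python
from collections import deque

class CappedByBytes:
    """Hypothetical store capped by total bytes, not document count:
    inserting past the cap evicts the oldest documents first."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.docs: deque[bytes] = deque()
        self.used = 0

    def insert(self, doc: bytes) -> None:
        if len(doc) > self.max_bytes:
            raise ValueError("document larger than the cap")
        self.docs.append(doc)
        self.used += len(doc)
        while self.used > self.max_bytes:
            self.used -= len(self.docs.popleft())  # evict oldest

store = CappedByBytes(max_bytes=10)
for doc in [b"aaaa", b"bbbb", b"cc"]:
    store.insert(doc)
print(len(store.docs))  # 3
store.insert(b"dddddddd")
print(len(store.docs))  # 2 (only b"cc" and b"dddddddd" remain)
```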
> 
> --Kyle
> 
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



