Limiting the number of entries in a set

Shawn Debnath shawn at
Sat Jan 31 18:27:03 EST 2015

Thanks for clarifying this aspect, Russell. Given that this is a 
per-element op and not one for the whole set, I think it would work for 
our scenario: each actor posting a new element to the set will try to 
trim the excess elements (> 1000) from the set and then add its own. In 
this case, Riak will potentially see multiple deletes on the same 
element(s), or possibly deletes on already deleted elements. Since we are 
removing from the tail and adding to the head, if there are more than 
1000 actors concurrently adding and removing, we can expect the set to 
grow beyond 1000 entries but trim itself down eventually. That is, we 
never have the case of someone trying to delete and someone trying to add 
the same key. Works for us for now (TM) :)
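As a rough illustration of that per-actor step (a hypothetical sketch, not 
actual Riak client code: the newest-first ordering, the 1000-entry limit, 
and the op names are assumptions for the example), the trim-then-add logic 
might look like:

```python
# Hypothetical sketch of the per-actor trim-and-add step described above.
# Assumes the fetched values are ordered newest first; not real client API.
MAX_ENTRIES = 1000

def trim_and_add(current_values, new_entry, limit=MAX_ENTRIES):
    """Given the fetched set values (newest first), compute the ops to
    send back: the excess tail to remove, plus the new element to add."""
    # Keep limit - 1 existing entries so the new entry fits under the cap.
    excess = list(current_values)[limit - 1:]
    return {"remove_all": excess, "add": new_entry}

# A set already holding 1002 entries: remove 3 from the tail, add 1.
ops = trim_and_add([f"uuid-{i}" for i in range(1002)], "uuid-new")
assert len(ops["remove_all"]) == 3
```

Under concurrency each actor computes this independently, which is why the 
set can temporarily exceed the cap before converging, as noted above.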

The 1000-entry limit meets our business scenarios and also falls far short 
of the 1-2 MB per-object recommendation for Riak, so we can take the hit 
on peak loads.

It would be nice if Riak enforced an automatic trim feature in the system, 
though. Will look at those bounded CRDT papers soon.


On 1/31/15, 11:47 AM, "Russell Brown" <russell.brown at> wrote:

>Hi Shawn,
>If I can get in IRC later I will.
>On 31 Jan 2015, at 18:11, Shawn Debnath <shawn at> wrote:
>> Good morning,
>> Wondering if there is a better way to do truncation of a set if entries 
>>exceed a certain threshold. I am trying to create a canonical timeline 
>>for every user in our system, as a set of UUID strings. We want the 
>>number of entries to be limited to about 1000 per timeline, to keep the 
>>size of each KV object in check inside Riak and also to avoid storing 
>>unnecessary data. From looking around the code and docs, there doesn’t 
>>seem to be a Redis-style LTRIM option available. So is the only way to 
>>enforce this to fetch the set, check the count, trim it locally, and 
>>then update?
>At the moment, yes that is correct: no LTRIM, you need to fetch and send 
>back a {remove_all, [elements, to, remove]} type operation (that was 
>pseudo code.)
>> My fear is that by using CRDTs, a set add wins over a remove in the 
>>case of a conflict. If there were to be 1000 conflicting concurrent 
>>updates, each trying to trim and add one entry to a set that already 
>>contains 1000 entries, I imagine the final count of the set would be 
>>2000. This can be greatly exacerbated on a very “friendly” or popular 
>>person’s timeline, with the code never being able to truncate the set 
>>back to 1000.
>That is not how the Set that we ship with riak 2.0 works. You don’t fetch 
>the object, mutate it, and store it. Instead you fetch a context and the 
>values. You then send operations to riak. If concurrent actors fetch the 
>set and remove the same, or some intersection, or even disjoint set of 
>elements, whilst adding others, all those operations are executed. The 
>“add wins” applies only to a per-element conflict: client A adds element 
>X while client B removes element X. In this case, add wins and X is in 
>the set. However, you are correct: if 1000 actors concurrently remove 
>every element from a set with 1000 elements, and each adds their own 
>unique element, the set will have 1k entries again. There is no way, at 
>the moment in Riak, to enforce an invariant, or bound, on the size of the 
>set.
>There is some recent academic work by Valter Balegas and Nuno Preguiça 
>(and I am sure others), on this subject, where they created bounded 
>CRDTs, but I can’t find the reference at the moment. It’s something we’d 
>like in Riak, in future.
>If you want a strong invariant on the size of a set, you could use strong 
>consistency in Riak, maybe?
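
The per-element add-wins behaviour described above can be simulated with a 
toy observed-remove set (a sketch only: `ORSet` is my own illustrative 
class, and the real riak_dt implementation differs in its details):

```python
import uuid

class ORSet:
    """Toy observed-remove set: each add carries a unique tag, and a
    remove only deletes the tags the remover observed at fetch time,
    so a concurrent (unobserved) add survives -- 'add wins'."""
    def __init__(self):
        self.entries = {}  # element -> set of unique add-tags

    def add(self, elem):
        self.entries.setdefault(elem, set()).add(uuid.uuid4().hex)

    def observed(self, elem):
        # The "context" a client fetches along with the values.
        return set(self.entries.get(elem, set()))

    def remove(self, elem, context):
        tags = self.entries.get(elem)
        if tags is None:
            return
        tags -= context          # only observed tags are removed
        if not tags:
            del self.entries[elem]

    def value(self):
        return set(self.entries)

s = ORSet()
s.add("X")
ctx = s.observed("X")   # client B fetches the set and its context
s.add("X")              # client A concurrently adds X (a fresh tag)
s.remove("X", ctx)      # client B's remove covers only observed tags
assert "X" in s.value() # add wins: X is still in the set
```

A remove whose context covers every tag does succeed, which is why 1000 
concurrent trim-and-add actors each land their own element: no one else 
re-adds the elements being removed.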
>> Any thoughts or suggestions appreciated.
>> Thanks,
>> Shawn
>> freenode (sde)
>> _______________________________________________
>> riak-users mailing list
>> riak-users at

More information about the riak-users mailing list