Getting all keys in a bucket

Daniel Widgren daniel.widgren at gmail.com
Wed Feb 10 05:07:27 EST 2010


Stephen C. Gilardi wrote:
>
> On Feb 9, 2010, at 8:05 PM, Lucas Di Pentima wrote:
>
>> I inserted a few thousand (approx. 12,000) records into an embedded 
>> Riak installation on OS X Snow Leopard. When I try to get all the 
>> keys using curl and Ruby, it takes a long time (several minutes!).
>>
>> I would not expect 12k keys to be a lot of data. Can you tell me why 
>> it takes so long? Is there a configuration setting I should be tweaking?
>
> In some testing we did, we put a bunch of records into Riak with 
> random (UUID) keys and needed to list the keys to retrieve them.
>
> Listing keys appeared to be an O(N^2) operation. In one 3-node 
> cluster of m1-small machines at EC2, a rough formula for how long it 
> took to list the keys in a single bucket was:
>
>     time in minutes = (keys in the bucket / 10,000) ^ 2
>
> This held for 10,000, 20,000, and 30,000 keys (1, 4, and 9 minutes).
>
> Asking one node for the list of keys appeared to result in significant 
> CPU time being used only on that node.
>
> To get better key listing performance, I found we could split up the 
> key-value pairs into multiple buckets and then request the keys for 
> the buckets from several threads in parallel. (We were still only 
> asking one node for the lists, but many such requests were pending in 
> parallel.) This appeared to engage many nodes in the task, and 
> aggregate performance became quite good.
>
> I'd love to hear more and better ideas for fast key listing.
>
> --Steve
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>   
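
As a rough worked example of Steve's formula above, here is a small Python
sketch. The function names are made up for illustration, and the "split"
estimate assumes the same formula applies to each bucket independently and
that parallel requests do not slow each other down. Nobody measured that in
this thread, so treat it as back-of-the-envelope arithmetic only.

```python
def listing_minutes(keys_in_bucket):
    """Steve's rough fit: minutes = (keys in the bucket / 10,000) ^ 2."""
    return (keys_in_bucket / 10_000) ** 2


def split_listing_minutes(total_keys, buckets):
    """Estimate listing time when keys are spread evenly over `buckets`.

    Returns (serial, parallel) minutes. ASSUMPTION: the formula holds per
    bucket and parallel requests do not contend with each other.
    """
    per_bucket = listing_minutes(total_keys / buckets)
    serial = per_bucket * buckets   # list the buckets one after another
    parallel = per_bucket           # list all buckets at the same time
    return serial, parallel


if __name__ == "__main__":
    for n in (10_000, 20_000, 30_000):
        print(n, "keys in one bucket:", listing_minutes(n), "minutes")
    print("30,000 keys over 3 buckets:", split_listing_minutes(30_000, 3))
```

If the quadratic behaviour is real, the sketch suggests why splitting helps
even without parallelism: each bucket holds fewer keys, so the squared term
shrinks faster than the number of buckets grows.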

One idea we had while building an e-commerce framework with Nitrogen 
and Riak was to keep one object in each bucket that just stored a list 
of that bucket's keys.

Say we had a product bucket with 5 products, keyed 1 to 5. The list 
object would then be stored under the key "list" with the value 
[1,2,3,4,5]. This let us get all the keys in a bucket quickly, with a 
single read. Every time a product was added or removed, the list was 
updated.
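
A minimal sketch of that key-list idea in Python. It is not our actual
code: InMemoryBucket is a hypothetical stand-in for a Riak bucket (a plain
dict, no real client involved), and the helper names are invented for the
example.

```python
class InMemoryBucket:
    """Hypothetical stand-in for a Riak bucket: a plain key -> value map."""

    def __init__(self):
        self._objects = {}

    def get(self, key, default=None):
        return self._objects.get(key, default)

    def put(self, key, value):
        self._objects[key] = value

    def delete(self, key):
        self._objects.pop(key, None)


INDEX_KEY = "list"  # the one object that holds all product keys


def add_product(bucket, key, product):
    """Store the product, then add its key to the list object."""
    bucket.put(key, product)
    index = bucket.get(INDEX_KEY, [])
    if key not in index:
        bucket.put(INDEX_KEY, index + [key])


def remove_product(bucket, key):
    """Delete the product, then drop its key from the list object."""
    bucket.delete(key)
    index = bucket.get(INDEX_KEY, [])
    bucket.put(INDEX_KEY, [k for k in index if k != key])


def all_keys(bucket):
    """One read of the list object instead of a full key listing."""
    return list(bucket.get(INDEX_KEY, []))


if __name__ == "__main__":
    products = InMemoryBucket()
    for key in range(1, 6):
        add_product(products, key, {"name": "product%d" % key})
    print(all_keys(products))
```

One thing to watch: every add or remove is a read-modify-write on the list
object, so two writers updating it at the same time can overwrite each
other's change unless conflicts are handled somehow.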

We tried this with 5000 products, and it got really slow when we 
fetched every product individually, since that is about 5000 calls to 
Riak.

In the end we discussed two options. One was to do the same as for the 
key list, but to store the data for all products in a single object, 
for the case where you want to get all products:

[{product1, all data for product1}, {product2, all data for product2}, 
{product3, all data for product3}, {product4, all data for product4}, 
{product5, all data for product5}]

This might help if you want all the data from a bucket, as in a 
webshop when you list all products in a category. But I am not sure it 
is a good idea.
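
To make the difference in number of calls concrete, here is a small Python
sketch. CountingStore is a hypothetical in-memory stand-in for a bucket
that just counts reads, and the key names "list" and "all_products" are
made up for the example.

```python
class CountingStore:
    """Hypothetical in-memory bucket that counts how many reads it serves."""

    def __init__(self):
        self.data = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value


def load_products(store, count):
    """Write the products, the key list, and the all-in-one object."""
    keys = ["product%d" % i for i in range(1, count + 1)]
    for key in keys:
        store.put(key, {"name": key})
    store.put("list", keys)
    store.put("all_products", [(key, {"name": key}) for key in keys])


def fetch_one_by_one(store):
    """One read for the key list, then one read per product."""
    return [(key, store.get(key)) for key in store.get("list")]


def fetch_aggregate(store):
    """A single read that returns every product at once."""
    return store.get("all_products")


if __name__ == "__main__":
    store = CountingStore()
    load_products(store, 5000)

    fetch_one_by_one(store)
    print("one by one:", store.reads, "reads")

    store.reads = 0
    fetch_aggregate(store)
    print("aggregate:", store.reads, "reads")
```

The price is on the write side: every product change also has to rewrite
the one big object, and that object grows with the bucket, which is part of
why I am unsure about it.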

/Daniel



