Getting all keys in a bucket
Stephen C. Gilardi
scgilardi at gmail.com
Tue Feb 9 21:17:21 EST 2010
On Feb 9, 2010, at 8:05 PM, Lucas Di Pentima wrote:
> I inserted a few thousand (approx. 12,000) records into a Riak embedded installation on my OS X Snow Leopard machine, and now I tried to get all the keys using curl and Ruby, and it takes a long time (a few minutes!)
> I suppose 12k keys should not be a lot of data. Can you tell me why it took so long? Is there a configuration setting I should be tweaking?
In some testing we did, we put a bunch of records into Riak with random (UUID) keys and needed to list the keys to retrieve them.
Listing keys appeared to be an O(N^2) operation. In one 3-node cluster of m1.small machines at EC2, a rough formula for how long it took to list the keys in a single bucket was:
time in minutes = (keys in the bucket / 10,000) ^ 2
This held for 10,000, 20,000, and 30,000 keys (1, 4, and 9 minutes, respectively).
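As a quick check of that arithmetic, here is a minimal Python sketch of the fit. The function name is made up for illustration, and the formula is only an empirical observation from our one cluster, not a property of Riak in general.

```python
def estimated_listing_minutes(num_keys: int) -> float:
    """Rough empirical fit from our test cluster: minutes = (keys / 10,000) ^ 2."""
    return (num_keys / 10_000) ** 2


# The three bucket sizes we actually measured.
for n in (10_000, 20_000, 30_000):
    print(f"{n} keys -> about {estimated_listing_minutes(n):.0f} minute(s)")
```

If the fit held outside the range we measured (an untested assumption), the 12,000 keys in the original question would come out at roughly a minute and a half on that cluster.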
Asking one node for the list of keys appeared to result in significant CPU time being used only on that node.
To get better key listing performance, I found we could split up the key-value pairs into multiple buckets and then request the keys for those buckets from several threads in parallel. (We were still only asking one node for the lists, but many such requests were pending at once.) This appeared to engage many nodes in the task, and aggregate performance became quite good.
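A minimal Python sketch of that approach, under stated assumptions: `bucket_for` and `list_all_keys` are hypothetical helpers, the `base_N` bucket naming scheme is invented for illustration, and `list_keys` stands in for whatever client call or HTTP request you use to list a single bucket. It is not our actual test code.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def bucket_for(key: str, base: str, num_buckets: int) -> str:
    """Assign a key to one of num_buckets buckets using a stable hash.

    The "<base>_<n>" naming scheme is an assumption for illustration.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{base}_{int(digest, 16) % num_buckets}"


def list_all_keys(
    buckets: Iterable[str],
    list_keys: Callable[[str], List[str]],
    max_workers: int = 8,
) -> Dict[str, List[str]]:
    """List several buckets in parallel, one pending request per worker thread.

    list_keys is supplied by the caller and should return the keys of a
    single bucket (for example, by asking one Riak node for them).
    """
    bucket_list = list(buckets)
    if not bucket_list:
        return {}
    workers = min(max_workers, len(bucket_list))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as bucket_list.
        return dict(zip(bucket_list, pool.map(list_keys, bucket_list)))
```

When storing a record you would write it to `bucket_for(key, "records", 16)` instead of a single `records` bucket, and later call `list_all_keys` over the 16 bucket names. If the quadratic behavior is per bucket, as it appeared to be for us, each request then covers a much smaller N.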
I'd love to hear more and better ideas for fast key listing.