list_keys is less bad
justin at basho.com
Mon Aug 23 21:38:40 EDT 2010
One aspect of Riak's interface that we have often discouraged in the
past is listing all of the keys in a bucket. There were two reasons
for this. First, it is necessarily a more heavyweight operation than
any of the more targeted get/put/delete sorts of things. Second,
given the priorities of Riak's earliest users, we had not put much
optimization into that area. As a result, anything that required
getting all keys from a bucket was fairly slow and also fairly heavy
in terms of memory consumption.
We have put some effort into this recently and seen marked
improvement. The changes can be summed up as:
1- bitcask has a new fold_keys operation, which performs far less I/O
in most cases than the previous mechanism underlying list_keys.
2- the Riak backend interface to bitcask uses the new fold_keys operation.
3- the mechanism underlying the cluster-wide list_keys operation has
changed to require far less total memory in proportion to the list.
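To illustrate the general idea of a key fold, here is a minimal sketch
in Python. It is not bitcask's actual Erlang code: the fold_keys
function and the dict standing in for a store are hypothetical, chosen
only to show that a fold visits each key once and threads an
accumulator through a caller-supplied function without reading the
stored values.

```python
from typing import Callable, Dict, TypeVar

Acc = TypeVar("Acc")


def fold_keys(store: Dict[bytes, bytes],
              fun: Callable[[bytes, Acc], Acc],
              acc: Acc) -> Acc:
    """Hypothetical stand-in for a key fold: call fun(key, acc) per key."""
    for key in store:  # iterates over keys only; values are never read
        acc = fun(key, acc)
    return acc


# A plain dict stands in for the store (an assumption for illustration).
store = {b"k1": b"v1", b"k2": b"v2", b"k3": b"v3"}

# Count keys without collecting them.
key_count = fold_keys(store, lambda _key, n: n + 1, 0)

# Collect keys into a list; memory then grows with the number of keys.
collected = fold_keys(store, lambda key, acc: acc + [key], [])
```

The caller decides what the accumulator is, so counting keys costs
almost no memory while collecting them costs memory in proportion to
the list.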
Due to these three changes, there are two effective results:
1- In nearly all cases, the list_keys operation is much faster than
before. In some common cases it is 10 times faster.
2- In cases of very large buckets, memory allocation will not spike
during key listing (though of course if you ask Riak to build the
whole list for you instead of streaming it out, then at least enough
memory to hold that list must be used).
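The difference between streaming keys and building the whole list can
be sketched as follows. This is an illustration in Python under stated
assumptions, not Riak's implementation: stream_keys, list_all_keys, the
batch size, and the nested lists standing in for per-partition key sets
are all hypothetical.

```python
from typing import Iterable, Iterator, List


def stream_keys(partitions: Iterable[Iterable[str]],
                batch_size: int = 2) -> Iterator[List[str]]:
    """Yield keys in small batches; only one batch is held at a time."""
    batch: List[str] = []
    for partition in partitions:
        for key in partition:
            batch.append(key)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # flush the final, possibly short, batch
        yield batch


def list_all_keys(partitions: Iterable[Iterable[str]]) -> List[str]:
    """Build the complete list; memory grows with the total key count."""
    keys: List[str] = []
    for batch in stream_keys(partitions):
        keys.extend(batch)
    return keys


# Hypothetical data: two partitions holding five keys in total.
partitions = [["a", "b", "c"], ["d", "e"]]
batches = list(stream_keys(partitions))
all_keys = list_all_keys(partitions)
```

A consumer that handles each batch as it arrives never needs more than
batch_size keys in memory at once, whereas list_all_keys must hold
every key before it can return.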
Note that since map/reduce uses the streaming list_keys under the hood
when performing map/reduce over a whole bucket, these changes affect
that interface's performance as well.
The described changes are now in the trunks of the relevant
repositories, and will be included in the next release.