Expected vs Actual Bucket Behavior

Curtis Caravone caravone at gmail.com
Wed Jul 21 14:45:00 EDT 2010


Regarding #2, I think bitcask could be modified fairly easily to support an
efficient list-keys-by-bucket operation, without sacrificing free buckets:

The current bitcask stores record locators (key, file_id, file_offset) in
memory in a big hash table by key (the bitcask key, in Riak's case, is the
Riak {bucket,key} as a binary).

What if the hash table were replaced with an in-memory btree?  A good
implementation shouldn't take more memory than a hash table, and get/put
should still be very fast.  The plus side is that one could then do a range
traversal of the btree to get all keys in a given bucket (assuming the right
comparison function for the btree).  There wouldn't be any additional
overhead of extra file handles, etc. because everything for a vnode would
still be stored in one bitcask instance.
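
To make the idea concrete, here is a rough sketch in Python (not bitcask's
actual Erlang/C code). It is only an illustration under stated assumptions:
the class name OrderedKeydir and its methods are hypothetical, and a sorted
list searched with bisect stands in for the btree (inserts into a list are
O(n), which a real btree would avoid). The point is only that ordering the
index by (bucket, key) turns "list keys in a bucket" into a range scan.

```python
import bisect


class OrderedKeydir:
    """Hypothetical ordered index: a sorted list stands in for a btree."""

    def __init__(self):
        self._keys = []      # sorted list of (bucket, key) tuples
        self._locators = {}  # (bucket, key) -> (file_id, file_offset)

    def put(self, bucket, key, file_id, file_offset):
        composite = (bucket, key)
        if composite not in self._locators:
            # Keep the key list ordered by bucket first, then by key.
            bisect.insort(self._keys, composite)
        self._locators[composite] = (file_id, file_offset)

    def get(self, bucket, key):
        # Returns the record locator, or None if the key is absent.
        return self._locators.get((bucket, key))

    def list_keys(self, bucket):
        # (bucket,) sorts before every (bucket, key), so this finds the
        # first entry of the bucket; then walk until the bucket changes.
        i = bisect.bisect_left(self._keys, (bucket,))
        found = []
        while i < len(self._keys) and self._keys[i][0] == bucket:
            found.append(self._keys[i][1])
            i += 1
        return found


keydir = OrderedKeydir()
keydir.put(b"users", b"bob", 1, 0)
keydir.put(b"orders", b"42", 1, 100)
keydir.put(b"users", b"alice", 2, 50)
users_keys = keydir.list_keys(b"users")
```

The scan touches only the entries of the requested bucket, so its cost
depends on that bucket's size rather than on the total number of keys in the
vnode.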

What do you think?

Curtis

On Wed, Jul 21, 2010 at 6:31 AM, Justin Sheehy <justin at basho.com> wrote:

> I think that we are all (myself included) getting two different issues
> a bit mixed up in this discussion:
>
> 1: storing an implicit index of keys in the Riak key/value store
>
> 2: making buckets separate in that a per-bucket operation's
> performance would not be affected by the content of other buckets
>
> The thread started out with a request for #2, but included a
> suggestion to do #1.  These are actually two different topics.
>
> The first issue, implicitly storing a big index of keys, is
> impractical in a distributed key/value storage system that has Riak's
> availability goals.  We are very unlikely to implement this as
> described in the near future.  However, we very much recognize that
> there are many different ways that people would like to find their
> data.  In that light, we are working on multiple different efforts
> that will use the Riak core to provide data storage with more than
> just "simple" key/value access.
>
> The second issue, of isolating buckets, is a much simpler design
> choice and is also a per-backend implementation detail.  We can create
> and provide an alternative bitcask adapter that does this.  It will be
> a real tradeoff: in exchange for buckets not impacting each other as
> much, the system will consume more filehandles, be a bit less
> efficient at rebalancing, and will generally make buckets no longer
> "free".  This is a reasonable tradeoff in either direction for various
> applications, and I support making it available as a choice.  I have
> created a bugzilla entry to track it:
> https://issues.basho.com/show_bug.cgi?id=480
>
> I hope that this helps to clarify the issue.
>
> -Justin