Expected vs Actual Bucket Behavior

Justin Sheehy justin at basho.com
Wed Jul 21 09:31:06 EDT 2010

I think that we are all (myself included) getting two different issues
a bit mixed up in this discussion:

1: storing an implicit index of keys in the Riak key/value store

2: isolating buckets, so that the performance of a per-bucket
operation is not affected by the contents of other buckets

The thread started out with a request for #2, but included a
suggestion to do #1.  These are actually two different topics.

The first issue, implicitly storing a big index of keys, is
impractical in a distributed key/value storage system that has Riak's
availability goals.  We are very unlikely to implement this as
described in the near future.  However, we very much recognize that
there are many different ways that people would like to find their
data.  In that light, we are working on several different efforts
that will use the Riak core to provide data storage with more than
just "simple" key/value access.

The second issue, of isolating buckets, is a much simpler design
choice and is also a per-backend implementation detail.  We can create
and provide an alternative bitcask adapter that does this.  It will be
a real tradeoff: in exchange for buckets not impacting each other as
much, the system will consume more filehandles, be a bit less
efficient at rebalancing, and generally make buckets no longer
"free".  This is a reasonable tradeoff in either direction for various
applications, and I support making it available as a choice.  I have
created a bugzilla entry to track it:

I hope that this helps to clarify the issue.
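
To make the bucket-isolation tradeoff above a bit more concrete, here
is a rough Python sketch.  It is a toy in-memory model, not bitcask or
the proposed adapter: the class names SharedStore and PerBucketStore,
the "scanned" counter, and the "open_stores" counter (standing in for
filehandles) are all hypothetical, and it assumes that a shared store
has to walk every key to answer a per-bucket question.

```python
class SharedStore:
    """Toy model: all buckets live in one store, keyed by (bucket, key)."""

    def __init__(self):
        self.data = {}
        # One underlying store no matter how many buckets exist,
        # so creating a bucket costs nothing extra.
        self.open_stores = 1

    def put(self, bucket, key, value):
        self.data[(bucket, key)] = value

    def list_keys(self, bucket):
        """Return (keys, scanned); every entry in every bucket is examined."""
        scanned = 0
        keys = []
        for (b, k) in self.data:
            scanned += 1
            if b == bucket:
                keys.append(k)
        return keys, scanned


class PerBucketStore:
    """Toy model: each bucket gets its own store (hypothetical isolation)."""

    def __init__(self):
        self.buckets = {}

    @property
    def open_stores(self):
        # Stand-in for filehandle use: one store per bucket,
        # so buckets are no longer "free".
        return len(self.buckets)

    def put(self, bucket, key, value):
        self.buckets.setdefault(bucket, {})[key] = value

    def list_keys(self, bucket):
        """Return (keys, scanned); only the requested bucket is examined."""
        keys = list(self.buckets.get(bucket, {}))
        return keys, len(keys)


def fill(store):
    """Load a small bucket and a much larger one into the given store."""
    for i in range(3):
        store.put("small", f"s{i}", i)
    for i in range(1000):
        store.put("big", f"b{i}", i)
    return store


shared = fill(SharedStore())
isolated = fill(PerBucketStore())

if __name__ == "__main__":
    print("shared:  ", shared.list_keys("small")[1], "scanned,",
          shared.open_stores, "store(s) open")
    print("isolated:", isolated.list_keys("small")[1], "scanned,",
          isolated.open_stores, "store(s) open")
```

In this toy model, listing the three keys in "small" examines all 1003
entries in the shared store but only 3 in the isolated one, while the
isolated store holds one open store per bucket instead of one in
total.  It does not attempt to model the rebalancing cost.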

