Expected vs Actual Bucket Behavior

Daniel Einspanjer deinspanjer at mozilla.com
Wed Jul 21 02:22:09 EDT 2010

On 7/20/10 6:00 PM, Eric Filson wrote:
> On Tue, Jul 20, 2010 at 3:02 PM, Justin Sheehy <justin at basho.com> wrote:
>     Hi, Eric!  Thanks for your thoughts.
>     On Tue, Jul 20, 2010 at 12:39 PM, Eric Filson <efilson at gmail.com> wrote:
>     > I would think that this requirement,
>     > retrieving all objects in a bucket, to be a _very_ common
>     > place occurrence for modern web development and perhaps
>     > (depending on requirements) _the_ most common function aside
>     > from retrieving a single k/v pair.
>     I tend to see people that mostly try to write applications that don't
>     select everything from a whole bucket/table/whatever as a very
>     frequent occurrence, but different people have different requirements.
>     Certainly, it is sometimes unavoidable.
> Indeed, in my case it is :(
I've had two use cases that bumped into this limitation.  In one, we are 
just working around / accepting the limitation.  In the other, we found 
it much easier/safer to consider a different solution entirely.
>     > I might recommend a hybrid
>     > solution (based in my limited knowledge of Riak)... What about allowing a
>     > bucket property named something like "key_index" that points to a key
>     > containing a value of "keys in bucket".  Then, when calling GET
>     > /riak/bucket, Riak would use the key_index to immediately reduce its result
>     > set before applying m/r funcs.  While I understand this is essentially what
>     > a developer would do, it would certainly alleviate some code requirements
>     > (application side) as well as make the behavior of retrieving a bucket's
>     > contents more "expected" and efficient.
>     A much earlier incarnation of Riak actually stored bucket keylists
>     explicitly in a fashion somewhat like what you describe.  We removed
>     this as one of our biggest goals is predictable and understandable
>     behavior in a distributed systems sense, and a model like this one
>     turns each write operation into at least two operations.  This isn't
>     just a performance issue, but also adds complexity.  For instance, it
>     is not immediately obvious what should be returned to the client if a
>     data item write succeeds, but the read/write of the index fails?
> Haha, these are the exact reasons I would cite as a developer for 
> using a similar method on Riak's side... without the option of auto 
> bucket indexing it effectively places this double write into the 
> application side where it requires more cycles and more data across 
> the wire.  Instead of doing a single write, from the application side, 
> and allowing Riak to handle this, you have to GET index_key, UPDATE 
> index_key, ADD new_key... So rather than having a single transaction 
> with Riak, you have to have three transactions with Riak + Application 
> functionality.  Inherently, this adds another level of complexity into 
> the application code base for something that could be done more 
> efficiently by the DB engine itself.
> I would think a separate error number and message would suffice as a 
> return error, obviously though, this would require developers being 
> made aware so they can code for the exception.
> Also, this would be optional, if the index_key wasn't set for the 
> bucket then this setup wouldn't be used.  This would at least make the 
> system more flexible to the application requirements and developer 
> preferences.
I understand that there may be people using Riak who either never intend 
to have a huge number of keys in the cluster, or who never intend to try 
to map reduce over a bucket if they do.
I also understand that there are performance and complexity wins to be 
had by eliminating the feature.

That said, I feel it needs to be an optional feature that the engine 
itself provides.  Pushing it out to the client layer severely 
complicates the transaction because it is now two separate REST calls 
rather than something that can be done in a tightly coupled fashion on 
the node servicing the request.
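
To make the failure window concrete, here is a minimal sketch of the client-side index maintenance discussed above. It is only an illustration under stated assumptions: InMemoryStore, StoreError, put_with_client_index, and the fail_on hook are hypothetical names, not the Riak client API, and the store is a plain in-memory dict. The sketch writes the data item first and the index second, so the partial-failure case Justin raised (item written, index update failed) is easy to reproduce.

```python
INDEX_KEY = "index_key"  # hypothetical key holding the list of keys in a bucket


class StoreError(Exception):
    """Raised by the stand-in store to simulate a failed request."""


class InMemoryStore:
    """Stand-in for a key/value store. This is NOT the Riak client API."""

    def __init__(self):
        self.data = {}
        self.fail_on = set()  # keys whose writes should fail (for illustration)

    def get(self, bucket, key):
        return self.data.get((bucket, key))

    def put(self, bucket, key, value):
        if key in self.fail_on:
            raise StoreError("write failed for key %r" % key)
        self.data[(bucket, key)] = value


def put_with_client_index(store, bucket, key, value):
    """Write an item, then update the bucket's key index from the client.

    Three separate requests, with no atomicity between them.
    """
    store.put(bucket, key, value)                # ADD new_key
    index = store.get(bucket, INDEX_KEY) or []   # GET index_key
    if key not in index:
        store.put(bucket, INDEX_KEY, index + [key])  # UPDATE index_key
```

If the second put raises, the item exists but is missing from the index, and the client has to decide whether to retry, roll back, or report an error. A concurrent writer doing the same read-modify-write on index_key can also overwrite this update, which is another reason the index is hard to maintain correctly from outside the node.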
