Buckets versus Documents - Limits

Jason Tanner jt4websites at googlemail.com
Fri Feb 5 13:19:35 EST 2010

Hi Rusty

Thank you for your very illuminating responses.

I have some more questions, but I shall post them in a separate thread.


On 5 February 2010 15:26, Rusty Klophaus <rusty at basho.com> wrote:

> Hi Jason,
> Just got word that some backends (currently only the Innostore-based
> backend, but perhaps more in the future) keep a separate file open per
> bucket. So add that as the third and biggest reason to model your data using
> a reasonably low number of buckets, depending on the backend that you
> choose.
> Best,
> Rusty
> On Thu, Feb 4, 2010 at 11:21 AM, Rusty Klophaus <rusty at basho.com> wrote:
>> Hi Jason,
>> Great questions.
>> For the use case you described, I would recommend having two buckets, one
>> for artists and one for albums, with a link from artist to album, and
>> possibly a link back. In most scenarios, you should think of a bucket as a
>> table.
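The two-bucket layout recommended above can be sketched with a toy, in-memory model. This is not the Riak client API: the `store` dict, the `Obj` class, the `put` helper and the sample keys are all hypothetical, and each link is modelled as a (bucket, key, tag) triple.

```python
# Toy, in-memory stand-in for two buckets ("artists" and "albums").
# NOT the Riak client API; it only illustrates the data layout.
from dataclasses import dataclass, field


@dataclass
class Obj:
    value: dict
    # Each link is a (bucket, key, tag) triple pointing at another object.
    links: list = field(default_factory=list)


# One dict per bucket, like one table per entity type.
store = {"artists": {}, "albums": {}}


def put(bucket, key, value, links=()):
    """Hypothetical helper: store a value plus its outgoing links."""
    store[bucket][key] = Obj(value, list(links))


put("albums", "ok-computer", {"title": "OK Computer", "year": 1997})
put("albums", "kid-a", {"title": "Kid A", "year": 2000})
# Link from the artist to each of its albums.
put("artists", "radiohead", {"name": "Radiohead"},
    links=[("albums", "ok-computer", "album"), ("albums", "kid-a", "album")])

# Optional back-link from each album to its artist.
for album_key in ("ok-computer", "kid-a"):
    store["albums"][album_key].links.append(("artists", "radiohead", "artist"))
```

However many artists and albums are added, the layout stays at two buckets; only the number of keys grows.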
>> There are two main reasons:
>> First, it conforms to the design expected by map/reduce and linkwalking.
>> You can run a map/reduce across all keys in a bucket to operate on all
>> artists or all albums. This is much harder to do if each artist is in a
>> separate bucket. And with linkwalking, you can select the links to visit
>> based on the bucket name. This is only useful when you have a well-known
>> bucket name that is consistent across the data to which you are linking.
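A minimal sketch of the first reason, again using hypothetical in-memory stand-ins rather than Riak's map/reduce or link-walking interfaces: `map_bucket` visits every key in one bucket, and `walk` follows links while filtering on the target bucket name. The filter only works because `albums` is a well-known bucket name shared by all album data.

```python
# Toy illustration (not Riak's map/reduce or link-walking API) of why a
# well-known bucket name helps: both operations select data by bucket.
store = {
    "artists": {
        "radiohead": {"value": {"name": "Radiohead"},
                      "links": [("albums", "ok-computer", "album"),
                                ("albums", "kid-a", "album")]},
        "portishead": {"value": {"name": "Portishead"},
                       "links": [("albums", "dummy", "album")]},
    },
    "albums": {
        "ok-computer": {"value": {"year": 1997}, "links": []},
        "kid-a": {"value": {"year": 2000}, "links": []},
        "dummy": {"value": {"year": 1994}, "links": []},
    },
}


def map_bucket(bucket, fn):
    """Apply fn to every object in one bucket (a 'map' over all its keys)."""
    return [fn(obj["value"]) for obj in store[bucket].values()]


def walk(bucket, key, want_bucket=None, want_tag=None):
    """Follow links from one object, keeping only those matching the filter.

    None acts as a wildcard, loosely like '_' in a Riak link-walk spec.
    """
    return [
        (b, k)
        for b, k, tag in store[bucket][key]["links"]
        if (want_bucket is None or b == want_bucket)
        and (want_tag is None or tag == want_tag)
    ]


# "Reduce" step: count all albums released before 2000.
print(sum(map_bucket("albums", lambda v: v["year"] < 2000)))  # 2
print(walk("artists", "radiohead", want_bucket="albums"))
```

With a bucket per artist instead, the same "all albums" query would first have to discover every artist bucket.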
>> Second, you can customize a bucket by setting bucket parameters to
>> configure things like:
>> - How many replicas to store (n_val)
>> - Whether to propagate conflicting edits through to the client (allow_mult)
>> - What link function to use (linkfun)
>> - etc.
>> Generally, you want these customizations to apply to all data of the same
>> type. Plus, it's easier to manage these customizations on a smaller number
>> of buckets.
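How per-bucket settings layer over a default can be sketched as follows. The property names `n_val` and `allow_mult` come from the list above; the default values, the bucket names and the `effective_props` helper are illustrative assumptions, not Riak's actual defaults or code.

```python
# Hypothetical sketch: effective bucket properties = defaults overlaid with
# per-bucket overrides. Default values here are illustrative assumptions.
DEFAULT_BUCKET_PROPS = {"n_val": 3, "allow_mult": False}

# Only customized buckets need an entry here.
bucket_overrides = {
    "albums": {"n_val": 5},           # keep more replicas of album data
    "artists": {"allow_mult": True},  # let the client resolve conflicting edits
}


def effective_props(bucket):
    """Return the defaults with this bucket's overrides applied on top."""
    props = dict(DEFAULT_BUCKET_PROPS)
    props.update(bucket_overrides.get(bucket, {}))
    return props


print(effective_props("albums"))    # {'n_val': 5, 'allow_mult': False}
print(effective_props("anything"))  # {'n_val': 3, 'allow_mult': False}
```

Each override is written once per bucket, so every album or artist stored in that bucket picks up the same settings.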
>> That said, the number of buckets is limited only by physical resources
>> ***unless*** you customize the bucket. If you leave the default bucket
>> settings in place, then a bucket takes no additional overhead, allowing you
>> to create millions of buckets. If you customize the bucket, then the
>> bucket's properties are stored in the ringstate (as you noted) so it's a bad
>> idea to have a large number of buckets with non-standard configuration.
>> (Note that it is possible to override the default bucket configuration by
>> setting 'default_bucket_props' in the app.config file.)
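The overhead rule above can be illustrated with a toy model of the ring state that records bucket properties only when they differ from the defaults. `ToyRing`, `set_bucket`, the default values and the 1,000-bucket scale (standing in for 5 million artists) are hypothetical; this shows the shape of the trade-off, not how Riak stores its ring.

```python
# Hypothetical defaults; not Riak's real values.
DEFAULTS = {"n_val": 3, "allow_mult": False}


class ToyRing:
    """Hypothetical stand-in for the ring state: stores only non-default props."""

    def __init__(self):
        self.bucket_props = {}

    def set_bucket(self, bucket, **props):
        merged = {**DEFAULTS, **props}
        if merged != DEFAULTS:
            # Customized bucket: its properties must live in the ring state.
            self.bucket_props[bucket] = merged
        else:
            # Default bucket: nothing stored, so no extra overhead.
            self.bucket_props.pop(bucket, None)


ring = ToyRing()
# Two customized buckets (the recommended design): 2 ring-state entries.
ring.set_bucket("artists", allow_mult=True)
ring.set_bucket("albums", n_val=5)
# A bucket per artist at default settings adds no entries at all.
for i in range(1000):
    ring.set_bucket(f"artist-{i}")
print(len(ring.bucket_props))  # 2
# Customizing each per-artist bucket makes the ring state grow per bucket.
for i in range(1000):
    ring.set_bucket(f"artist-{i}", n_val=2)
print(len(ring.bucket_props))  # 1002
```

In this toy, changing `DEFAULTS` plays the role of `default_bucket_props`: it moves the baseline for every bucket without adding per-bucket entries.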
>> Hope that helps.
>> Best,
>> Rusty
>> On Thu, Feb 4, 2010 at 10:31 AM, Jason Tanner <jt4websites at googlemail.com> wrote:
>>> Hi,
>>> Let's say I had 100 million albums by 5 million artists.
>>> This could be modelled in riak in a number of ways.
>>> For example, having 2 buckets, one for albums, one for artists and
>>> linking documents in the two buckets.
>>> Alternatively, I could have a bucket per artist containing the albums
>>> they created.
>>> Obviously there are other ways to model this as well.
>>> My point is to try to identify the limitations of Riak's design choices
>>> so that I, in turn, can design my data model with them in mind.
>>> Are there any penalties to consider when having a large number of buckets,
>>> compared to a large number of documents in a few buckets?
>>> I read somewhere about bucket information being kept in the ringstate,
>>> and although I didn't fully understand the implications of that, I guessed
>>> it meant that having a huge number of buckets was perhaps not a good idea.
>>> Is this true? Is there a point at which having a lot of buckets would
>>> actually penalise you in terms of performance?
>>> Jason
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com