Buckets versus Documents - Limits

Rusty Klophaus rusty at basho.com
Thu Feb 4 11:21:58 EST 2010


Hi Jason,

Great questions.

For the use case you described, I would recommend having two buckets, one
for artists and one for albums, with a link from each artist to its albums,
and possibly a link back. In most scenarios you should think of a bucket as
you would a table.
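
As a rough sketch of that layout, here is how the links from an artist to
its albums might be written as an HTTP Link header value. The "/riak" URL
prefix, the "riaktag" parameter, the bucket names, the keys and the "album"
tag are all assumptions for illustration; check them against the HTTP
interface of the Riak version you are running.

```python
def link_header(links, prefix="/riak"):
    """Build a Link header value from (bucket, key, tag) triples.

    The header layout used here is an assumption, not taken from this thread.
    """
    parts = [
        '<{0}/{1}/{2}>; riaktag="{3}"'.format(prefix, bucket, key, tag)
        for bucket, key, tag in links
    ]
    return ", ".join(parts)


# Hypothetical artist with two albums stored in a shared "albums" bucket.
artist_links = [
    ("albums", "album-1", "album"),
    ("albums", "album-2", "album"),
]
header = link_header(artist_links)
print(header)
```

The same helper could build the link back from an album to its artist by
passing a triple that points at the "artists" bucket.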

There are two main reasons:

First, it conforms to the design expected by map/reduce and linkwalking. You
can run a map/reduce across all keys in a bucket to operate on all artists
or all albums. This is much harder to do if each artist is in a separate
bucket. And with linkwalking, you can select the links to visit based on the
bucket name. That filter is only useful when you have a well-known bucket
name that is consistent across the data to which you are linking.
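
To make both points concrete, the sketch below builds a map/reduce job that
takes a whole bucket as input, and a link-walking path that filters on the
target bucket name. The job layout, the built-in "Riak.mapValuesJson"
function name, the "/riak" prefix and the "bucket,tag,keep" walk spec are
assumptions about the HTTP interface; the keys are hypothetical.

```python
import json


def mapred_job(bucket, map_fun="Riak.mapValuesJson"):
    """Describe a map/reduce job whose input is every key in one bucket."""
    return {
        "inputs": bucket,
        "query": [{"map": {"language": "javascript", "name": map_fun}}],
    }


def link_walk_path(bucket, key, target_bucket="_", tag="_", keep="1",
                   prefix="/riak"):
    """Build a link-walking path; "_" is assumed to mean "match anything"."""
    return "{0}/{1}/{2}/{3},{4},{5}".format(
        prefix, bucket, key, target_bucket, tag, keep)


# One job covers all artists because they share a single bucket.
job_body = json.dumps(mapred_job("artists"))

# Follow only the links that point into the well-known "albums" bucket.
path = link_walk_path("artists", "artist-1", target_bucket="albums")
print(job_body)
print(path)
```

With one bucket per artist, there would be no single "inputs" value covering
all artists, and no stable bucket name to filter on in the walk.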

Second, you can customize a bucket by setting bucket parameters to configure
things like:

- How many replicas to store (n_val)
- Whether to propagate conflicting edits through to the client (allow_mult)
- What link function to use (linkfun)
- etc.

Generally, you want these customizations to apply to all data of the same
type. Plus, it's easier to manage these customizations on a smaller number
of buckets.
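
A minimal sketch of what such a customization might look like as a request
body, assuming bucket properties are set by sending a JSON document with a
top-level "props" object to the bucket's URL. That envelope, and the values
chosen here, are assumptions for illustration only.

```python
import json


def bucket_props_body(**props):
    """Serialize bucket properties into the assumed {"props": {...}} envelope."""
    return json.dumps({"props": props}, sort_keys=True)


# Hypothetical settings: keep more replicas of albums and surface
# conflicting edits to the client.
albums_body = bucket_props_body(n_val=5, allow_mult=True)
print(albums_body)
```

Because the settings attach to the bucket, one request like this covers every
album, which is the argument for keeping data of the same type together.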

That said, the number of buckets is limited only by physical resources
***unless*** you customize the bucket. If you leave the default bucket
settings in place, then a bucket adds no overhead, allowing you to create
millions of buckets. If you customize the bucket, then the bucket's
properties are stored in the ringstate (as you noted), so every customized
bucket makes the ringstate larger. That is why it's a bad idea to have a
large number of buckets with non-standard configuration.
(Note that it is possible to override the default bucket configuration by
setting 'default_bucket_props' in the app.config file.)
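
The toy model below illustrates the distinction. It is not Riak's
implementation, only a sketch of the idea that default buckets need no
per-bucket record while customized buckets each add an entry to shared
state. The class name, the method names and the default values are
hypothetical.

```python
# Assumed defaults, standing in for whatever 'default_bucket_props' holds.
DEFAULT_PROPS = {"n_val": 3, "allow_mult": False}


class ToyRing:
    """Toy stand-in for the ringstate: stores only customized buckets."""

    def __init__(self, defaults):
        self.defaults = dict(defaults)
        self.custom = {}  # bucket name -> overridden properties

    def set_props(self, bucket, **props):
        # Customizing a bucket is the only thing that grows the shared state.
        self.custom.setdefault(bucket, {}).update(props)

    def get_props(self, bucket):
        # Unknown buckets fall back to the defaults without being recorded.
        merged = dict(self.defaults)
        merged.update(self.custom.get(bucket, {}))
        return merged


ring = ToyRing(DEFAULT_PROPS)

# Reading from many default buckets leaves the shared state empty.
for i in range(1000):
    ring.get_props("artist-%d" % i)

# A single customized bucket adds exactly one entry.
ring.set_props("albums", n_val=5)
print(len(ring.custom), ring.get_props("albums"))
```

In this model, a million default buckets cost nothing, while a million
customized ones would put a million entries into the shared state.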

Hope that helps.

Best,
Rusty

On Thu, Feb 4, 2010 at 10:31 AM, Jason Tanner <jt4websites at googlemail.com> wrote:

> Hi,
>
> Let's say I had 100 million albums generated by 5 million artists.
>
> This could be modelled in riak in a number of ways.
>
> For example, having 2 buckets, one for albums, one for artists and linking
> documents in the two buckets.
>
> Alternatively, I could have a bucket per artist containing the albums they
> created.
>
> Obviously there are other ways to model this as well.
>
> My point is to try and identify the limitations in Riak with regards to
> its design choices so that I in turn can design my stuff with that in mind.
>
> Are there any penalties to consider when having large numbers of buckets
> compared to documents in the buckets?
>
> I read somewhere about bucket information being kept in the ringstate, and
> although I didn't fully understand the implications of that I kind of
> guessed it meant that perhaps having huge numbers of buckets was not a good
> idea.
>
> Is this true? Is there a point at which having a lot of buckets would
> actually penalise you in terms of performance?
>
> Jason
>