Buckets versus Documents - Limits

Jason Tanner jt4websites at googlemail.com
Fri Feb 5 13:19:35 EST 2010


Hi Rusty

Thank you for your very illuminating responses.

I have some more questions but I shall post them as a separate thread.

Jason

On 5 February 2010 15:26, Rusty Klophaus <rusty at basho.com> wrote:

> Hi Jason,
>
> Just got word that some backends (currently only the Innostore-based
> backend, but perhaps more in the future) keep a separate file open per
> bucket. So add that as the third and biggest reason to model your data using
> a reasonably low number of buckets, depending on the backend that you
> choose.
>
> Best,
> Rusty
>
>
> On Thu, Feb 4, 2010 at 11:21 AM, Rusty Klophaus <rusty at basho.com> wrote:
>
>> Hi Jason,
>>
>> Great questions.
>>
>> For the use case you described, I would recommend having two buckets, one
>> for artists and one for albums, with a link from artist to album, and
>> possibly a link back. In most scenarios you should consider a bucket like a
>> table.
>>
>> There are two main reasons:
>>
>> First, it conforms to the design expected by map/reduce and linkwalking.
>> You can run a map/reduce across all keys in a bucket to operate on all
>> artists or all albums. This is much harder to do if each artist is in a
>> separate bucket. And with linkwalking, you can select the links to visit
>> based on the bucket name. This is only useful when you have a well-known
>> bucket name that is consistent across the data to which you are linking.
>>
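A minimal sketch of both ideas, assuming Riak's HTTP interface is mounted under the `/riak` prefix, that map/reduce jobs are submitted as JSON, and that a built-in JavaScript map function named `Riak.mapValuesJson` is available. These details may differ by release. The bucket names `artists` and `albums` and the key `artist1` are hypothetical. The code only builds the job body and the link-walking path; it does not contact a server.

```python
import json
from urllib.parse import quote


def mapred_job(bucket, map_function="Riak.mapValuesJson"):
    """Build a map/reduce job with one map phase over every key in a bucket."""
    return {
        # A bare bucket name as input means "all keys in this bucket".
        "inputs": bucket,
        "query": [
            {"map": {"language": "javascript", "name": map_function, "keep": True}}
        ],
    }


def linkwalk_path(bucket, key, target_bucket="_", tag="_", keep="1", prefix="/riak"):
    """Build a link-walking path; "_" acts as a wildcard for bucket or tag."""
    step = ",".join([quote(target_bucket, safe=""), quote(tag, safe=""), keep])
    return "/".join([prefix, quote(bucket, safe=""), quote(key, safe=""), step])


# One job covers all artists because they share a single well-known bucket.
job = json.dumps(mapred_job("artists"))

# Follow only those links from one artist that point into the "albums" bucket.
path = linkwalk_path("artists", "artist1", target_bucket="albums")

print(job)
print(path)
```

With a bucket per artist, neither the single `inputs` value nor the single `albums` selector would work, since there would be no one bucket name to refer to.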
>> Second, you can customize a bucket by setting bucket parameters to
>> configure things like:
>>
>> - How many replicas to store (n_val)
>> - Whether to propagate conflicting edits through to the client (allow_mult)
>> - What link function to use (linkfun)
>> - etc.
>>
>> Generally, you want these customizations to apply to all data of the same
>> type. Plus, it's easier to manage these customizations on a smaller number
>> of buckets.
>>
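As a rough illustration of customizing a bucket, the sketch below builds the request that would set `n_val` and `allow_mult` on one bucket. It assumes the HTTP interface accepts a `PUT` to the bucket URL with a JSON body of the form `{"props": {...}}` under a `/riak` prefix; check this against the release you run. The helper name `bucket_props_request` and the bucket name `albums` are hypothetical, and nothing is sent over the network.

```python
import json


def bucket_props_request(bucket, n_val=None, allow_mult=None, prefix="/riak"):
    """Return (method, path, headers, body) for updating a bucket's properties."""
    props = {}
    if n_val is not None:
        if n_val < 1:
            raise ValueError("n_val must be at least 1")
        props["n_val"] = n_val
    if allow_mult is not None:
        props["allow_mult"] = bool(allow_mult)
    # Only the properties being changed are included in the body.
    body = json.dumps({"props": props}, sort_keys=True)
    return "PUT", f"{prefix}/{bucket}", {"Content-Type": "application/json"}, body


# One request configures every album at once, because all albums share a bucket.
method, path, headers, body = bucket_props_request("albums", n_val=3, allow_mult=True)
print(method, path, body)
```

With one bucket per data type, a settings change is a single request like this rather than one request per artist.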
>> That said, the number of buckets is limited only by physical resources
>> ***unless*** you customize the bucket. If you leave the default bucket
>> settings in place, then a bucket incurs no additional overhead, allowing you
>> to create millions of buckets. If you customize the bucket, then the
>> bucket's properties are stored in the ringstate (as you noted) so it's a bad
>> idea to have a large number of buckets with non-standard configuration.
>> (Note that it is possible to override the default bucket configuration by
>> setting 'default_bucket_props' in the app.config file.)
>>
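The cost argument above can be pictured with a toy model. This is not Riak's implementation, only a sketch assuming that shared cluster state records properties solely for buckets that differ from the defaults. The class `RingState`, its methods, and the default values shown are illustrative.

```python
DEFAULT_PROPS = {"n_val": 3, "allow_mult": False}  # illustrative defaults


class RingState:
    """Toy stand-in for shared state: stores props only for customized buckets."""

    def __init__(self, defaults):
        self.defaults = dict(defaults)
        self.custom = {}  # bucket name -> overridden properties

    def set_props(self, bucket, **props):
        # Customizing a bucket adds an entry that every node must carry.
        self.custom.setdefault(bucket, {}).update(props)

    def get_props(self, bucket):
        # Uncustomized buckets fall back to the defaults and store nothing.
        return {**self.defaults, **self.custom.get(bucket, {})}

    def stored_entries(self):
        return len(self.custom)


ring = RingState(DEFAULT_PROPS)

# Using many default buckets leaves the shared state untouched.
for i in range(100_000):
    ring.get_props(f"artist_{i}")

# Customizing a single bucket adds exactly one stored entry.
ring.set_props("albums", n_val=5)
print(ring.stored_entries())
```

In this model the shared state grows with the number of customized buckets, not with the number of buckets in use, which is the distinction the paragraph draws.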
>> Hope that helps.
>>
>> Best,
>> Rusty
>>
>> On Thu, Feb 4, 2010 at 10:31 AM, Jason Tanner <jt4websites at googlemail.com> wrote:
>>
>>> Hi,
>>>
>>> Let's say I had 100 million albums generated by 5 million artists.
>>>
>>> This could be modelled in Riak in a number of ways.
>>>
>>> For example, having 2 buckets, one for albums, one for artists and
>>> linking documents in the two buckets.
>>>
>>> Alternatively, I could have a bucket per artist containing the albums
>>> they created.
>>>
>>> Obviously there are other ways to model this as well.
>>>
>>> My point is to try to identify the limitations in Riak with regard to
>>> its design choices so that I, in turn, can design my stuff with that in mind.
>>>
>>> Are there any penalties to consider when having large numbers of buckets
>>> compared to documents in the buckets?
>>>
>>> I read somewhere about bucket information being kept in the ringstate,
>>> and although I didn't fully understand the implications of that I kind of
>>> guessed it meant that perhaps having huge numbers of buckets was not a good
>>> idea.
>>>
>>> Is this true? Is there a point at which having a lot of buckets would
>>> actually penalise you in terms of performance?
>>>
>>> Jason