performance impact of many buckets?

Dan Reverri reverri at
Thu Oct 8 11:27:53 EDT 2009

Hi Bryan,

"This way, you could sort tweet links in chronological
order, and choose time-ranges of them before actually requesting the
objects from Riak."
Would you mind giving an example of the link ref that would support this? Is
it possible to use the link walker to filter and sort a complex link tag?
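
For illustration, here is a minimal client-side sketch of the timestamp-as-tag idea quoted above. It does not show (or claim) what the link walker can do on the server. It assumes links are (bucket, key, tag) triples and that each tweet link's tag is a zero-padded Unix timestamp; the 'tweets' bucket name and both helper functions are hypothetical.

```python
from typing import List, Tuple

# Assumed link shape: (bucket, key, tag).
Link = Tuple[str, str, str]


def make_tweet_link(key: str, ts: int) -> Link:
    """Build a link to a tweet whose tag is its timestamp.

    Zero-padding keeps plain string ordering consistent with
    chronological ordering.
    """
    return ("tweets", key, f"{ts:010d}")


def links_in_range(links: List[Link], start: int, end: int) -> List[Link]:
    """Return tweet links with start <= timestamp < end, oldest first.

    Only the links are inspected; no tweet objects are fetched.
    """
    selected = [
        link for link in links
        if link[0] == "tweets" and link[2].isdigit()
        and start <= int(link[2]) < end
    ]
    return sorted(selected, key=lambda link: int(link[2]))


# Links as they might appear in one user's document, in arbitrary order.
links = [
    make_tweet_link("t3", 1255000300),
    ("users", "carol", "following"),
    make_tweet_link("t1", 1255000100),
    make_tweet_link("t2", 1255000200),
]

recent = links_in_range(links, 1255000150, 1255000400)
print([key for _bucket, key, _tag in recent])  # ['t2', 't3']
```

Only the keys selected this way would then need to be requested from Riak.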


On Thu, Oct 8, 2009 at 6:51 AM, Bryan Fink <bryan at> wrote:

> On Wed, Oct 7, 2009 at 11:18 PM, Brian Hammond <brian at>
> wrote:
> > I'm considering writing a Twitter clone (the post-modern "Hello World" of
> > nosql) as a means to learn the ins and outs of Riak.
> Excellent choice.  I did the same with a very early version of Riak,
> and I definitely learned some things in the process.
> > Here's some off-the-top-of-my-head ideas on how to design something
> > like (an incomplete) Twitter.  Please comment on what is and isn't a
> > good idea design-wise due to performance implications or feature usage
> > or lack thereof.
> >
> > Alright, let's just start with Users.
> >
> > Either:
> > 1) each user is a bucket (/jiak/brian); or
> > 2) each user is a document in the 'users' bucket (/jiak/users/brian).
> >
> > Thoughts?  Any implications of having a "large" number of buckets, or
> > having a "large" number of documents/keys in a single bucket?  Any
> > general design guidelines here?
> I'd suggest making each user a document in the 'users' bucket.  Riak
> can support "large" numbers of buckets, in certain situations
> (native-erlang client, default bucket parameters), but when using the
> HTTP interface ("Jiak"), using many buckets will cause the cluster's
> ringstate to grow.  A very large ringstate isn't necessarily a
> problem, but could have performance implications.  We have plans to
> move the bucket metadata out of the cluster's ringstate at some point,
> but we haven't done it yet.
> There is absolutely no problem in Riak with a "large" number of
> documents/keys in a single bucket.
> > Following and Followers.  User A follows B, C and is followed by D, E.  I
> > suppose this could be links in the user's document.  Perhaps the links
> > would be to the other users' documents and the link tag per link either
> > 'following' or 'follower'.  Thoughts?
> Sounds fantastic to me.  Excellent use of link tags, imho.
> > Tweets.  Either links in the user's document with link tag 'tweet' or
> > perhaps stored in the user's document directly.  Thoughts?
> I think this comes down to the classic fight between normalized and
> denormalized data.  Depending on your common access patterns, one or
> the other may be "better".  I might even recommend a split solution,
> where each tweet is stored in its own document, but a copy of a
> user's most recent tweets is also stored inside that user's document.
> > Feel free to extend this very trivial model if you feel it would better
> > explain certain things about treating Riak well.
> You might also consider using the timestamp of the tweet as the tag of
> its link.  This way, you could sort tweet links in chronological
> order, and choose time-ranges of them before actually requesting the
> objects from Riak.
> -Bryan
> _______________________________________________
> riak-users mailing list
> riak-users at
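
To make the data model discussed in the quoted message concrete, here is a rough in-memory sketch: users as documents in a single 'users' bucket, follow relationships as tagged links, and the "split" tweet storage (one document per tweet plus a short copy of recent tweets inside the user's document). It makes no Riak calls. The document layout, the 'tweets' bucket, the 'recent_tweets' field, RECENT_LIMIT, and all function names are assumptions for illustration only.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]

RECENT_LIMIT = 3  # hypothetical cap on tweets copied into the user document


def new_user(name: str) -> Doc:
    """A user is one document in the 'users' bucket, not a bucket of its own."""
    return {"bucket": "users", "key": name,
            "object": {"recent_tweets": []}, "links": []}


def follow(follower: Doc, followee: Doc) -> None:
    """Record the relationship on both sides with tagged links."""
    follower["links"].append(["users", followee["key"], "following"])
    followee["links"].append(["users", follower["key"], "follower"])


def post_tweet(user: Doc, tweet_key: str, text: str, ts: int) -> Doc:
    """Split storage: a standalone tweet document plus a denormalized copy."""
    tweet = {"bucket": "tweets", "key": tweet_key,
             "object": {"text": text, "ts": ts}, "links": []}
    user["links"].append(["tweets", tweet_key, "tweet"])
    recent: List[Dict[str, Any]] = user["object"]["recent_tweets"]
    recent.insert(0, {"key": tweet_key, "text": text, "ts": ts})
    del recent[RECENT_LIMIT:]  # keep only the newest few in the user document
    return tweet


def keys_with_tag(doc: Doc, tag: str) -> List[str]:
    """Keys of all links on a document that carry the given tag."""
    return [key for _bucket, key, t in doc["links"] if t == tag]


brian = new_user("brian")
bryan = new_user("bryan")
follow(brian, bryan)
for i in range(5):
    post_tweet(brian, f"t{i}", f"tweet number {i}", 1255000000 + i)

print(keys_with_tag(brian, "following"))   # ['bryan']
print(keys_with_tag(bryan, "follower"))    # ['brian']
print([t["key"] for t in brian["object"]["recent_tweets"]])  # ['t4', 't3', 't2']
```

Reading a user's latest tweets then needs only the user document, while older tweets remain reachable through the 'tweet' links.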