Millions of buckets?

Ryan Kennedy rckenned at gmail.com
Fri May 13 00:32:39 EDT 2011


On Wed, May 11, 2011 at 12:25 PM, Jared Morrow <jared at basho.com> wrote:
> It seems like what you are needing is a lot like what the Yammer guys needed for
> their streamie application.   They have a video
> here: http://vimeo.com/21598799 about how they modeled their data.   It
> might be pretty helpful for your application.   If not, no harm done, you
> still get to watch a video from some pretty smart people!

There are some differences between what Alexey is describing and what
we built at Yammer. The key difference is per-item read/unread state.
That being said, I don't think it's a deal breaker. And I don't think
you need search.

At Yammer we have a notion of streams (notifications is one of our
streams). Each stream has a list of stream items. For instance, "Bob
liked your message" or "Jenny replied to your message" or "Charlie
mentioned you in a thread". Each stream item has a uniquely generated,
monotonically increasing ID. That's great: it gives us something to
sort and dedupe on. We store the stream items for a user in a single
key/value. Each stream type has its own bucket. To get to my
notifications, I would fetch /riak/notifications/ryan. To keep things
simple (and bounded) we only store the most recent 1,000 or so stream
items for each user. Older notifications age out of the system as
newer ones replace them. That's fine: for nearly all of our users, 1,000
notifications would represent a significant amount of calendar time,
more than they could be expected to page back through.
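To make that layout concrete, here is a minimal Python sketch of updating one user's stream value. It assumes the value is a list of items, each carrying an integer ID; StreamItem, add_items and MAX_ITEMS are hypothetical names, and serialization plus the actual Riak fetch/store are left out. It only shows how monotonically increasing IDs let you dedupe, sort and trim the list.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_ITEMS = 1000  # keep roughly the most recent 1,000 items per user


@dataclass(frozen=True)
class StreamItem:
    id: int    # uniquely generated, monotonically increasing
    text: str  # e.g. "Bob liked your message"


def add_items(stream: list[StreamItem], new_items: list[StreamItem],
              limit: int = MAX_ITEMS) -> list[StreamItem]:
    """Return the updated value for one user's stream key
    (e.g. notifications/ryan): deduped on ID, newest first, capped."""
    by_id = {item.id: item for item in stream}
    for item in new_items:
        by_id[item.id] = item  # same ID means same item, so duplicates collapse
    newest_first = sorted(by_id.values(), key=lambda i: i.id, reverse=True)
    return newest_first[:limit]  # anything past the cap ages out
```

Adding items 2 and 1 to a stream that already holds item 1 yields IDs [2, 1]: the duplicate collapses and the ordering falls straight out of the IDs.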

In addition, we support the notion of a cursor. A cursor is simply a
pointer into a stream. We use the cursor to indicate the last seen
stream item. We have a single bucket for cursors. To get my default
cursor, I would fetch /riak/cursors/ryan-default. The value of that
key is the ID of an item in my notifications stream. This is where
your requirements and ours diverge a bit: we don't have per-item
seen/unseen state.
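A rough sketch of how a cursor could be used once both values are fetched, assuming the stream has been reduced to its item IDs and the cursor value (e.g. from /riak/cursors/ryan-default) parsed as an integer. unseen_ids and new_cursor are hypothetical helpers. Because IDs only increase, "newer than the cursor" is simply "greater than the cursor".

```python
from __future__ import annotations

from typing import Optional


def unseen_ids(stream_ids: list[int], cursor: Optional[int]) -> list[int]:
    """IDs of items newer than the last seen item, newest first."""
    if cursor is None:  # no cursor stored yet: treat everything as unseen
        return sorted(stream_ids, reverse=True)
    return sorted((i for i in stream_ids if i > cursor), reverse=True)


def new_cursor(stream_ids: list[int]) -> Optional[int]:
    """Cursor value after the user has seen the whole stream:
    the newest (largest) ID, or None for an empty stream."""
    return max(stream_ids, default=None)
```

For a stream holding IDs [5, 3, 9, 7] and a cursor of 5, unseen_ids returns [9, 7].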

That being said, you could take the basics of our design and add
per-item seen/unseen state. Ditch our cursors and add a "seen" field
to each stream item. The one problem you're going to have is the
eventual consistency model, especially if you want to support the
ability for users to once again mark something as unseen/unread. In
that case, if you ever encounter sibling values, you may not be able
to reliably merge them. If one sibling value marks a notification as
read and another marks it as unread, you can't tell which was the last
action taken by the user. Not allowing users to mark something as
unread once again should simplify that problem (if either sibling
value is read, then the notification is read since you can't go back).
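Here is what that merge rule might look like, as a minimal sketch assuming each item carries a hypothetical boolean seen field and each sibling is the full list of items for one user. Items are unioned by ID, and an item counts as seen if any sibling says so. That OR is only correct under the restriction above (no going back to unseen); with un-marking allowed there is no safe choice.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Item:
    id: int
    text: str
    seen: bool = False  # hypothetical per-item "seen" field


def merge_siblings(siblings: list[list[Item]], limit: int = 1000) -> list[Item]:
    """Resolve sibling values of one user's stream key into a single value.

    Union items by ID; an item is seen if ANY sibling marks it seen.
    Only valid if items can never go from seen back to unseen."""
    merged: dict[int, Item] = {}
    for sibling in siblings:
        for item in sibling:
            prev = merged.get(item.id)
            if prev is None:
                merged[item.id] = item
            else:
                merged[item.id] = replace(prev, seen=prev.seen or item.seen)
    newest_first = sorted(merged.values(), key=lambda i: i.id, reverse=True)
    return newest_first[:limit]  # re-apply the ~1,000 item cap
```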
Alternatively, consider using a cursor like we do. The write is much
smaller and you don't have to read first to perform the update.
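By contrast, the cursor write is a single ID, stored without fetching the stream first. If concurrent updates ever leave siblings on a cursor key, one plausible resolution (an assumption, not part of the design described above) is to keep the largest ID: IDs increase monotonically, so a cursor that only moves forward loses no progress that way. resolve_cursor and cursor_value are hypothetical names, and the ASCII encoding of the ID is likewise assumed.

```python
from __future__ import annotations


def resolve_cursor(siblings: list[int]) -> int:
    """Pick one cursor from conflicting sibling values.
    Assumes cursors only move forward, so the largest ID is the latest."""
    if not siblings:
        raise ValueError("no sibling values to resolve")
    return max(siblings)


def cursor_value(last_seen_id: int) -> bytes:
    """The whole value stored under a cursor key: one ID as ASCII text
    (assumed encoding)."""
    return str(last_seen_id).encode("ascii")
```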

Hopefully that sheds a little more light on what we're doing at
Yammer. As Jared pointed out, the video from our talk is online. @coda
even cracks a few good jokes.

Good luck!

Ryan Kennedy
Infrastructure Engineer @ Yammer
