Data modeling a write-intensive comment storage cluster

Jeremiah Peschka jeremiah.peschka at
Mon Jan 27 09:48:31 EST 2014

Jeremiah Peschka - Founder, Brent Ozar Unlimited
MCITP: SQL Server 2008, MVP
Cloudera Certified Developer for Apache Hadoop

On Sun, Jan 26, 2014 at 10:27 PM, fxmy wang <fxmywc at> wrote:

> Thanks for the response, Jeremiah.
> > > Then here are my questions:
> > > 1) To get better write throughput, is it right to set w=1?
> >
> > This will improve perceived throughput at the client, but it won't
> > improve throughput at the server.
> Thank you for clarifying this for me :D
> > > 2) What's the best way to query these comments? In this use case, I
> > > don't need to retrieve all the comments in one bucket, just the latest
> > > few hundred comments (if there are that many), based on the time they
> > > were posted.
> > >
> > > Right now I'm thinking of using link-walking and keeping track of the
> > > latest comment so I can trace backwards to get the latest 500 comments
> > > (for example). When a new comment arrives, it links to the previous
> > > latest comment, and the latest-comment marker is then updated to point
> > > at the new one.
> > >
> >
> > I wouldn't use link-walking. IIRC this uses MapReduce under the covers.
> > You could use a single key to store the most recent comment.
> What's bad about MapReduce?
> Since another cache layer will sit on top of the cluster, read
> operations will be relatively infrequent. That's why I chose to use
> link-walking.

Even when you run a MapReduce query over a single bucket, MapReduce has to
contact a majority of nodes in the cluster to perform a coverage query. In
effect, you're scanning all of the keys to make sure you find only the keys
in a single bucket. MapReduce can work for limited scenarios (e.g. mutating
the state of a large number of objects or running batched analytics that
write to a separate set of buckets/keys) but people have reported
unsatisfactory results when trying to use MapReduce for live querying.

This sort of thing may be possible with the Riak Search 2.0 functionality
as well. I haven't played around with it enough to know whether it would be
a good fit or not.

> > You can get the most recent n keys using secondary index queries on the
> > $bucket index, sorting, and pagination.
> I'm not sure what you mean here =.=
> How can I query the most recent n keys using 2i? Should I put a timestamp
> (say, bucketed by hour) in a 2i on incoming comments, then query the 2i
> by hour segment? That seems a little blind, because some videos could go
> a long time before being commented on again. Querying by time segment
> seems like shooting in the dark to me :\

"Keys will consist of a timestamp and userID."

Sounds like you could sort on that to me.

The $bucket index is a special index that only contains a list of the keys
in a bucket. Querying $bucket is cheaper than a list keys operation.
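A minimal sketch of the idea in Python, under stated assumptions: keys are a zero-padded millisecond timestamp followed by the userID (the `make_key` layout is hypothetical), and the $bucket index is paged over the HTTP 2i endpoint with `max_results` and `continuation`. The URL shape is an assumption based on the Riak 1.4 HTTP interface and should be checked against the docs for your version. `fetch_page` is a stand-in for whatever HTTP client you use, so the sketch itself makes no network calls.

```python
import heapq
from typing import Callable, Dict, Iterable, Iterator, List, Optional
from urllib.parse import quote, urlencode


def make_key(ts_ms: int, user_id: str) -> str:
    """Hypothetical key layout: zero-padded epoch milliseconds, then the userID."""
    return f"{ts_ms:013d}_{user_id}"


def bucket_index_url(host: str, bucket: str, max_results: int,
                     continuation: Optional[str] = None) -> str:
    """Build a paginated $bucket index URL (assumed Riak 1.4 HTTP 2i layout)."""
    params = {"max_results": max_results}
    if continuation:
        params["continuation"] = continuation
    return (f"http://{host}/buckets/{quote(bucket)}/index/$bucket/_"
            f"?{urlencode(params)}")


def iter_keys(fetch_page: Callable[[Optional[str]], Dict]) -> Iterator[str]:
    """Yield every key, following continuation tokens until none is returned."""
    continuation = None
    while True:
        page = fetch_page(continuation)
        yield from page.get("keys", [])
        continuation = page.get("continuation")
        if not continuation:
            break


def latest_keys(keys: Iterable[str], n: int) -> List[str]:
    """Return the n newest keys; with zero-padded timestamps,
    lexicographic order matches time order."""
    return heapq.nlargest(n, keys)
```

Note that this still walks every key in the bucket on the client side, so it only makes sense when a bucket holds a bounded number of comments.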

There are a number of ways you can solve this problem that are all
implementation dependent.

> And the docs say the list-keys operation should not be used in
> production, so that's a no-go either :\

A list keys operation is not a $bucket index query. See "Retrieve all
Bucket Keys via $bucket Index" at

> > > So in the scenario above, is it possible that after one client has
> > > written to nodeA and modified the latest-mark, another client on nodeB
> > > does not yet see the change and so points its link at the old comment,
> > > resulting in a "branch" in the line?
> > > If this could happen, what can be done to avoid it? Are there any
> > > better ways to store and query those comments? Any reply is appreciated.
> >
> > You can avoid siblings by serializing all of your writes through a
> > single writer. That's not a great idea since you lose many of Riak's
> > benefits.
> > You could also use a CRDT with a register type. These tend toward the
> > last writer.
> My goal is to form a kind of single-line relationship, based on
> timestamp, across the keys under highly concurrent write pressure.
> Through this relationship I can easily pick out the last
> hundreds/thousands of comments.
> As Jeremiah said, serializing all writes through a single writer
> avoids siblings entirely. Note that we don't have key-clash
> problems here: every comment has a unique key. What we want
> is the single-line relationship. So how about this:
> Multiple erlang-pb clients just do the writes and don't care about the
> lining up.
> A post-commit hook notifies one special globally registered
> process (which should be running in the Riak cluster?) that "here
> comes a new comment, line it up when appropriate".
> Is this feasible? And if it is, how should I prepare for the cluster
> partition and rejoin scenario when the network fails?

It sounds to me like you're doing an awful lot of work to do something that
a relational database handles remarkably well.
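For comparison, a minimal sketch of the relational version, using SQLite from the Python standard library purely as a stand-in for whichever relational database you would actually run. The table and column names are hypothetical. Each comment is a plain insert with no linking step, and "the latest n comments" is a single indexed query.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) comments table and its lookup index."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS comments (
               id        INTEGER PRIMARY KEY AUTOINCREMENT,
               video_id  TEXT    NOT NULL,
               user_id   TEXT    NOT NULL,
               posted_at INTEGER NOT NULL,  -- epoch milliseconds
               body      TEXT    NOT NULL
           )"""
    )
    # Index ordered so that "newest first for one video" is a range scan.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_comments_video_time "
        "ON comments (video_id, posted_at DESC)"
    )
    return conn


def add_comment(conn, video_id, user_id, posted_at, body):
    """Insert one comment; writers need no knowledge of other comments."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO comments (video_id, user_id, posted_at, body) "
            "VALUES (?, ?, ?, ?)",
            (video_id, user_id, posted_at, body),
        )


def latest_comments(conn, video_id, n):
    """Return the n most recent comments for a video, newest first."""
    cur = conn.execute(
        "SELECT user_id, posted_at, body FROM comments "
        "WHERE video_id = ? ORDER BY posted_at DESC, id DESC LIMIT ?",
        (video_id, n),
    )
    return cur.fetchall()
```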

> > The point is that you need to decide how you want to deal with this type
> > of scenario - it's going to happen. In the worst case, you lose a write
> > briefly.
> Hopefully the method above could avoid this :)
> Please everyone, share your thoughts please. _(:3JZ)_
> B.R.

More information about the riak-users mailing list