Data modeling a write-intensive comment storage cluster

Jeremiah Peschka jeremiah.peschka at gmail.com
Sat Jan 25 23:06:34 EST 2014


Responses inline

---
sent from a tiny portion of the hive mind...
in this case, a phone
On Jan 25, 2014 5:16 PM, "fxmy wang" <fxmywc at gmail.com> wrote:
>
> Greetings List,
>
> I'm new here and only have some experience with RDBMSs, so please
> enlighten me if I'm doing something silly.
>
> I'm trying to use Riak to store video comments: small values, but a huge
> number of them.
> Prerequisites:
>
> - One bucket for one video.

As long as you keep a list of all videos elsewhere, this should be good.
The new CRDTs in Riak 2.0 should work well for keeping a list of all videos.

> - Keys will consist of a timestamp and userID.
> - Values will be plain text containing a short comment and some tags.
>   They should not be larger than 10 KB.
> - Values are seldom modified.
> - Write-intensive: some hot videos may have ~100,000 people watching at
> the same time.
> - There will be multiple Erlang-pb clients doing writes.
>
> Then here are my questions:
> 1) To get better write throughput, is it right to set w=1?

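This will improve perceived throughput at the client, since a write returns
as soon as one replica acknowledges it, but it won't improve throughput at
the server: every replica still has to perform the write.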
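
A toy sketch of that distinction, assuming three replicas and made-up
per-replica latencies. It models the quorum arithmetic only; it is not Riak
client code, and the function names are hypothetical:

```python
# Toy model (not Riak code): a write is sent to all n replicas, and the
# client is answered once w of them have acknowledged.

def client_latency_ms(replica_latencies_ms, w):
    """Latency seen by the client: the w-th fastest acknowledgement."""
    return sorted(replica_latencies_ms)[w - 1]

def replica_writes(replica_latencies_ms):
    """Work done by the cluster: one write per replica, independent of w."""
    return len(replica_latencies_ms)

latencies = [4, 9, 30]  # hypothetical write times in ms for n=3 replicas

for w in (1, 2, 3):
    # The client-side wait shrinks as w drops; the replica count does not.
    print(w, client_latency_ms(latencies, w), replica_writes(latencies))
```

Lowering w shortens the wait per request, but the cluster still does three
writes for every comment.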

> 2) What's the best way to query these comments? In this use case, I don't
> need to retrieve all the comments in one bucket, just the latest few
> hundred (if there are that many), ordered by the time they were posted.
>
> Right now I'm thinking of using link-walking and keeping track of the
> latest comment, so I can trace backwards to get the latest 500 comments
> (for example). When a new comment comes in, I point its link to the old
> latest comment, then update the latest-comment mark.
>

I wouldn't use link-walking. IIRC this uses MapReduce under the covers. You
could use a single key to store the most recent comment.

You can get the most recent n keys using secondary index queries on the
$bucket index, sorting, and pagination.
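
Sorting only gives you "most recent" if the keys sort chronologically. A
minimal sketch, assuming a hypothetical key layout of a zero-padded
millisecond timestamp followed by the user ID. The plain list stands in for
the keys an index query would return; no Riak calls are shown:

```python
# Hypothetical key layout: zero-padded millisecond timestamp, then user ID.
# Fixed-width timestamps make lexical key order equal chronological order.

def comment_key(timestamp_ms, user_id):
    return f"{timestamp_ms:013d}_{user_id}"

def latest_keys(keys, n):
    """Return the n most recent keys, newest first."""
    return sorted(keys, reverse=True)[:n]

keys = [
    comment_key(1390687594000, "alice"),
    comment_key(1390687595000, "bob"),
    comment_key(999, "carol"),  # would sort last-in-time only because of padding
]
print(latest_keys(keys, 2))
```

Without the padding, a key starting with "999" would sort after one starting
with "1390687595000", and the ordering would be wrong.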

> So in the scenario above, is it possible that after one client has
> written on nodeA and modified the latest mark, another client on nodeB
> does not yet see the change and points its link to the old comment,
> resulting in a "branch" in the chain?
> If this could happen, what can be done to avoid it? Are there any
> better ways to store and query those comments? Any reply is appreciated.

You can avoid siblings by serializing all of your writes through a single
writer. That's not a great idea since you lose many of Riak's benefits.

You could also use a CRDT with a register type. Registers resolve concurrent
writes in favor of the last writer.
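
As a rough illustration of last-writer-wins behavior for the latest-comment
mark, here is a self-contained sketch. The class, its fields, and the
tie-breaking rule are assumptions made for the example, not Riak's
implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LWWRegister:
    value: str
    timestamp: int  # hypothetical time of the write
    writer: str     # tie-breaker so merge is deterministic

    def merge(self, other):
        """Keep whichever write is later; ties fall back to writer ID."""
        mine = (self.timestamp, self.writer)
        theirs = (other.timestamp, other.writer)
        return self if mine >= theirs else other

# Two clients update the latest-comment mark on different nodes.
a = LWWRegister("comment-101", timestamp=10, writer="nodeA")
b = LWWRegister("comment-102", timestamp=11, writer="nodeB")

# Merging in either order converges on the same value.
print(a.merge(b).value, b.merge(a).value)
```

Both nodes end up agreeing, but the earlier update to the mark is discarded,
so anything that depended on it has to be recoverable some other way.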

The point is that you need to decide how you want to deal with this type of
scenario, because it's going to happen. In the worst case, you lose a write
briefly.

>
> B.R.