Data modeling a write-intensive comment storage cluster

fxmy wang fxmywc at gmail.com
Sat Jan 25 20:16:12 EST 2014


Greetings List,
I'm new to Riak and only have some experience with RDBMSs, so please
enlighten me if I'm doing something silly.

I'm trying to use Riak for storing video comments: each one is small,
but there will be a huge number of them.
Prerequisites:

- One bucket for one video.
- Keys will consist of a timestamp and userID.
- Values will be plain text containing a short comment and some tags,
 and should not be larger than 10 KB.
- Values are seldom modified.
- Write-intensive: some hot videos may have ~100,000 people watching at
the same time.
- There will be multiple Erlang-pb clients doing writes.
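
A minimal sketch of one possible key layout for the timestamp-plus-userID
keys above. The helper name comment_key, the millisecond resolution, and
the 13-digit zero padding are all assumptions made for illustration, not
anything Riak prescribes. The padding is there so that sorting keys as
plain strings gives chronological order.

```python
def comment_key(timestamp_ms: int, user_id: str) -> str:
    # Hypothetical helper: zero-pad the timestamp to a fixed width so
    # that string comparison of keys matches chronological order.
    return f"{timestamp_ms:013d}_{user_id}"


keys = [
    comment_key(1390698972000, "user42"),
    comment_key(999, "user7"),
    comment_key(1390698972000, "user13"),
]

# Oldest comment first; ties on the timestamp fall back to the user ID.
print(sorted(keys))
```

Two comments from the same user in the same millisecond would collide
under this layout, so a finer timestamp or an extra suffix may be needed.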

Then here are my questions:
1) To get better write throughput, is it right to set w=1?
2) What's the best way to query these comments? In this use case, I don't
need to retrieve all the comments in one bucket, just the latest few
hundred comments (if there are that many), ordered by the time they were
posted.

Right now I'm thinking of using link walking and keeping track of the
latest comment, so I can trace backwards to get the latest 500 comments
(for example). When a new comment comes in, it links to the old latest
comment, and then the latest-comment mark is updated to the new one.
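
The scheme above can be sketched roughly as follows. This is only an
illustration under stated assumptions: a plain Python dict stands in for
one Riak bucket, and the names LATEST, add_comment, latest_comments and
the "prev" field are hypothetical, not part of any Riak client API.

```python
LATEST = "latest"  # hypothetical key that holds the newest comment's key


def add_comment(bucket, key, text):
    # The new comment points back at whatever is currently marked latest,
    # then the latest mark is moved to the new comment.
    prev = bucket.get(LATEST)
    bucket[key] = {"text": text, "prev": prev}
    bucket[LATEST] = key


def latest_comments(bucket, limit):
    # Follow the "prev" pointers backwards from the latest mark.
    out = []
    key = bucket.get(LATEST)
    while key is not None and len(out) < limit:
        comment = bucket[key]
        out.append(comment["text"])
        key = comment["prev"]
    return out


bucket = {}
for i in range(5):
    add_comment(bucket, f"k{i}", f"comment {i}")

print(latest_comments(bucket, 3))  # newest first
```

Note that reading N comments this way costs N sequential lookups, one per
pointer followed.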

In the scenario above, is it possible that after one client has written
to nodeA and modified the latest mark, another client on nodeB does not
yet see the change and therefore links its comment to the old latest
comment, resulting in a "branch" in the chain?
If this can happen, what can be done to avoid it? Are there any better
ways to store and query those comments? Any reply is appreciated.
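
The branch scenario can be reproduced in a small, deterministic
simulation. As before, a dict stands in for the bucket and every name
here ("latest", "prev", walk) is hypothetical. It does not model how Riak
itself resolves concurrent writes (siblings, for instance); it only shows
what the chain looks like if two writers start from the same stale read.

```python
# One existing comment, c1, currently marked as latest.
store = {"latest": "c1", "c1": {"text": "first", "prev": None}}

# Both clients read the latest mark before either one writes.
seen_by_a = store["latest"]
seen_by_b = store["latest"]

# Client A writes its comment and moves the mark.
store["c2"] = {"text": "from A", "prev": seen_by_a}
store["latest"] = "c2"

# Client B, still working from its stale read, does the same.
store["c3"] = {"text": "from B", "prev": seen_by_b}
store["latest"] = "c3"


def walk(store):
    # Collect the keys reachable by following "prev" from the latest mark.
    keys, key = [], store["latest"]
    while key is not None:
        keys.append(key)
        key = store[key]["prev"]
    return keys


reachable = walk(store)
print(reachable)  # c2 is never visited: both c2 and c3 point at c1
```

Both c2 and c3 point at c1, so client A's comment is stored but can no
longer be reached by walking back from the latest mark.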

B.R.