Is Riak a good solution for this problem?
songs at maplekeycompany.com
Sun Feb 12 15:37:19 EST 2012
I'm also new to Riak and in the middle of researching whether Riak makes sense
for a new project I'm working on.
Here are my thoughts based on what I've read so far:
- Riak can be pretty fast, but its focus is more on rock-solid
stability than on raw performance
- you're going to want a cluster of at least 3 nodes; you'd want multiple
nodes for any m/r setup anyway
- Riak won't be super great for ad-hoc queries so I'm assuming that we're
talking about a canned daily report
- your schema needs to match your query patterns: put a 2i on date, query
for a specific date, then group by session; if you do this as described
without filtering on a date 2i first, you'll be going back through all the
past page view data, which presumably won't change after you've processed it
- m/r using protocol buffers can be significantly faster than using HTTP,
so try again with a client that uses PB
- once you have multiple nodes set up, do_prereduce will split the load of
reducing across the nodes; I think this would definitely be useful if
your data reduces well per node, e.g. for day d: x registered-user page
views, y unregistered-user page views
- updating shouldn't be a problem as long as it isn't hard for you to
resolve collisions (conflicting versions of the same key); when there's a
collision you have to decide on a resolution strategy; since this is log
data and not something that requires transactions, that shouldn't be a problem
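To make the date-index and pre-reduce points above concrete, here is a minimal in-memory Python sketch. It does not use any Riak client API: the `SESSIONS` dict, the field names `day`, `registered` and `page_views`, and every function name are hypothetical stand-ins. The shape follows the per-day registered/unregistered count from the question. A stand-in date index narrows the input to one day. The reduce takes and returns the same shape, so it gives the same answer whether it runs once over everything or first over each "node's" share and then again over the partial results. That is the property a pre-reduce split across nodes relies on, and Riak's documentation asks reduce functions to tolerate being re-run on their own output. The question's second reduce (sorting by day) is left out.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical session documents keyed by session id. The field names
# ("day", "registered", "page_views") are illustrative assumptions.
SESSIONS: Dict[str, dict] = {
    "s1": {"day": "2012-02-11", "registered": True, "page_views": ["/", "/a", "/b"]},
    "s2": {"day": "2012-02-11", "registered": False, "page_views": ["/"]},
    "s3": {"day": "2012-02-12", "registered": True, "page_views": ["/", "/c"]},
    "s4": {"day": "2012-02-12", "registered": False, "page_views": ["/", "/d", "/e"]},
}


def build_date_index(sessions: Dict[str, dict]) -> Dict[str, List[str]]:
    """Stand-in for a 2i on date: each day maps to the keys indexed under it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for key, doc in sessions.items():
        index[doc["day"]].append(key)
    return index


def map_session(doc: dict) -> List[dict]:
    """Map: emit {day: [registered_views, unregistered_views]} for one session."""
    n = len(doc["page_views"])
    counts = [n, 0] if doc["registered"] else [0, n]
    return [{doc["day"]: counts}]


def reduce_counts(values: List[dict]) -> List[dict]:
    """Reduce: sum counts per day.

    Input and output share the same shape (a list of {day: [reg, unreg]}),
    so the function can be applied again to its own partial results.
    """
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for partial in values:
        for day, (reg, unreg) in partial.items():
            totals[day][0] += reg
            totals[day][1] += unreg
    return [dict(totals)]


def daily_report(sessions: Dict[str, dict], day: str) -> List[dict]:
    """Build the stand-in index, fetch one day's keys, then map/reduce only those."""
    keys = build_date_index(sessions).get(day, [])
    return reduce_counts([v for k in keys for v in map_session(sessions[k])])


# Pre-reduce style: each "node" reduces its own share first, then the
# partial results are reduced once more.
node_a = reduce_counts([v for k in ("s1", "s3") for v in map_session(SESSIONS[k])])
node_b = reduce_counts([v for k in ("s2", "s4") for v in map_session(SESSIONS[k])])
combined = reduce_counts(node_a + node_b)
```

Here `daily_report(SESSIONS, "2012-02-11")` touches only `s1` and `s2`. `combined` matches a single reduce over all four sessions. In a real deployment the index would be written alongside each object instead of rebuilt per query.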
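For append-only log data, one possible collision-resolution strategy is to take the union of the page views in every conflicting version. The post does not prescribe this; it is an assumption. The plain-Python sketch below assumes each page view carries a unique `id` and a timestamp `ts` (hypothetical field names). Fetching the conflicting versions and writing the resolved value back are left to whichever client you use.

```python
from typing import Dict, List


def merge_siblings(siblings: List[dict]) -> dict:
    """Resolve conflicting versions (siblings) of one session document.

    Assumes each page view carries a unique "id", so taking the union of
    all versions neither drops a hit nor counts one twice.
    """
    by_id: Dict[str, dict] = {}
    for doc in siblings:
        for view in doc["page_views"]:
            by_id[view["id"]] = view
    merged = dict(siblings[0])  # shallow copy; other fields come from the first sibling
    # Sort by timestamp (then id) so the page-view list does not depend on sibling order.
    merged["page_views"] = sorted(by_id.values(), key=lambda v: (v["ts"], v["id"]))
    return merged


a = {"session": "s1", "page_views": [{"id": "p1", "ts": 1}, {"id": "p2", "ts": 2}]}
b = {"session": "s1", "page_views": [{"id": "p1", "ts": 1}, {"id": "p3", "ts": 3}]}
resolved = merge_siblings([a, b])
print([v["id"] for v in resolved["page_views"]])  # ['p1', 'p2', 'p3']
```

Because the merge is a set union, the resolved list is the same regardless of the order in which the conflicting versions are supplied.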
Some links that I've found useful:
(especially the comment)
The Basho Vimeo channel has a pile of informative videos where you'll find
good nuggets here and there: http://vimeo.com/17604126
Hope that helps,
On Sun, Feb 12, 2012 at 9:00 AM, <riak-users-request at lists.basho.com> wrote:
> Message: 1
> Date: Sun, 12 Feb 2012 11:27:22 +0000
> From: Marco Monteiro <marco at textovirtual.com>
> To: riak-users at lists.basho.com
> Subject: Is Riak a good solution for this problem?
> I'm considering Riak for the statistics of a site that is approaching a
> billion page views per month.
> The plan is to log a little information about each page view and then
> to query that data.
> I'm very new to Riak. I've gone over the documentation on the wiki, and I
> know about map-reduce, secondary indexes and Riak Search. I've installed
> Riak on a single node and made a test with the default configuration. The
> results were a little below what I expected.
> For the test I used the following:
> We want the page view count by day for registered and unregistered users.
> We are storing session documents. Each document has a session identifier
> as its key and a list of page views as the value (and a few additional
> properties we can ignore). This document structure comes from CouchDB,
> where I organised things like this to be able to query the database more
> easily. I've done a basic
> k/v in a bucket) returning the length of the page views array for either
> the registered or unregistered field (the other is zero), and the day of
> the request. In the reduce I collect them by hashing the day and summing
> the two page view counts. Then I have a second reduce to sort the list by
> day.
> This is very slow on a single-machine setup with the default Riak
> configuration. 1,000 sessions take 6 seconds; 10,000 sessions take more
> than 2 minutes (timeout). We want to handle 10,000,000 sessions, at least.
> Is there a way, maybe with secondary indexes, to make this go faster using
> only Riak? Or must I use some kind of persistent cache to store this info
> as time goes by? Or can I make Riak run 100 times faster by tweaking the
> config? I don't want to need 1,000 machines to make this work.
> Also, will updating the session documents be a problem for Riak? Would it
> be better to store each page hit under a new key, so as not to update the
> session document? Because of the "multilevel" map-reduce, this can work on
> Riak, where it didn't work on CouchDB because of the limitations of its
> view system. Unfortunately, with the document updates the CouchDB
> database was growing way too fast for it to be a feasible solution.
> Any advice to make Riak work for this problem is greatly appreciated.
> End of riak-users Digest, Vol 31, Issue 16
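The timings in the question can be turned into a rough estimate of why the single-node job falls short. The back-of-the-envelope Python sketch below rests on two assumptions that do not come from the thread. It optimistically treats run time as growing linearly with the number of sessions, and it treats a month as 30 days.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def linear_runtime_s(measured_items: int, measured_s: float, target_items: int) -> float:
    """Extrapolate run time assuming cost grows linearly with item count."""
    return measured_s * target_items / measured_items


# Figures from the question: 1,000 sessions took about 6 seconds.
full_run = linear_runtime_s(1_000, 6, 10_000_000)  # seconds for 10,000,000 sessions
ten_k = linear_runtime_s(1_000, 6, 10_000)  # seconds predicted for 10,000 sessions

# About 1 billion page views per month, spread evenly over a 30-day month.
views_per_day = 1_000_000_000 / 30
views_per_second = views_per_day / SECONDS_PER_DAY
```

Even under the linear assumption, 10,000,000 sessions would take about 60,000 seconds (roughly 17 hours) on the test setup. The 10,000-session run took more than 2 minutes, where the linear model predicts 60 seconds, so the real cost grew faster than linearly. The write side is modest by comparison, at about 386 page views per second on average.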