Is Riak right for me?

Sean Cribbs sean at basho.com
Fri Feb 11 09:23:54 EST 2011


Mike,

Sounds like an interesting project. Here are some things to think about (corresponding to your bullet points):

1) The timestamp seems like the natural choice of key, since so many of your operations are time-oriented.
2) For playback of historical data, consider using MapReduce to grab more than a single second of data per request... maybe 30 seconds at a time.  You could even put some of your preprocessing into the map or reduce phase.  Make sure to generate the key list yourself (you'll know the keys in advance) instead of trying to do a full-bucket query with filtering.
3) Beware of race conditions and the possibility that not all clients will see the data right away.  This can be somewhat alleviated by using DW=W=quorum when writing, but you're still talking about dogpiling a bunch of requests on the same key. An in-memory write-through cache of the "latest second" might be what you need here.
4) This is another case where you could use MapReduce to crunch the data. 60 items is not very much, so I think you'll have good results here. The internal MapReduce cache will also reduce the pain of multiple computations on the same data.
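
To make points 1 through 3 a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the key layout (one ISO-style timestamp string per second), the names snapshot_key, playback_keys and LatestSnapshotCache, and the backing store, which is a plain dict standing in for a Riak bucket. It only shows the shape of the idea (derive keys from timestamps, build the key list for a 30-second window up front, and keep the newest snapshot in memory), not how any particular Riak client is called.

```python
from datetime import datetime, timedelta

# Assumed key layout: one key per second, e.g. "2011-01-01T08:00:00".
KEY_FORMAT = "%Y-%m-%dT%H:%M:%S"


def snapshot_key(ts):
    """Return the key for the snapshot taken in the second containing ts."""
    return ts.replace(microsecond=0).strftime(KEY_FORMAT)


def playback_keys(start, count=30):
    """Return the keys for `count` consecutive seconds beginning at `start`.

    Because the keys are computable, this list can be handed to a
    MapReduce job as explicit inputs instead of scanning a whole bucket.
    """
    return [snapshot_key(start + timedelta(seconds=i)) for i in range(count)]


class LatestSnapshotCache:
    """Write-through cache that holds only the most recent snapshot.

    `store` is any object with dict-style item assignment. A real Riak
    client would need a thin adapter (hypothetical, not shown here).
    This sketch is single-threaded; a real server would need locking.
    """

    def __init__(self, store):
        self._store = store
        self._latest = None  # (key, value) of the newest snapshot, or None

    def write(self, ts, value):
        key = snapshot_key(ts)
        self._store[key] = value     # write through to the backing store first
        self._latest = (key, value)  # then keep a copy for the pollers
        return key

    def latest(self):
        """Return (key, value) for the newest snapshot without a store read."""
        return self._latest


if __name__ == "__main__":
    store = {}  # stand-in for a Riak bucket
    cache = LatestSnapshotCache(store)
    cache.write(datetime(2011, 1, 1, 8, 0, 0), {"example": 1})
    print(cache.latest())
    print(playback_keys(datetime(2011, 1, 1, 8, 0, 0), count=3))
```

With something like this in front of the database, the once-a-second "current snapshot" requests are answered from memory, and only the single writer touches the key for the latest second.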

All in all, I think Riak will be a good fit for your application, with the possible exception of the polling-every-second thing.  A couple of tips to make sure you have a good experience with Riak:

First, benchmark your usage pattern as best you can to make sure that Riak will meet your performance needs.  For example, I might create some basho_bench tests with appropriate key and value generators that have:

a) 1 write per second (the snapshot data)
b) X reads per second (where X is the number of expected clients)
c) 1-5 historical replays per second (via MapReduce)
d) X roll-up reports per minute (X = number of clients again)

I'd then run them concurrently, in different combinations, to simulate the load.
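
As a rough way to size such a test, the mix in (a) through (d) can be turned into per-minute operation counts. The Python helper below is only back-of-the-envelope arithmetic, not a basho_bench configuration; the function name and the default of 5 replays per second (the upper end of the range in (c)) are assumptions made for this example.

```python
def ops_per_minute(clients, replays_per_second=5):
    """Rough per-minute operation counts for the four workloads above."""
    return {
        "snapshot_writes": 60,               # (a) 1 write per second
        "current_reads": 60 * clients,       # (b) X reads per second
        "replays": 60 * replays_per_second,  # (c) 1-5 MapReduce replays per second
        "reports": clients,                  # (d) X roll-up reports per minute
    }


if __name__ == "__main__":
    # Example: 50 concurrent clients (a made-up figure).
    for name, count in ops_per_minute(50).items():
        print(name, count)
```

Numbers like these make it easier to pick relative weights for each operation type when combining the tests.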

Second, make sure you start with at least 3 nodes (even in your local developer setup).  Because Riak is designed to be distributed, there are certain things that are sub-optimal when the number of nodes is less than the replication factor (N value, default 3).

Let us know if there's anything else we can help you with.

Sean Cribbs <sean at basho.com>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On Feb 11, 2011, at 8:49 AM, Mike Stoddart wrote:

> Riak is very appealing for several reasons: scalability, durability,
> open source, performance, etc. I'm currently using PostgreSQL for all
> my storage needs, but I'm investigating nosql (can I use that name?)
> solutions for scalability and to experiment with map/reduce
> functionality for statistics and reporting.
> 
> I have a few requirements that nosql solutions might not be able to meet.
> 
> 1) Every second I take a snapshot of my data and store it in the
> database in one record. Each recorded snapshot includes the timestamp
> it was taken.
> 
> 2) I have a playback feature that lets me retrieve historical data.
> During playback, the browser requests a recorded snapshot every
> second:
> 
>   2011-01-01 08:00:00
>   2011-01-01 08:00:01
>   2011-01-01 08:00:02
>   2011-01-01 08:00:03 ...
> 
> Currently it takes less than 75ms for the server to retrieve the data
> from PostgreSQL and to return it to the browser. Some processing is
> done before the response is sent.
> 
> 3) Every second each client's browser requests the current data
> snapshot (i.e. not in playback mode). The same comment for timing and
> processing applies from 2).
> 
> 4) Every minute I retrieve statistics and a report for a specific type
> of data to present on the browser. Currently with PostgreSQL this
> takes about 2-3s for the web server to retrieve the data, process it
> and return it to the browser.
> 
> The only primary key I use is a serial integer, only because that's
> the default. I don't see anything in my data that would be useful as a
> key when using a key/value database. My data is a good fit for storing
> as a 'document' though.
> 
> I know there might not be enough information here but do you think
> Riak is a good fit?
> 
> Thanks
> Mike




