Is Riak right for me?
stodge at gmail.com
Fri Feb 11 14:13:49 EST 2011
Cool - some very useful information. Much appreciated. I need to
digest this a bit before replying with more questions. I'm still
trying to understand map/reduce so I'm struggling to get my head
around how to apply it to my requirements. I'm obviously used to
On Fri, Feb 11, 2011 at 9:23 AM, Sean Cribbs <sean at basho.com> wrote:
> Sounds like an interesting project. Here are some things to think about (corresponding to your bullet points):
> 1) What seems fairly natural and obvious for key choice is the timestamp, since so many of your operations are time-oriented.
> 2) For playback of historical data, consider using MapReduce to grab more than just single seconds of data... maybe 30 at a time. You could even put some of your preprocessing into the map or reduce phase. Make sure to generate the key list (since you'll know them) instead of trying to do a full-bucket query with filtering.
> 3) Beware of race conditions and the possibility of not all clients seeing the data right away. This can be somewhat alleviated by using DW=W=quorum when writing, but you're still talking about dogpiling a bunch of requests on the same key. An in-memory write-through cache of the "latest second" might be what you need here.
> 4) This is another case where you could use MapReduce to crunch the data. 60 items is not very much, so I think you'll have good results here. The internal MapReduce cache will also reduce the pain of multiple computations on the same data.
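As a rough illustration of point 2), the sketch below generates the keys for a 30-second window and builds a MapReduce job body that lists them explicitly as inputs instead of scanning the whole bucket. The bucket name `snapshots`, the epoch-second key format, and the helper names are assumptions made for this example, not anything stated in the thread. `Riak.mapValuesJson` is one of Riak's built-in JavaScript map functions and assumes the snapshots are stored as JSON.

```python
import json
from datetime import datetime, timedelta, timezone

BUCKET = "snapshots"  # hypothetical bucket name


def snapshot_key(ts):
    """Key for the snapshot taken at `ts` (assumed format: UTC epoch seconds as a string)."""
    return str(int(ts.timestamp()))


def window_keys(start, seconds=30):
    """Keys for `seconds` consecutive one-second snapshots starting at `start`."""
    return [snapshot_key(start + timedelta(seconds=i)) for i in range(seconds)]


def mapreduce_job(keys):
    """MapReduce job with an explicit [bucket, key] input list (no full-bucket scan)."""
    return {
        "inputs": [[BUCKET, key] for key in keys],
        "query": [
            {"map": {"language": "javascript", "name": "Riak.mapValuesJson"}}
        ],
    }


start = datetime(2011, 1, 1, 8, 0, 0, tzinfo=timezone.utc)
job = mapreduce_job(window_keys(start))
body = json.dumps(job)  # JSON to POST to a node's /mapred HTTP resource
```

The same key generation with `seconds=60` would cover the per-minute roll-up in point 4). Map results may not come back in key order, so the client may need to sort them by timestamp before playback.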
> All in all, I think Riak will be a good fit for your application, with the possible exception of the polling-every-second thing. A couple of tips to make sure you have a good experience with Riak:
> First, benchmark your usage pattern as best you can to make sure that Riak will meet your performance needs. For example, I might create some basho_bench tests with appropriate key and value generators that have:
> a) 1 write per second (the snapshot data)
> b) X reads per second (where X is the number of expected clients)
> c) 1-5 historical replays per second (via MapReduce)
> d) X roll-up reports per minute (X = number of clients again)
> I'd then run them concurrently, and in different combinations to simulate the load.
> Second, make sure you start with at least 3 nodes (even in your local developer setup). Because Riak is designed to be distributed, there are certain things that are sub-optimal when the number of nodes is less than the replication factor (N value, default 3).
> Let us know if there's anything else we can help you with.
> Sean Cribbs <sean at basho.com>
> Developer Advocate
> Basho Technologies, Inc.
> On Feb 11, 2011, at 8:49 AM, Mike Stoddart wrote:
>> Riak is very appealing for several reasons: scalability, durability,
>> open source, performance, etc. I'm currently using PostgreSQL for all
>> my storage needs, but I'm investigating nosql (can I use that name?)
>> solutions for scalability and to experiment with map/reduce
>> functionality for statistics and reporting.
>> I have a few requirements that nosql solutions might not be able to meet.
>> 1) Every second I take a snapshot of my data and store it in the
>> database in one record. Each recorded snapshot includes the timestamp
>> it was taken.
>> 2) I have a playback feature that lets me retrieve historical data.
>> During playback, the browser requests a recorded snapshot every
>> second, for example:
>> 2011-01-01 08:00:00
>> 2011-01-01 08:00:01
>> 2011-01-01 08:00:02
>> 2011-01-01 08:00:03 ...
>> Currently it takes less than 75ms for the server to retrieve the data
>> from PostgreSQL and to return it to the browser. Some processing is
>> done before the response is sent.
>> 3) Every second each client's browser requests the current data
>> snapshot (i.e. not in playback mode). The same comment for timing and
>> processing applies from 2).
>> 4) Every minute I retrieve statistics and a report for a specific type
>> of data to present on the browser. Currently with PostgreSQL this
>> takes about 2-3s for the web server to retrieve the data, process it
>> and return it to the browser.
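For requirement 3), the reply above suggests an in-memory write-through cache of the "latest second". A minimal sketch of that idea follows, assuming a single web-server process and a `store` callable that persists one snapshot. The class name, the callable, and the dict used as a stand-in database are all hypothetical.

```python
import threading


class LatestSnapshotCache:
    """Write-through cache holding only the most recent snapshot."""

    def __init__(self, store):
        self._store = store          # callable(key, value) that persists to the database
        self._lock = threading.Lock()
        self._latest = None          # (key, value) of the newest snapshot, or None

    def write(self, key, value):
        """Persist the snapshot first, then publish it to readers."""
        self._store(key, value)
        with self._lock:
            self._latest = (key, value)

    def latest(self):
        """Return the newest (key, value) without touching the database."""
        with self._lock:
            return self._latest


# Usage with an in-memory stand-in for the real store (assumption for the demo).
db = {}
cache = LatestSnapshotCache(db.__setitem__)
cache.write("1293868800", {"value": 1})
cache.write("1293868801", {"value": 2})
```

With this shape, the once-per-second writer calls `write`, and every polling client is served from `latest()`, so the per-second reads never reach the database. With several web-server processes, each one would hold its own copy and would need some way to receive the newest snapshot.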
>> The only primary key I use is a serial integer, only because that's
>> the default. I don't see anything in my data that would be useful as a
>> key when using a key/value database. My data is a good fit for storing
>> as a 'document' though.
>> I know there might not be enough information here but do you think
>> Riak is a good fit?