Looking for a replacement datastore
siculars at gmail.com
Fri Aug 17 02:46:14 EDT 2012
tl;dr use Riak and Redis. Could you do it without Redis? Probably. Would I want to? No.
I'll take a stab at this. It goes without saying that there are many ways to do this and no "right" way. Each solution will have its own positives and negatives. It all depends on what you and your team are comfortable with and the needs of your app.
For those who follow my ramblings, you can guess what I'm gonna say. I would put forward a solution of Riak and... Redis! Why Redis? Data structures (Riak doesn't have them... at the moment... or ever? Don't try to make it have them). You want them. Things like sorted sets, lists and hashes (which compress well) are great for basically everything.
Things in your favor:
- constrained data set (data is not UGC with unbounded size)
- predictable growth rates
- deterministic keys (think %iso8601 date or unix epoch int%_%customerid%)
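To make the deterministic-key idea concrete, here's a minimal sketch in Python of the %unix epoch int%_%customerid% scheme, truncated to the minute. The customer id "acme" is just a placeholder for illustration:

```python
from datetime import datetime, timezone

def result_key(ts: datetime, customer_id: str) -> str:
    """Build a deterministic key from a timestamp and customer id.

    Truncates to the minute, so every check result in the same
    minute for the same customer maps to the same key.
    """
    minute = ts.replace(second=0, microsecond=0)
    epoch_min = int(minute.timestamp())
    return f"{epoch_min}_{customer_id}"

# Example: 2012-08-17 02:46 UTC for a hypothetical customer "acme"
key = result_key(datetime(2012, 8, 17, 2, 46, tzinfo=timezone.utc), "acme")
# key == "1345171560_acme"
```

Because the key is pure arithmetic on the timestamp, any reader (app server, worker, graphing code) can reconstruct it without looking anything up.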
Keep 48 hours of live data in Redis and run a culling process that dumps data to Riak. The culling process keeps your Redis memory footprint within known limits; you could run it every minute to minimize data loss from a downed Redis server (outside of master/slave replication, etc.). This is also where you could make do without Redis: if your app is holding on to, writing, or requesting data every minute, you could write straight to Riak and have worker processes roll those minutes into hours/days/whatever if necessary.
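The roll-minutes-into-hours step could look something like this sketch, assuming each per-minute entry is a single numeric check result (e.g. response time in ms) keyed by its epoch-minute. The storage clients are omitted; this is just the aggregation logic a worker would run before writing the summary object to Riak:

```python
from collections import defaultdict

def roll_up_minutes(minute_results, bucket_seconds=3600):
    """Roll per-minute results into coarser buckets.

    minute_results: dict mapping epoch-minute ints to response times (ms).
    Returns a dict mapping each bucket-start epoch to a summary with
    count, min, max, and avg -- small enough to write once and never touch.
    """
    buckets = defaultdict(list)
    for epoch_min, value in minute_results.items():
        # Snap each minute down to the start of its hour-long bucket.
        bucket = epoch_min - (epoch_min % bucket_seconds)
        buckets[bucket].append(value)
    summary = {}
    for bucket, values in buckets.items():
        summary[bucket] = {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
        }
    return summary

# Three consecutive minutes inside the same hour
rolled = roll_up_minutes({1345161600: 120, 1345161660: 80, 1345161720: 100})
```

Writing the hourly summary under its own deterministic key means it's written exactly once, which matters for the compaction point below.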
With deterministic keys you may not even need search, secondary indexes or key filters, but with them you can cover basically any permutation you could come up with. Your application fetches the correct key(s) deterministically simply by manipulating date offsets.
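"Manipulating date offsets" just means enumerating the keys for a window and multi-getting them, no query layer needed. A sketch, again using a placeholder customer id and the per-minute epoch key scheme from above:

```python
from datetime import datetime, timedelta, timezone

def keys_for_range(start: datetime, end: datetime, customer_id: str):
    """List every per-minute key from start to end (inclusive).

    The app can then fetch these directly from Riak -- no search,
    secondary indexes, or key filters required.
    """
    keys = []
    t = start.replace(second=0, microsecond=0)
    while t <= end:
        keys.append(f"{int(t.timestamp())}_{customer_id}")
        t += timedelta(minutes=1)
    return keys

# The "last 24 hours" query is just keys_for_range(now - 1 day, now, cid);
# here, a 5-minute window starting July 1 2012 00:00 UTC:
start = datetime(2012, 7, 1, 0, 0, tzinfo=timezone.utc)
keys = keys_for_range(start, start + timedelta(minutes=4), "acme")
```

A month-long range is ~44,000 keys, which is why rolling older minutes up into hourly/daily summary keys (as above) keeps those big queries cheap.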
Whatever you do, you do not want a situation where you write half-baked keys into Riak. Frequently updated keys will incur file compaction, which will make you want to cry and punch babies just to make the pain stop.
On Aug 16, 2012, at 6:20 PM, Shawn Parrish wrote:
> Howdy Riak folk,
> We're looking for a possible datastore replacement for our server
> monitoring check results. Maybe some of you can offer feedback if
> Riak is a possible good solution.
> Each ping, http request, etc has a result with various metadata that
> we store. We're looking at about 250 million results a month and that
> number continues to grow.
> We query this data for:
> 1. last result (is the server up or down?)
> 2. if it's up, when was the last 'down' and inversely when it's down,
> when was the last up?
> 3. Full detail of the last 5 results (to show recent results)
> 4. Last 24 hours results (usually ~1440 results) to graph
> 5. Results in a date range (example: all results from July 1 through
> July 31)... this can be very large.
> We currently use BigCouch (CouchDB) but the views and built-in
> _all_docs slow down with so many results, and especially when we call
> them with 'include_docs', because we need the details of the results as
> well.
> We're trying to trim down the total results stored by summarizing
> older data and deleting it, but that slows down CouchDB views even
> more.
> 1. Is Riak a possible datastore for this use case? Can I get so many
> results, including all the details quickly enough?
> 2. Do you know of another datastore that might be better?