High volume data series storage and queries
pcotec at gmail.com
Mon Aug 8 14:21:23 EDT 2011
Hello Riak enthusiasts,
I am trying to design a solution for storing time series data coming from a
very large number of potential high-frequency sources.
I thought Riak could help, though from what I have read about it I don't
think I can use it without some other layer on top of it.
The problem is that I need to be able to do range queries over this data, per
source. That is, I want to be able to say "give me the first N data points for
source S between time T1 and time T2."
I need to store this data for a rather long time, and I expect the volume
to grow beyond what a "vanilla" RDBMS could support.
Another thing to note is that I can restrict the number of data points to be
returned by a query, so no query would return more than MaxN data points.
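To pin down the query semantics described above, here is a minimal in-memory sketch in Python. It is only a reference definition, not a storage design: the function name `first_n_in_range`, the inclusive treatment of T1 and T2, and the value chosen for `MAX_N` are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

MAX_N = 1000  # assumed cap on points returned per query (MaxN in the text)


def first_n_in_range(timestamps, values, t1, t2, n, max_n=MAX_N):
    """Return the first min(n, max_n) (timestamp, value) pairs with t1 <= ts <= t2.

    `timestamps` must be sorted ascending and aligned with `values`.
    Treating both bounds as inclusive is an assumption of this sketch.
    """
    limit = min(n, max_n)
    lo = bisect_left(timestamps, t1)    # first index with ts >= t1
    hi = bisect_right(timestamps, t2)   # one past the last index with ts <= t2
    hi = min(hi, lo + limit)            # never return more than the cap
    return list(zip(timestamps[lo:hi], values[lo:hi]))
```

For a single source S, `timestamps` and `values` would hold that source's points in time order; the cap guarantees the result never exceeds MaxN entries regardless of how wide the time window is.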
I thought about doing this the following way:
1. Bundle the time series data in batches of MaxN, to ensure that any query
would require reading at most two batches. The batches would be stored inside
2. Store the start-time, end-time, size and Riak batch ID in a MySQL (or
My thinking is that such a strategy would allow me to persist the data in Riak
and scale linearly as it grows, while the index would be kept in an RDBMS for fast
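The two-step scheme above can be sketched roughly as follows. This is a toy model under stated assumptions, not a working Riak integration: a plain Python dict stands in for the Riak bucket, an in-memory `sqlite3` database stands in for the index RDBMS, and the class name `BatchedSeriesStore`, the table layout, and the batch ID format are all hypothetical. The two-batch bound only holds if every batch except the newest contains exactly MaxN points.

```python
import sqlite3

MAX_N = 4  # hypothetical batch size and query cap, kept tiny for illustration


class BatchedSeriesStore:
    def __init__(self, max_n=MAX_N):
        self.max_n = max_n
        self.blobs = {}  # stand-in for Riak: batch_id -> list of (ts, value)
        self.db = sqlite3.connect(":memory:")  # stand-in for the index RDBMS
        self.db.execute(
            "CREATE TABLE batch_index ("
            " source TEXT, start_ts INTEGER, end_ts INTEGER,"
            " size INTEGER, batch_id TEXT)"
        )
        self.db.execute(
            "CREATE INDEX idx_source_end ON batch_index (source, end_ts)"
        )
        self.reads = 0  # counts batch fetches, to check the two-batch bound

    def write_batch(self, source, points):
        """Store one batch of time-sorted (ts, value) points and index it."""
        assert 0 < len(points) <= self.max_n
        batch_id = f"{source}:{points[0][0]}"  # hypothetical key format
        self.blobs[batch_id] = list(points)
        self.db.execute(
            "INSERT INTO batch_index VALUES (?, ?, ?, ?, ?)",
            (source, points[0][0], points[-1][0], len(points), batch_id),
        )

    def query(self, source, t1, t2, n):
        """Return the first min(n, max_n) points of `source` in [t1, t2]."""
        limit = min(n, self.max_n)
        # Earliest two batches that overlap the requested time window.
        rows = self.db.execute(
            "SELECT batch_id FROM batch_index"
            " WHERE source = ? AND end_ts >= ? AND start_ts <= ?"
            " ORDER BY start_ts LIMIT 2",
            (source, t1, t2),
        ).fetchall()
        out = []
        for (batch_id,) in rows:
            if len(out) >= limit:
                break  # first batch already satisfied the query
            self.reads += 1
            for ts, value in self.blobs[batch_id]:
                if t1 <= ts <= t2 and len(out) < limit:
                    out.append((ts, value))
        return out
```

In this sketch the index lookup selects at most two batch IDs and only those blobs are fetched, so the cost of a query does not depend on the total amount of stored data, only on MaxN.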
Does it sound sensible to use Riak this way? Does this make you
laugh/cry/shake your head in disbelief? Am I overlooking something in Riak
that would make all this much better?
Thanks and best regards,