High volume data series storage and queries

Paul O pcotec at gmail.com
Mon Aug 8 14:21:23 EDT 2011


Hello Riak enthusiasts,

I am trying to design a solution for storing time series data coming from a
very large number of potentially high-frequency sources.

I thought Riak could be of help, though based on what I read about it I
can't use it without some other layer on top of it.

The problem is that I need to be able to run range queries over this data,
per source. That is, I want to be able to say "give me the first N data
points for source S between time T1 and time T2."

I need to store this data for a rather long time, and I expect the volume
to grow beyond what a "vanilla" RDBMS could support.

Another thing to note is that I can restrict the number of data points to be
returned by a query, so no query would return more than MaxN data points.

I thought about doing this the following way:

1. Bundle the time series data in batches of MaxN points, to ensure that any
query would require reading at most two batches. The batches would be stored
inside Riak.
2. Store the start-time, end-time, size and Riak batch ID in a MySQL (or
PostgreSQL) DB.
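
A minimal sketch of the two steps above, in Python. It is an illustration
under stated assumptions, not a tested design: a plain dict stands in for
Riak, an in-memory SQLite database stands in for MySQL/PostgreSQL, and the
names MAX_N, blob_store, batch_index, store_batch and query are all
hypothetical. It also assumes that the batches of a source do not overlap in
time and that every batch except the newest one is full, which is what makes
two batches enough for any query.

```python
import json
import sqlite3

MAX_N = 4  # hypothetical per-query cap (MaxN in the text)

# Stand-in for Riak (assumption): batch ID -> serialized batch.
blob_store = {}

# Stand-in for the MySQL/PostgreSQL index (assumption): one row per batch.
index_db = sqlite3.connect(":memory:")
index_db.execute(
    "CREATE TABLE batch_index ("
    " source TEXT, start_time INTEGER, end_time INTEGER,"
    " size INTEGER, batch_id TEXT)"
)


def store_batch(source, points):
    """Store one batch of at most MAX_N (time, value) points, sorted by time."""
    assert 0 < len(points) <= MAX_N
    start, end = points[0][0], points[-1][0]
    batch_id = f"{source}:{start}"  # hypothetical key scheme
    blob_store[batch_id] = json.dumps(points)
    index_db.execute(
        "INSERT INTO batch_index VALUES (?, ?, ?, ?, ?)",
        (source, start, end, len(points), batch_id),
    )


def query(source, t1, t2, n):
    """Return (first n points with t1 <= time <= t2, number of batches read)."""
    n = min(n, MAX_N)
    # Range query on the index: the two earliest batches overlapping [t1, t2].
    rows = index_db.execute(
        "SELECT batch_id FROM batch_index"
        " WHERE source = ? AND end_time >= ? AND start_time <= ?"
        " ORDER BY start_time LIMIT 2",
        (source, t1, t2),
    ).fetchall()
    result = []
    for (batch_id,) in rows:
        # Fetch the batch from the blob store and filter it by time.
        for t, v in json.loads(blob_store[batch_id]):
            if t1 <= t <= t2:
                result.append((t, v))
    return result[:n], len(rows)
```

With ten points for one source stored as batches of 4, 4 and 2, a call such
as query("s1", 3, 9, 4) touches only the first two batches.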

My thinking is that such a strategy would allow me to persist the data in
Riak and scale linearly as it grows, while the index would be kept in an
RDBMS for fast range queries.

Does it sound sensible to use Riak this way? Does this make you
laugh/cry/shake your head in disbelief? Am I overlooking something from Riak
which would make all this much better?

Thanks and best regards,

Paul