Using Riak for time-series data

Sean Cribbs sean at basho.com
Tue Jan 29 19:00:22 EST 2013


Hi Boris,

A number of Basho customers and open-source users of Riak have stored
time-series data in it. The main strategy is to "timebox" the data
into chunks based on a time window and possibly some other dimension.
For an example, see
http://boundary.com/blog/2012/08/21/boundary-techtalk-large-scale-olap-with-kobayashi/.
The caveat, of course, is that you can probably have only one writer
for each chunk (the data is immutable, so you want to avoid two
writers producing conflicting versions of the same chunk), and that
single writer could become a bottleneck. On the other hand, there are
simple ways to distribute that work if your values are also bounded
on some other dimension.
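
To make the timeboxing idea concrete, here is a minimal sketch in
Python. Everything in it is an assumption for illustration: the
one-hour window, the "series:window_start" key format, and the helper
names are mine, not something Riak or Boundary prescribes. It shows how
to derive a chunk key from a timestamp, how to list the chunk keys that
cover a time range, and one simple way to spread series across a fixed
number of writers so that each chunk still has exactly one writer.

```python
import zlib
from typing import List

WINDOW_SECONDS = 3600  # assumed chunk width: one hour of data per key


def chunk_key(series: str, ts: int, window: int = WINDOW_SECONDS) -> str:
    """Key of the chunk holding Unix timestamp `ts` for `series`."""
    start = ts - (ts % window)  # round down to the start of the window
    return f"{series}:{start}"


def chunk_keys(series: str, start_ts: int, end_ts: int,
               window: int = WINDOW_SECONDS) -> List[str]:
    """Keys of all chunks covering the closed range [start_ts, end_ts]."""
    first = start_ts - (start_ts % window)
    return [f"{series}:{t}" for t in range(first, end_ts + 1, window)]


def writer_for(series: str, n_writers: int) -> int:
    """Assign a series to one of `n_writers` writers with a stable hash,
    so all chunks of a given series are written by the same process."""
    return zlib.crc32(series.encode("utf-8")) % n_writers
```

With a layout like this, a reader never has to search for data: given a
series name and a time range, it can compute the exact keys to request.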

On the querying side, for best performance you'll want to make sure
that the data you put in is keyed such that it is straightforward to
request it directly by key (i.e., without MapReduce or Search). For
cross-correlation, you might want to consider exporting to a search
engine (Riak Search might be able to help) or a batch-processing
system like Hadoop, depending on your needs. However, when you
structure your keyspace correctly, you can do a lot with a little
application code, as Boundary has demonstrated with their Kobayashi
system. I think you'll find that, while IO-heavy (as any database
is), fetching multiple keys in parallel is the most efficient query
mechanism.
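
As a rough sketch of what "a little application code" might look like,
the Python below fetches several keys in parallel and adds two series
together on the client. The `fetch` argument is a hypothetical callable
standing in for whatever get-by-key call your client library provides;
here a plain dict plays the role of the store. The key names and the
"list of per-second values" chunk layout are assumptions as well.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_many(fetch, keys, max_workers=8):
    """Fetch `keys` in parallel with `fetch(key)`; return {key: value}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        values = list(pool.map(fetch, keys))  # map preserves input order
    return dict(zip(keys, values))


def add_series(a, b):
    """Element-wise sum of two equal-length lists of measurements."""
    return [x + y for x, y in zip(a, b)]


# In-memory stand-in for the database (assumption, for demonstration only).
store = {"AAPL:0": [1, 2, 3], "MSFT:0": [10, 20, 30]}

chunks = fetch_many(store.get, ["AAPL:0", "MSFT:0"])
combined = add_series(chunks["AAPL:0"], chunks["MSFT:0"])
```

Because each chunk carries only values and its start time is encoded in
the key, there is no need to store a timestamp with every measurement.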


On Tue, Jan 29, 2013 at 3:53 PM, Boris Solovyov
<boris.solovyov at gmail.com> wrote:
> Is Riak good for time-series data like stock trade information? The data I
> am considering is:
>
> - Millions, maybe billions or more, of series, each with a name, like a
>   stock ticker symbol
> - One measurement per second (occasionally, but rarely, with missing values)
> - Many measurements are zero or static, unchanging
> - Append-only: never updated, never written in the past
> - Probably highly compressible; it would be good if it compresses well
>
> Query requirements are probably typical, for graphing and analysis:
>
> - Get all measurements for [series1, series2, ... seriesN] for a time range
> - Find series with similar names, maybe with wildcard matching
> - Get all measurements aggregated for similar series, e.g. get a time range
>   of two series, but add their measurements together to produce a single
>   series of output
>
> From reading about Riak, it seems to have many nice features not needed for
> this purpose, and some of the suggested operations may be too costly: too
> much IO, maybe too expensive to add two series together, etc. Also, features
> like conflict detection, versioning, extra metadata, etc. are not needed.
> And if Riak stores a timestamp with every value, that will reduce
> compressibility, i.e. there is no need to store a timestamp with every
> measurement if the series is just a long run of 0's.
>
> If Riak is not right, what suggestions do you have?



-- 
Sean Cribbs <sean at basho.com>
Software Engineer
Basho Technologies, Inc.
http://basho.com/



