High volume data series storage and queries

Ciprian Dorin Craciun ciprian.craciun at gmail.com
Wed Aug 10 09:14:21 EDT 2011


On Wed, Aug 10, 2011 at 04:44, Paul O <pcotec at gmail.com> wrote:
>>    I wouldn't store the "data files" inside the embedded DB, but the
>> actual raw readings.
>
> While this would be tempting, how would you see something like that? A huge
> DB per source? As I said previously, in many cases this would involve a DB
> with around 1 billion records; in fact many such DBs would be used all at
> the same time, and I can constrain the queries so as to never have to
> consider the full amount of data. It seems to me that I'd be giving up a
> good chance for optimization (premature and evil as it may be) by storing
> all data points. And if you're suggesting one DB per batch, the batches
> would be relatively small; wouldn't that create its own set of problems for
> the DB library (opening many files, closing them, etc.) when I can reduce
> the final range query to a sequential traversal of at most 3 x MaxN records?
> Regards,
> Paul
>
    I'm also merging in your other reply, as it's related to this one.
>
On Wed, Aug 10, 2011 at 04:38, Paul O <pcotec at gmail.com> wrote:
> Jeremiah, with all the feedback indicating that Riak isn't such a straight
> fit for the problem, even accepting batching compromises, I'll have to
> strongly reconsider RDBMS solutions, though they come with their own
> tradeoffs, as discussed here today. How about storing data using innostore
> instead of bitcask; would this be a "best of both worlds" situation?
> Thanks for these and the previous valuable comments, again,
> Paul


    So the first problem I see with any solution is the 1 billion
records. I would try to find an alternative solution for the
"historical" data, like dumping it into flat files, and keep only
recent data in the database. By "recent" I don't have a specific
timeframe in mind; use whatever you determine works in your
particular case.
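
    For example, here is a minimal sketch of such an archiving step
(assuming, purely for illustration, a SQLite table named "readings"
with "source_id", "ts" and "value" columns and a cutoff of roughly 90
days; none of these names or numbers come from your setup):
~~~~
import sqlite3
import time

# Hypothetical cutoff: keep ~90 days of "recent" data live in the database.
cutoff = int(time.time()) - 90 * 24 * 3600

db = sqlite3.connect("readings.db")

# Dump the historical rows to a flat file...
with open("readings-archive.csv", "a") as archive:
    for source_id, ts, value in db.execute(
            "SELECT source_id, ts, value FROM readings WHERE ts < ?",
            (cutoff,)):
        archive.write("%d,%d,%s\n" % (source_id, ts, value))

# ...then drop them, so the live database stays small.
db.execute("DELETE FROM readings WHERE ts < ?", (cutoff,))
db.commit()
~~~~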

    Now, about the solution: I'm not proposing any batching or
partitioning based on either timestamp or source, neither as
individual files on the file system nor as opaque values stored
inside the database.

    Instead, I suggest just putting every data point in the same
"table" or "database", as described below:
    * the first requirement for the embedded database is that it
support lexicographically sorted keys and cursors which can be
positioned to start from a particular (or the next greater) key;
    * the key of each record would be the binary representation of
the source id, padded with zeros to a fixed length, concatenated with
the timestamp encoded the same way (a small sketch follows the list);
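
    For instance, in Python (the 64-bit field widths below are my own
assumption, not something dictated by your data), the encoding could
look like this; because both fields are fixed-width, unsigned and
big-endian, comparing the keys as plain byte strings gives exactly
the (source id, timestamp) ordering the cursor needs:
~~~~
import struct

def make_key(source_id, timestamp):
    # Two unsigned 64-bit big-endian integers: fixed width + big-endian
    # means lexicographic (byte-wise) order equals numeric order.
    return struct.pack(">QQ", source_id, timestamp)

assert make_key(42, 1000) < make_key(42, 1001)    # same source, later time
assert make_key(42, 10 ** 12) < make_key(43, 0)   # the source id dominates
~~~~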

    As for your concern about executing range queries without
considering the whole data set: this is already solved by the
database, which doesn't need to look at the whole data set, but only
has to seek to the first data point matching your query. Behind the
scenes it actually does something similar to, but even better than,
what you're proposing with the MaxN-records opaque value.
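
    To illustrate the seek-and-scan pattern (a toy stand-in only: a
sorted in-memory list and Python's bisect instead of a real cursor,
but BerkeleyDB, LMDB and similar stores expose the same "position the
cursor at the first key >= X, then iterate" operation on their
on-disk trees):
~~~~
import bisect
import struct

def make_key(source_id, timestamp):
    return struct.pack(">QQ", source_id, timestamp)

# Toy "database": a sorted key space with 5 sources; a real embedded
# store keeps this in a B-tree (or similar) on disk, but the query
# pattern is identical.
keys = sorted(make_key(s, t) for s in range(5) for t in range(0, 1000, 10))

def range_query(source_id, t_start, t_end):
    # Seek to the first key >= (source_id, t_start)...
    i = bisect.bisect_left(keys, make_key(source_id, t_start))
    # ...then walk forward until the keys leave the requested range;
    # nothing outside the range is ever touched.
    while i < len(keys) and keys[i] < make_key(source_id, t_end):
        yield struct.unpack(">QQ", keys[i])
        i += 1

print(list(range_query(3, 200, 400)))  # only source 3, timestamps in [200, 400)
~~~~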

    Furthermore, to back my claims, I've put on GitHub the old code
which I used to benchmark different databases (Riak and other
key-value stores are not included, except BerkeleyDB, but I've
included others like Hypertable, which could be a good choice) at the
following address (you can easily extend the "framework" to include
new backends):
        https://github.com/cipriancraciun/sds-benchmark

    There is also a "so-called" report I put up at the time in the
same place; the second link is the content of the wiki page, and
there are also images which GitHub doesn't display in the page:
        https://github.com/cipriancraciun/sds-benchmark/tree/master/results
        https://github.com/cipriancraciun/sds-benchmark/blob/master/results/results.mediawiki

    For the record, we tested up to 100 million records on some data
stores, but on some others (like PostgreSQL and MySQL) we stopped at
10 million as the insertion rate dropped tremendously.

    My conclusion from this experiment was: any database which is
backed by a tree-like data structure (almost all use B-trees or
derivatives) will grind to a halt in insert speed if the clustering
keys (client id + timestamp in this case) don't exhibit any kind of
locality (as in your case). See the quote from my report:
~~~~
* this section applies to Postgres and SQLite, as MonetDB behaved OK;
* it seems that the initial insert speed is good for the first couple
of million records;
* as the data accumulates and new inserts are done, the indices start
to be rewritten and consume the entire disk bandwidth; (the initial
good speed is due to the fact that the indices fit into RAM;)
* if the inserts are done without any indices defined, the insert
speed is incredible (600k in the case of Postgres), but the scan
speed is under 100;
* it is possible that this behavior is specific to any database which
uses trees (like BerkeleyDB?);
~~~~

    Hope this helps you,
    Ciprian.



