High volume data series storage and queries

Ciprian Dorin Craciun ciprian.craciun at gmail.com
Wed Aug 10 10:11:00 EDT 2011

On Wed, Aug 10, 2011 at 16:39, Jeremiah Peschka
<jeremiah.peschka at gmail.com> wrote:
> ---
> Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
>  Microsoft SQL Server MVP
> On Aug 10, 2011, at 6:14 AM, Ciprian Dorin Craciun wrote:
>>    Furthermore, to back my claims I've put on Github the old code
>> which I've used to benchmark different databases -- Riak or other
>> key-value stores are not included except BerkeleyDB, but I've included
>> others like Hypertable which could be a good choice -- at the
>> following address (you can easily extend the "framework" to include
>> new backends):
>>        https://github.com/cipriancraciun/sds-benchmark
>>    Also there is a "so-called" report I've put up at that time in the
>> same place at -- the second link is the content of the wkipage and
>> there are also images which GitHub doesn't display in the page:
>>        https://github.com/cipriancraciun/sds-benchmark/tree/master/results
>>        https://github.com/cipriancraciun/sds-benchmark/blob/master/results/results.mediawiki
>>    For the record we've tested up to 100m records on some
>> data-stores, but on some other (like PostgreSQL and MySQL we've
>> stopped at 10 million as the insertion rate dropped tremendously).
>>    My conclusion from this experiment was: any database which is
>> backed by a tree-like data structure (almost all use B trees or
>> derivate) will get to a grinding halt in insert speed if the
>> clustering keys (clientid + timpstamp in this case) don't exhibit any
>> kind of locality (as is your case). See the quote from my report:
>> ~~~~
>> * this section applies to Postgres and SQLite, as MonetDB behaved Ok;
>> * it seems that the initial insert speed is good for the first couple
>> million records;
>> * as the data accumulates and new inserts are done, the indices start
>> to be rewritten, and consume the entire disk bandwidth; (the initial
>> good speed is due the fact that the indices fit into the RAM memory;)
>> * if the inserts are done without any indices defined, the insert
>> speed is incredible (600k in the case of Postgres), but the scan speed
>> is under 100;
>> * maybe, it is possible, that this behavior is specific to any
>> database which uses trees (like BerkeleyDB?);
>> ~~~~
> There are a lot of items in your benchmark that lead to the performance problems that you saw with RDBMSes. Notably turning fsync off. An RDBMS will constantly stream writes to a write ahead log before committing them to disk. Most RDBMSes do this every minute or so. You can adjust your RDBMS so that when it checkpoints it can either dump all rows to disk as fast as possible or else attempt to determine an optimal rate to write at so that other read operations can continue and only experience minimal I/O blocking from writes.

    I don't remember if I've disabled fsync also for PostreSQL, or
only on SQLite. But the setup I've arrived to isn't at the first
iteration. I've tried the benchmark a few times and kept the setup
that shown the best performance. (For example for BerkeleyDB I've
almost obtained double the performance after correctly setting up the
number of entries per Btree page.) But I admit I didn't had the input
of a professionist DBA.

> With a clustering key of clientid + timestamp, you'll see tremendous B-tree fragmentation on the unique index supporting the clustering key without using some kind of table level partitioning - either as an RDBMS feature in MSSQL Server or using something like partitioned views in PostgreSQL.

    Exactly this is what I've guessed that lead to the performance degradation.

> With any sufficiently complex system (an RDBMS is arguably more complex than most operating systems), you do need to apply some level of domain specific knowledge to the problem and then tune appropriately.

    I tend to agree with the need of in-depth domain specific
knowledge. On the other hand if you apply to much of it it will be
also very hard to change the requirements after the initial system was

    On the other hand for example Hypertable didn't show the
weaknesses of RDBMS as it internally does something similar to
partitioning (but unattended).


More information about the riak-users mailing list