High volume data series storage and queries

Jeremiah Peschka jeremiah.peschka at gmail.com
Wed Aug 10 10:18:19 EDT 2011


Yeah, the performance you're attempting to get is where you need to apply either specialized knowledge of a single database system or an ample hardware budget.

There are some limitations in the storage subsystem, too:

RAID 0 is generally regarded as a Bad Idea for RDBMSes. Ideally you'd put your transaction log on at least a mirrored pair of drives. Data files go on RAID 10. OS and other binaries go on a third drive that can safely be either mirrored or RAID 5.
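
For what it's worth, a rough sketch of that split for PostgreSQL might look like the following; the mount points are hypothetical names for the volumes above, and I'm assuming a reasonably recent initdb:

~~~~
import subprocess

# Hypothetical mount points: /mnt/mirror is the mirrored pair for the
# transaction log (WAL), /mnt/raid10 is the RAID 10 volume for data files.
subprocess.run(
    ["initdb",
     "--pgdata", "/mnt/raid10/pgdata",  # data files on RAID 10
     "-X", "/mnt/mirror/pg_wal"],       # transaction log on the mirrored pair
    check=True,
)
~~~~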

If you really want storage performance, either use SSDs or a battery-backed RAID controller with onboard memory. The Adaptec MaxIQ is entry level storage porn: http://www.tomshardware.com/reviews/maxiq-ssd-cache,2511.html You can put an SSD on your RAID controller as a cache for your old-fashioned rotational drives. ZOOM. Cheaper than a SAN/DAS.

In the end, the best solution is one where you have in-house expertise. The second best is one where there are people you trust who will help you build the appropriate system and then train your staff.
---
Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
 Microsoft SQL Server MVP

On Aug 10, 2011, at 7:11 AM, Ciprian Dorin Craciun wrote:

> On Wed, Aug 10, 2011 at 16:39, Jeremiah Peschka
> <jeremiah.peschka at gmail.com> wrote:
>> 
>> ---
>> Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
>>  Microsoft SQL Server MVP
>> 
>> On Aug 10, 2011, at 6:14 AM, Ciprian Dorin Craciun wrote:
>> 
>>> 
>>>    Furthermore, to back my claims I've put on GitHub the old code
>>> which I used to benchmark different databases -- Riak and other
>>> key-value stores are not included, except BerkeleyDB, but I've
>>> included others like Hypertable, which could be a good choice -- at
>>> the following address (you can easily extend the "framework" to
>>> include new backends):
>>>        https://github.com/cipriancraciun/sds-benchmark
>>> 
>>>    Also there is a "so-called" report I put up at the time in the
>>> same place -- the second link is the content of the wiki page, and
>>> there are also images which GitHub doesn't display in the page:
>>>        https://github.com/cipriancraciun/sds-benchmark/tree/master/results
>>>        https://github.com/cipriancraciun/sds-benchmark/blob/master/results/results.mediawiki
>>> 
>>>    For the record, we've tested up to 100 million records on some
>>> data-stores, but on some others (like PostgreSQL and MySQL) we've
>>> stopped at 10 million, as the insertion rate dropped tremendously.
>>> 
>>>    My conclusion from this experiment was: any database which is
>>> backed by a tree-like data structure (almost all use B-trees or
>>> derivatives) will grind to a halt in insert speed if the
>>> clustering keys (clientid + timestamp in this case) don't exhibit any
>>> kind of locality (as in your case). See the quote from my report:
>>> ~~~~
>>> * this section applies to Postgres and SQLite, as MonetDB behaved Ok;
>>> * it seems that the initial insert speed is good for the first couple
>>> million records;
>>> * as the data accumulates and new inserts are done, the indices start
>>> to be rewritten, and consume the entire disk bandwidth; (the initial
>>> good speed is due to the fact that the indices fit into RAM;)
>>> * if the inserts are done without any indices defined, the insert
>>> speed is incredible (600k in the case of Postgres), but the scan speed
>>> is under 100;
>>> * it is possible that this behavior is specific to any
>>> database which uses trees (like BerkeleyDB?);
>>> ~~~~
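>>>
>>>    (If you want to see this effect in isolation, here is a tiny,
>>> hypothetical sketch using Python's sqlite3 module; the table and
>>> column names are made up, and the absolute numbers don't matter,
>>> only the trend of the per-batch rate as the index outgrows RAM:)
>>> ~~~~
>>> import random, time, sqlite3
>>>
>>> db = sqlite3.connect("bench.sqlite")
>>> db.execute("CREATE TABLE IF NOT EXISTS readings ("
>>>            "client_id INTEGER, ts INTEGER, value REAL, "
>>>            "PRIMARY KEY (client_id, ts))")
>>>
>>> batch = 100000
>>> for i in range(100):  # 10 million rows in total
>>>     # keys with no locality: random client_id + random timestamp
>>>     rows = [(random.randrange(10000000), random.randrange(10000000),
>>>              random.random()) for _ in range(batch)]
>>>     start = time.time()
>>>     db.executemany("INSERT OR IGNORE INTO readings VALUES (?, ?, ?)",
>>>                    rows)
>>>     db.commit()
>>>     print(i, int(batch / (time.time() - start)), "inserts/second")
>>> ~~~~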
>> 
>> There are a lot of items in your benchmark that led to the performance problems you saw with RDBMSes, notably turning fsync off. An RDBMS will constantly stream writes to a write-ahead log before committing them to the data files on disk; most RDBMSes checkpoint every minute or so. You can adjust your RDBMS so that when it checkpoints it either dumps all rows to disk as fast as possible or attempts to determine an optimal rate to write at, so that other read operations can continue and only experience minimal I/O blocking from writes.
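>>
>> A hedged sketch of the sort of knobs I mean, assuming a reasonably recent PostgreSQL; the values are illustrative only, not recommendations:
>> ~~~~
>> import psycopg2  # assumes PostgreSQL with ALTER SYSTEM support
>>
>> conn = psycopg2.connect("dbname=bench")
>> conn.autocommit = True  # ALTER SYSTEM can't run inside a transaction
>> cur = conn.cursor()
>> # Spread checkpoint writes over most of the checkpoint interval instead
>> # of dumping everything at once, so reads see less I/O blocking.
>> cur.execute("ALTER SYSTEM SET checkpoint_completion_target = 0.9")
>> # Relax durability per commit instead of turning fsync off globally.
>> cur.execute("ALTER SYSTEM SET synchronous_commit = 'off'")
>> cur.execute("SELECT pg_reload_conf()")
>> ~~~~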
> 
>    I don't remember if I disabled fsync also for PostgreSQL, or
> only for SQLite. But the setup I arrived at isn't the first
> iteration: I've tried the benchmark a few times and kept the setup
> that showed the best performance. (For example, for BerkeleyDB I
> obtained almost double the performance after correctly setting the
> number of entries per B-tree page.) But I admit I didn't have the
> input of a professional DBA.
> 
> 
>> With a clustering key of clientid + timestamp, you'll see tremendous B-tree fragmentation on the unique index supporting the clustering key unless you use some kind of table-level partitioning - either as a native RDBMS feature in SQL Server or using something like partitioned views in PostgreSQL.
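>>
>> As a rough, hypothetical sketch of the inheritance-style approach in PostgreSQL (table and column names are made up; newer versions also offer declarative partitioning):
>> ~~~~
>> import psycopg2  # hypothetical schema, one child table per month
>>
>> conn = psycopg2.connect("dbname=bench")
>> cur = conn.cursor()
>> cur.execute("""CREATE TABLE readings (
>>                    client_id integer NOT NULL,
>>                    ts        timestamptz NOT NULL,
>>                    value     double precision)""")
>> # Each partition's (client_id, ts) index stays small enough to avoid
>> # the fragmentation described above, and the CHECK constraint lets
>> # the planner skip partitions that can't match a query.
>> cur.execute("""CREATE TABLE readings_2011_08 (
>>                    CHECK (ts >= '2011-08-01' AND ts < '2011-09-01')
>>                ) INHERITS (readings)""")
>> cur.execute("CREATE INDEX ON readings_2011_08 (client_id, ts)")
>> conn.commit()
>> ~~~~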
> 
>    This is exactly what I guessed led to the performance degradation.
> 
> 
>> With any sufficiently complex system (an RDBMS is arguably more complex than most operating systems), you do need to apply some level of domain-specific knowledge to the problem and then tune appropriately.
> 
>    I tend to agree with the need for in-depth domain-specific
> knowledge. On the other hand, if you apply too much of it, it will
> also be very hard to change the requirements after the initial system
> is completed.
> 
>    On the other hand, Hypertable, for example, didn't show the
> weaknesses of the RDBMSes, as it internally does something similar to
> partitioning (but unattended).
> 
>    Ciprian.



