innostore performance tuning

Jeremiah Peschka jeremiah.peschka at gmail.com
Tue Aug 30 13:09:23 EDT 2011


<after_school_special_voice>I never learned to read!</after_school_special_voice>

For performance metrics, you could see if there are any dtrace points for InnoDB. I wouldn't be able to tell you how to write them, but you'd want to look for things like page splits writes/second, file growth operations. That'll tell you what kind of storage operations are causing your performance slow down. There's an Oracle blog that looks like it contains some basics on writing this sort of thing: http://blogs.oracle.com/realneel/entry/inniostat_innodb_io_statistics

The other obvious options to switch up are adding noatime to skip the file access time writes and switching innostore to use dynamic format tables to avoid the large write issues. I think InnoDB uses 16K pages by default, so 8K objects are going to leave a lot of empty space in the data file leading to increased I/O. The write performance on previously seen objects is probably coincidence because InnoDB doesn't have to grow the data files and isn't performing page splits.

With very sparse page population (large values), you're going to have to update a lot of data pages when you insert a single row since every leaf level page contains pointers to the next and previous page. Worst case scenario - writing a single record could lead to four updates - data page, intermediate page, and next and previous page. So, 8k of data could result in 32k of write. With a 16k data page, a 9k write becomes a 64k write because of updates to other pages. 

Also, if you can put the log files on separate disks than your data files (at least for this load process) then you'll get much better throughput (less I/O contention, etc blah blah blah DBA blah blah).
---
Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
Microsoft SQL Server MVP

On Aug 30, 2011, at 9:55 AM, Jonathan Langevin wrote:

> He's already running Raid 10
> 
> 
> Jonathan Langevin
> Systems Administrator
> Loom Inc.
> Wilmington, NC: (910) 241-0433 - jlangevin at loomlearning.com - www.loomlearning.com - Skype: intel352
> 
> 
> 
> On Tue, Aug 30, 2011 at 12:51 PM, Jeremiah Peschka <jeremiah.peschka at gmail.com> wrote:
> InnoStore is going to insert the data in key order. When you attempt to insert a record that would fit in between two keys (inserting "apple" between "aardvark" and "Byzantium") you're probably going to get a page split, just like in an RDBMS. The data needs to be re-shuffled in order to write it in key order. Despite using MVCC, InnoDB is an index ordered table. Data is written in key order. LevelDB shouldn't have the ordered write issues that InnoStore has.
> 
> The procedure to bulk load your data is going to be the same as bulk loading data for any large Data Warehouse - ordered inserts will help you.
> 
> If there's a way to load your data in smaller ordered chunks, that's going to help you out too.
> 
> Some other options would be to increase the buffer_pool_size for InnoStore. InnoDB will use RAM as a disk cache to avoid disk hits. This would be a "Good Thing". Since you're writing in 8k chunks, roughly, you could also increase the page size. This would be something to experiment with, but increasing database page size could improve write performance. Odds are if your average record size is 8kb, then you're writing multiple pages per record (key + value), which is causing even more I/O (two or more page reads per object). Some filesystem tuning could be in order. SQL Server, for instance, performs sequential reads in 64k chunks. The best practice for disk performance there is to format NTFS to use 64k blocks. The throughput between the defaults and 64k reads is amazing.
> 
> Also, you may want to look at your storage configuration in the back end. If you have that much data, are you using a SAN or DAS? Both of these can help get additional write performance, especially if you adjust the storage device to cache writes in a battery backed cache. What kind of drive configuration do you have? You can get tremendous write performance improvements by moving to RAID10.
> 
> ---
> Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
> Microsoft SQL Server MVP
> 
> On Aug 30, 2011, at 9:34 AM, David Koblas wrote:
> 
> > Yes - but the thought of sorting 800M records which are all about 8k in size is a little daunting...  Something like a 6TB sort...  Plus it doesn't answer the ongoing insert problem, which is 20 keys/sec isn't functional.
> >
> > --david
> >
> > On 8/30/11 9:27 AM, Kresten Krab Thorup wrote:
> >> If you can insert the objects in ascending key order, then innostore will be much faster than a random insert.
> >>
> >> Mobile: + 45 2343 4626 | Skype: krestenkrabthorup | Twitter: @drkrab
> >> Trifork A/S   |  Margrethepladsen 4  | DK- 8000 Aarhus C |  Phone : +45 8732 8787  |  www.trifork.com<http://www.trifork.com/>
> >>
> >> Trifork organizes the world class conference on software development: GOTO Aarhus<http://www.gotocon.com/>  - check it out!
> >>
> >> [cid:part1.09040606.08080401 at trifork.com]
> >>
> >> On Aug 30, 2011, at 6:14 PM, David Koblas wrote:
> >>
> >> I'm currently working on importing a very large dataset (800M) into Riak and running into some serious performance problems.  Hopefully this is just configuration issues and nothing deeper...
> >>
> >> Hardware -
> >>   * 8 proc box
> >>   * 32 Gb ram
> >>   * 5TB disk - RAID10
> >>
> >> Have a cluster of 4 for these boxes all running riak - riak configuration options that are different from stock:
> >>
> >>   * Listening on all IP address "0.0.0.0"
> >>   * {storage_backend, riak_kv_innostore_backend},
> >>   * innostore section - {buffer_pool_size, 17179869184}, %% 16GB
> >>   * innostore section - {flush_method, "O_DIRECT"}
> >>
> >> What I see is that the performance of my import script runs at about 200...300 keys per/second for keys that it's seen recently (e.g. re-runs) then drops to 20ish keys per/sec for new keys.
> >> STATS: 1000 keys handled in 3 seconds 250.75 keys/sec
> >> STATS: 1000 keys handled in 3 seconds 258.20 keys/sec
> >> STATS: 1000 keys handled in 4 seconds 240.11 keys/sec
> >> STATS: 1000 keys handled in 5 seconds 177.63 keys/sec
> >> STATS: 1000 keys handled in 4 seconds 246.26 keys/sec
> >> STATS: 1000 keys handled in 5 seconds 184.79 keys/sec
> >> STATS: 1000 keys handled in 5 seconds 195.95 keys/sec
> >> STATS: 1000 keys handled in 47 seconds 21.02 keys/sec
> >> STATS: 1000 keys handled in 44 seconds 22.63 keys/sec
> >> STATS: 1000 keys handled in 42 seconds 23.64 keys/sec
> >> STATS: 1000 keys handled in 43 seconds 22.88 keys/sec
> >> STATS: 1000 keys handled in 45 seconds 22.12 keys/sec
> >> STATS: 1000 keys handled in 43 seconds 22.83 keys/sec
> >> STATS: 1000 keys handled in 43 seconds 23.11 keys/sec
> >> Of course with 800M records to import a performance of 20 keys/sec is not useful, plus as time goes on having an insert rate at that level is going to be problematic.
> >>
> >> Questions -
> >>   Is there additional things to change for imports and datasets on this scale?
> >>   Is there a way to get additional debugging to see where the performance issues are?
> >>
> >> Thanks,
> >> _______________________________________________
> >> riak-users mailing list
> >> riak-users at lists.basho.com<mailto:riak-users at lists.basho.com>
> >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >>
> >>
> >
> > _______________________________________________
> > riak-users mailing list
> > riak-users at lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
> 
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 





More information about the riak-users mailing list