How do I improve Level DB performance?

Tim Haines tmhaines at
Thu May 10 18:19:31 EDT 2012

Hi David,

I think I saw your name as one of the creators of the benchmarking tool -
thanks - compared to the efforts required to benchmark other systems, it's
been great.

On Thu, May 10, 2012 at 3:01 PM, David Smith <dizzyd at> wrote:

> On Thu, May 10, 2012 at 2:33 PM, Tim Haines <tmhaines at> wrote:
> > I've set up a new cluster, and have been doing pre-deployment benchmarks on
> > it. The benchmark I was running slowly sunk from 1000 TPS to 250 TPS over
> > the course of the single 8 hour benchmark doing 1 read+1 update using 1k
> > values.  I'm wondering if anyone might have suggestions on how I can improve
> > this.
> Generally, this suggests that you are becoming seek-time bound. The
> test config, as specified, will generate a pretty huge number of
> not_founds which are (currently) crazy expensive w/ LevelDB,
> particularly as the dataset grows.
> Assuming you start with an empty database, a sample of this test will
> generate operations like so:
> Key 1000 - get -> not_found
> Key 1001 - update -> not_found + write
> Key 1002 - get -> not_found
> etc..
> I.e. the leveldb cache never gets a chance to be useful, because
> you're always writing new values and the cost of writing each new
> value goes up, since you have to thrash the cache to determine if
> you've ever seen the key that doesn't exist. :)
> The root problem here is going to be the key_generator --
> partitioned_sequential_int will just run through all the ints in order
> and never revisit a key.
Okay, I understand what you're saying here.  There are no cache hits.  I
thought about the tests I ran after the 8 hour benchmark, and although I
started the same test for a shorter duration again, it's likely that the
objects still weren't in the cache even though the data may have existed
because I'd already iterated over so many objects.
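
The access pattern David describes can be sketched with a small simulation.
This is only an illustration, not the benchmarking tool's code: the function
names, the in-memory set standing in for the database, and the 50/50 mix of
gets and updates are all assumptions made for the example.

```python
import random
from itertools import count


def sequential_keys(start=0):
    """Hypothetical stand-in for a sequential generator: never revisits a key."""
    return count(start)


def uniform_keys(max_key, seed=1):
    """Hypothetical generator that draws keys at random from a bounded range."""
    rng = random.Random(seed)
    while True:
        yield rng.randrange(max_key)


def not_found_rate(keys, n_ops, seed=0):
    """Run n_ops operations against an empty store; return the not_found fraction."""
    rng = random.Random(seed)
    store = set()          # keys written so far (stand-in for the database)
    not_found = 0
    for _ in range(n_ops):
        key = next(keys)
        op = rng.choice(("get", "update"))   # assumed 1 read : 1 update mix
        if key not in store:
            not_found += 1                   # both get and update must look the key up
        if op == "update":
            store.add(key)                   # update = lookup + write
    return not_found / n_ops


if __name__ == "__main__":
    print("sequential:", not_found_rate(sequential_keys(), 10_000))
    print("uniform   :", not_found_rate(uniform_keys(1_000), 20_000))
```

With the sequential generator every lookup misses, because each operation
draws a key that has never been written.  A generator that revisits a bounded
key range sees the not_found rate fall as the range fills up.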

I'm assuming the benchmarking tool will use the same bucket for each run,
and the keys will grow in sequence.   So if I run through the benchmark
writing out 100,000 keys, and then rerun the benchmark with 3 reads per
update, I should see results where the reads are successfully served from
the cache, right?
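
Whether a second sequential pass is served from cache depends on whether the
whole key range fits in it.  The sketch below is a toy model under stated
assumptions: it uses a per-object LRU cache (LevelDB caches blocks, not
individual objects), the class and function names are hypothetical, and it
models a plain second pass rather than the 3 reads per update mix.

```python
from collections import OrderedDict


class LRUCache:
    """Toy per-key LRU cache; a simplification, not LevelDB's block cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def access(self, key):
        """Return True on a hit; on a miss, insert the key and evict the oldest."""
        if key in self.items:
            self.items.move_to_end(key)      # mark as most recently used
            return True
        self.items[key] = True
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict least recently used
        return False


def second_pass_hit_rate(n_keys, capacity):
    """Touch keys 0..n_keys-1 in order twice; return the hit rate of pass two."""
    cache = LRUCache(capacity)
    for key in range(n_keys):                # first run: populate in sequence
        cache.access(key)
    hits = sum(cache.access(key) for key in range(n_keys))
    return hits / n_keys


if __name__ == "__main__":
    print(second_pass_hit_rate(100_000, 100_000))  # whole range fits in cache
    print(second_pass_hit_rate(100_000, 50_000))   # range is twice the cache
```

In this model the second pass hits every time when the range fits, and misses
every time when it does not, since a sequential scan evicts each entry just
before it is needed again.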

I'll try this out.  Even though this is occurring, would you have expected
to see the performance drop from 500 to 250 over the course of the

> >         {write_buffer_size, 16777216},
> >         {max_open_files, 100},
> >         {block_size, 262144},
> >         {cache_size, 168430088}
> I strongly recommend not changing write_buffer_size; it can have
> unexpected latency side-effects when LevelDB compaction occurs.
> Smaller == more predictable.
Thanks for this - I will revert it.  One thing I haven't been able to
locate is which settings can be changed once the node has data in it.  Is
this documented anywhere?
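
For a sense of scale, the quoted settings can be converted into more readable
units.  This is back-of-envelope arithmetic only.  It assumes cache_size and
block_size are both in bytes, that cached blocks are roughly block_size each,
and that each 1k value occupies about 1024 bytes with key and format overhead
ignored.

```python
MIB = 1024 * 1024
KIB = 1024

# Values copied from the quoted configuration.
settings = {
    "write_buffer_size": 16777216,
    "max_open_files": 100,
    "block_size": 262144,
    "cache_size": 168430088,
}

# Assumption: the cache holds whole blocks of about block_size bytes.
blocks_in_cache = settings["cache_size"] // settings["block_size"]

# Assumption: 1k values, no per-entry overhead.
values_per_block = settings["block_size"] // 1024

if __name__ == "__main__":
    print("write_buffer_size:", settings["write_buffer_size"] // MIB, "MiB")
    print("block_size       :", settings["block_size"] // KIB, "KiB")
    print("cache_size       :", round(settings["cache_size"] / MIB, 1), "MiB")
    print("blocks in cache  :", blocks_in_cache)
    print("values per block :", values_per_block)
```

Under those assumptions the cache holds only a few hundred large blocks, so
a lookup for one 1k value pulls in a block of a couple of hundred values.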

> Does that help?
You've improved my understanding and given me something else to try.  Thank
you.


> D.
> --
> Dave Smith
> VP, Engineering
> Basho Technologies, Inc.
> dizzyd at