Fwd: LevelDB

Alexander Sicular siculars at gmail.com
Tue Aug 21 17:33:49 EDT 2012


I was in the Riak 1.2 webinar earlier today and asked a leveldb question about insertion order and durability vs. bitcask's WOL architecture. Joe was not able to get to my question then but took the time to write me a detailed answer. Great engineers at Basho taking time to answer questions is a great thing. Thanks Joe!

-Alexander Sicular

@siculars

Begin forwarded message:

> From: Joseph Blomstedt <joe at basho.com>
> Subject: LevelDB
> Date: August 21, 2012 3:45:45 PM EDT
> To: siculars at gmail.com
> 
> Alexander,
> 
> I noticed your LevelDB question in the webinar as Reem was closing
> things out, so I figured I'd follow up via email.
> 
> As you know, Bitcask maintains a strict set of write-logs and an
> in-memory hash table that maps keys to (file, offset). Pretty
> straightforward. Compaction is a separate thing that happens based on
> independent triggers.
> 
> LevelDB is rather different. LevelDB does maintain a WAL, but it's
> short-lived and only for crash recovery. LevelDB writes to the WAL,
> but also keeps the object in an in-memory write buffer (configurable
> size, increased in Riak 1.2 by 10x from Riak 1.1). After the buffer
> becomes full, LevelDB writes the data to disk as a Level-0 SST (data
> in sorted order + sorted index at the end of the file).
> 
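A minimal sketch of the write path described above, as a toy in-memory model rather than LevelDB's actual code. The class name `TinyLSM`, the `buffer_limit` parameter (counted in keys, not bytes), and the use of Python lists as stand-ins for on-disk files are all assumptions made for illustration.

```python
class TinyLSM:
    """Toy model of the write path: WAL + in-memory buffer + Level-0 flush."""

    def __init__(self, buffer_limit=3):
        self.buffer_limit = buffer_limit  # hypothetical: max keys held in memory
        self.wal = []     # stand-in for the on-disk write-ahead log
        self.buffer = {}  # in-memory write buffer
        self.level0 = []  # flushed tables, oldest first

    def put(self, key, value):
        self.wal.append((key, value))  # log first, so a crash can be replayed
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self):
        # A flushed table holds its entries in sorted key order.
        table = sorted(self.buffer.items())
        self.level0.append(table)
        self.buffer = {}
        self.wal = []  # the log is only needed until its data is in a table


db = TinyLSM(buffer_limit=3)
for k, v in [("c", 1), ("a", 2), ("b", 3), ("d", 4)]:
    db.put(k, v)
```

After the third write the buffer fills and is flushed as one sorted table; the fourth write sits in the fresh buffer and the fresh log.
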
> There can be multiple Level-0 SSTs. To read a key, LevelDB looks at
> the index in each SST starting from newest file to oldest. For
> performance, there's an LRU cache of indexes so you're not always
> hitting disk. LevelDB now also includes bloom filters (used in Riak
> 1.2) to make it easier to skip non-interesting SSTs.
> 
> To make things more efficient, LevelDB does compaction/merging in a
> background thread. A set of Level-0 files will be selected and merged
> together into a larger Level-1 file. The format is the same, but the
> file is now larger and includes the data from multiple Level-0 files.
> The original Level-0 files are then removed. Likewise, Level-1 files
> are merged into Level-2 files, and Level-2 into Level-3, etc. Each
> level has larger files holding a greater chunk of adjacent, sorted
> data.
> 
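The merge step can be sketched roughly as follows. This is a simplification under stated assumptions: tables are lists of (key, value) pairs ordered oldest table first, and the function name `merge_tables` is hypothetical. A real implementation would stream a k-way merge over sorted files instead of building a dict in memory.

```python
def merge_tables(tables_oldest_first):
    """Merge several sorted tables into one sorted table; newest value wins."""
    merged = {}
    for table in tables_oldest_first:
        for key, value in table:
            merged[key] = value  # later (newer) tables overwrite older ones
    return sorted(merged.items())


level0 = [
    [("a", 1), ("c", 1)],  # older table
    [("a", 2), ("b", 2)],  # newer table
]
level1_table = merge_tables(level0)
```

The result is a single larger table covering the keys of both inputs, with the stale value for "a" dropped.
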
> To read, you check newest to oldest on Level 0, then Level 1, then Level 2, etc.
> 
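A rough sketch of that read order, assuming each table is modeled as a Python dict and that the in-memory write buffer is consulted before any table. The function name `lookup` and its arguments are hypothetical, and the index cache and bloom filters mentioned earlier are left out.

```python
def lookup(key, buffer, level0_oldest_first, deeper_levels):
    """Return the newest value for key, or None if it is not present."""
    if key in buffer:  # assumption: newest data lives in memory
        return buffer[key]
    for table in reversed(level0_oldest_first):  # Level 0, newest first
        if key in table:
            return table[key]
    for level in deeper_levels:  # then Level 1, Level 2, ...
        for table in level:
            if key in table:
                return table[key]
    return None


level0 = [{"a": "old"}, {"a": "new"}]
deeper = [[{"b": "L1"}], [{"b": "L2", "c": "L2"}]]
results = [lookup(k, {}, level0, deeper) for k in ("a", "b", "c", "z")]
```

Because the search stops at the first hit, a newer value in a shallower level shadows any older value for the same key further down.
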
> While compaction is a background thing, LevelDB limits the number of
> Level-0 files you can have. If you hit the limit, LevelDB will block
> writes until files have been merged into Level-1. With a single
> compaction thread, it was easy to max out LevelDB in Riak 1.1, and
> these stalls were fairly frequent and hurt 95th-percentile and higher
> latencies, as well as greatly hurting throughput. Our change to use
> multiple compaction threads has greatly improved how quickly
> compaction occurs, and writes rarely (if ever) end up stalling. To
> further improve things,
> there's the adaptive write throttling that I mentioned that will slow
> down writes (increased latency) in order to ensure compaction isn't
> heavily affected and remains ahead of write traffic -- thus, further
> preventing stalls. The net effect is somewhat higher latency and lower
> throughput, but more consistent behavior (i.e., 95th-percentile and
> higher latencies sit closer to the average latency).
> 
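To illustrate the difference between a hard stall and a throttle, here is a hedged sketch. It is not Basho's actual throttling algorithm, which the message does not spell out. The function name `write_delay_ms` and the thresholds `slowdown_at`, `stop_at`, and `step_ms` are invented values for illustration only.

```python
def write_delay_ms(level0_files, slowdown_at=4, stop_at=8, step_ms=2):
    """Return a per-write delay in ms, or None if the write must block."""
    if level0_files >= stop_at:
        return None  # hard limit reached: block until compaction catches up
    if level0_files >= slowdown_at:
        # Delay grows with the backlog, giving compaction time to catch up.
        return (level0_files - slowdown_at + 1) * step_ms
    return 0  # compaction is keeping up: no added latency


delays = [write_delay_ms(n) for n in (0, 4, 6, 8)]
```

Small, growing delays trade a little latency on each write for a lower chance of ever reaching the blocking limit.
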
> I hope this answers your question.
> 
> -Joe
