LevelDB

David Yu david.yu.ftw at gmail.com
Tue Aug 21 22:29:54 EDT 2012


On Wed, Aug 22, 2012 at 5:33 AM, Alexander Sicular <siculars at gmail.com> wrote:

> I was in the Riak 1.2 webinar earlier today and asked a leveldb question
> about insertion order and durability vs. bitcask's WOL architecture. Joe
> was not able to get to my question then but took the time to write me a
> detailed answer. Great engineers at Basho taking time to answer questions
> is a great thing. Thanks Joe!
>
> -Alexander Sicular
>
> @siculars
>
> Begin forwarded message:
>
> *From: *Joseph Blomstedt <joe at basho.com>
> *Subject: *LevelDB
> *Date: *August 21, 2012 3:45:45 PM EDT
> *To: *siculars at gmail.com
>
> Alexander,
>
> I noticed your LevelDB question in the webinar as Reem was closing
> things out, so I figured I'd follow up via email.
>
> As you know, Bitcask maintains a strict set of write-logs and an
> in-memory hash table that maps keys to (file, offset). Pretty
> straightforward. Compaction is a separate thing that happens based on
> independent triggers.
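Below is a minimal sketch of the Bitcask layout described above: an append-only log file plus an in-memory hash table (the "keydir") that maps each key to where its latest value lives. The class name `TinyBitcask` and the length-prefixed record format are hypothetical illustrations, not Bitcask's real on-disk format. Merging/compaction is left out because, as noted, it runs independently.

```python
import os
import tempfile


class TinyBitcask:
    """Hypothetical Bitcask-style store: append-only log + in-memory keydir.

    Record format (an assumption): 4-byte key length, 4-byte value length,
    key bytes, value bytes.
    """

    def __init__(self, path):
        self.path = path
        self.keydir = {}  # key -> (file, value_offset, value_size)
        self.log = open(path, "ab+")

    def put(self, key: bytes, value: bytes):
        self.log.seek(0, os.SEEK_END)
        offset = self.log.tell()
        header = len(key).to_bytes(4, "big") + len(value).to_bytes(4, "big")
        self.log.write(header + key + value)  # always appended, never rewritten
        self.log.flush()
        # The value starts after the 8-byte header and the key bytes.
        self.keydir[key] = (self.path, offset + 8 + len(key), len(value))

    def get(self, key: bytes):
        entry = self.keydir.get(key)
        if entry is None:
            return None
        _, value_offset, size = entry
        self.log.seek(value_offset)  # one seek + one read per lookup
        return self.log.read(size)

    def close(self):
        self.log.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        db = TinyBitcask(os.path.join(d, "000001.data"))
        db.put(b"k", b"v1")
        db.put(b"k", b"v2")  # the old record stays in the log until a merge
        print(db.get(b"k"))
        db.close()
```

Overwrites never touch old records; only the keydir entry moves, which is why a separate compaction pass is needed to reclaim space.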
>
> LevelDB is rather different. LevelDB does maintain a WAL, but it's
> short-lived and only for crash recovery. LevelDB writes to the WAL,
> but also keeps the object in an in-memory write buffer (configurable
> size, increased in Riak 1.2 by 10x from Riak 1.1). After the buffer
> becomes full, LevelDB writes the data to disk as a Level-0 SST (data
> in sorted order + sorted index at the end of the file).
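The write path just described (append to the WAL, keep the object in a write buffer, flush the buffer to a sorted Level-0 SST once it is full) can be sketched as follows. This is a simplified model under stated assumptions: `WriteBuffer` is a hypothetical name, the buffer limit is counted in entries rather than bytes, and in-memory lists stand in for the WAL file and the SST files.

```python
class WriteBuffer:
    """Hypothetical model of LevelDB's write path: WAL + in-memory buffer."""

    def __init__(self, limit):
        self.limit = limit   # flush threshold, in entries (assumption)
        self.wal = []        # stands in for the on-disk write-ahead log
        self.buffer = {}     # in-memory write buffer
        self.level0 = []     # flushed Level-0 SSTs, oldest first

    def put(self, key, value):
        self.wal.append((key, value))  # 1. log the write for crash recovery
        self.buffer[key] = value       # 2. keep the object in memory
        if len(self.buffer) >= self.limit:
            self._flush()

    def _flush(self):
        # An SST holds data in sorted order plus an index over it.
        data = sorted(self.buffer.items())
        index = {k: i for i, (k, _) in enumerate(data)}
        self.level0.append({"data": data, "index": index})
        self.buffer = {}
        self.wal = []  # the WAL is short-lived: dropped once data is in an SST
```

A larger buffer (as in Riak 1.2) means fewer, bigger Level-0 files per unit of write traffic.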
>
> There can be multiple Level-0 SSTs. To read a key, LevelDB looks at
> the index in each SST, from the newest file to the oldest. For
> performance, there's an LRU cache of indexes so you're not always
> hitting disk. LevelDB now also includes bloom filters (used in Riak
> 1.2) to make it easy to skip SSTs that cannot contain the key.
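Here is a rough sketch of a Level-0 lookup that walks the SSTs from newest to oldest and uses a bloom filter to skip files that definitely do not hold the key. The `BloomFilter` (bit count, hash count, SHA-256 based hashing) and `SST` classes are hypothetical stand-ins chosen for clarity; LevelDB's own filter policy uses a different hash and sizing.

```python
import hashlib


class BloomFilter:
    """Hypothetical bloom filter: may report false positives, never false negatives."""

    def __init__(self, nbits=256, nhashes=3):
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for i in range(self.nhashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.nbits

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all((self.bits >> p) & 1 for p in self._positions(key))


class SST:
    """Hypothetical in-memory stand-in for one sorted string table."""

    def __init__(self, items):
        self.data = dict(sorted(items))
        self.bloom = BloomFilter()
        for k in self.data:
            self.bloom.add(k)


def get_level0(ssts, key):
    """Search Level-0 SSTs from newest (end of list) to oldest."""
    for sst in reversed(ssts):
        if not sst.bloom.might_contain(key):
            continue  # key is definitely absent: skip this file's index
        if key in sst.data:
            return sst.data[key]
    return None
```

A false positive only costs an extra index probe; it never produces a wrong answer.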
>
> To make things more efficient, LevelDB does compaction/merging in a
> background thread. A set of Level-0 files will be selected and merged
> together into a larger Level-1 file. The format is the same, but the
> file is now larger and includes the data from multiple Level-0 files.
> The original Level-0 files are then removed. Likewise, Level-1 files
> are merged into Level-2 files, Level-2 into Level-3, and so on. Each
> level has larger files holding a greater chunk of adjacent, sorted
> data.
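The merge step can be illustrated as a k-way merge of sorted runs in which the newest version of each key wins. The sketch below is a simplification under stated assumptions: `compact` is a hypothetical function, files are plain sorted lists of `(key, value)` pairs, and real LevelDB details such as deletion markers and splitting the output into multiple files are left out.

```python
import heapq


def compact(level0_files, level1_file):
    """Merge Level-0 runs (oldest first) with a Level-1 run into one sorted run.

    Each run is a list of (key, value) pairs sorted by key, with unique keys
    inside a run. When a key appears in several runs, the newest value wins.
    """
    # Level-1 data is older than every Level-0 file.
    sources = [level1_file] + list(level0_files)  # oldest ... newest
    streams = []
    for age, run in enumerate(reversed(sources)):  # age 0 = newest
        streams.append([(key, age, value) for key, value in run])

    out = []
    last_key = object()  # sentinel that equals no real key
    # heapq.merge yields entries ordered by (key, age), so the first
    # entry seen for each key comes from the newest run containing it.
    for key, age, value in heapq.merge(*streams):
        if key != last_key:
            out.append((key, value))
            last_key = key
    return out
```

Because every input is already sorted, the merge is a single sequential pass, which keeps compaction I/O sequential.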
>
> To read, you check newest to oldest on Level 0, then Level 1, then Level
> 2, etc.
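That read order can be written as a short lookup routine. In this hypothetical sketch (`lookup` is an illustrative name), each table is modelled as a plain dict, `levels[0]` holds the Level-0 tables oldest first, and `levels[1:]` hold the deeper levels. In stock LevelDB, files within Level 1 and below cover non-overlapping key ranges, so at most one file per deeper level can contain a given key.

```python
def lookup(levels, key):
    """Check Level 0 newest-to-oldest, then Level 1, Level 2, and so on."""
    for table in reversed(levels[0]):  # Level-0 files may overlap: newest first
        if key in table:
            return table[key]
    for level in levels[1:]:           # deeper levels, shallowest (newest) first
        for table in level:
            if key in table:
                return table[key]
    return None
```

The first hit wins, because shallower levels always hold newer data than deeper ones.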
>
> While compaction is a background thing, LevelDB limits the number of
> Level-0 files you can have. If you hit the limit, LevelDB will block
> writes until files have been merged into Level-1. With a single
> compaction thread, it was easy to max out LevelDB in Riak 1.1; these
> stalls were fairly frequent and hurt 95th-percentile-and-up latencies
> as well as throughput. Our change to use multiple compaction threads
> has greatly improved how quickly compaction occurs, and writes now
> rarely (if ever) stall. To further improve things, the adaptive write
> throttling I mentioned slows down writes (increasing latency) to
> ensure compaction isn't heavily affected and stays ahead of write
> traffic, further preventing stalls. The net effect is somewhat higher
> latency and lower throughput, but performance is more consistent
> (i.e., 95th-percentile-and-up latencies are tighter around the
> average latency).
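Here is a sketch of the difference between a hard stall and throttling: once Level-0 files pile up past a slowdown point, each write is delayed a little, and only at the hard limit is it blocked outright. The defaults of 8 and 12 match stock LevelDB's `kL0_SlowdownWritesTrigger` and `kL0_StopWritesTrigger`, though Riak's LevelDB fork may use other values. The linear delay ramp and `max_delay_s` are hypothetical; this is not Basho's adaptive throttling algorithm.

```python
def write_delay(level0_files, slowdown_at=8, stop_at=12, max_delay_s=0.004):
    """Return seconds to delay a write, or None if the write must block.

    Hypothetical throttle: no delay below `slowdown_at`, a delay that grows
    linearly up to `max_delay_s` as the Level-0 count nears `stop_at`, and
    a hard stall (None) at or beyond `stop_at`.
    """
    if level0_files >= stop_at:
        return None  # hard stall: wait for compaction into Level-1
    if level0_files < slowdown_at:
        return 0.0   # compaction is keeping up: no throttling
    # The closer to the stop limit, the longer each write waits.
    fraction = (level0_files - slowdown_at + 1) / (stop_at - slowdown_at)
    return fraction * max_delay_s
```

Spreading small delays across many writes trades a little average latency for fewer long, disruptive stalls, which is the consistency trade-off described above.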
>
> I hope this answers your question.
>
> -Joe
>
>
Thanks for sharing!

>
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>


-- 
When the cat is away, the mouse is alone.
- David Yu

