Leveled - Another Erlang Key-Value store

Martin Sumner martin.sumner at adaptip.co.uk
Tue Feb 28 12:28:34 EST 2017

For original testing in isolation I used this:


But I've focused on testing since within Riak using the standard
basho_bench/examples/riakc_pb.config test

Testing of secondary indexing is one of the next big things for me,
definitely going to do this in March.  From a development perspective doing
clones/snapshots was a lot easier than it presumably is in C++.  I'm unsure
about how performance will compare, as 2i terms do end up clumped together
on disk in leveldb and I know MVM has done a lot of optimisation work in
that area.

Some things that may help leveled are: there's no overlapping files in
level 1; the levels don't increase in depth as rapidly.  However, there are
some issues (https://github.com/martinsumner/leveled/issues/34) that I may
still need to contend with.

I think there may be some interesting possibilities with Map functions
based on HEAD not GET requests.  So perhaps as a developer I could add a
bitmap index to my object metadata, and roll across those bitmaps
efficiently through a Map function now that I wouldn't need to pull the
whole object off disk to achieve that. So the concept of division between
Metadata and Value may open up efficient query ideas beyond 2i

On 28 February 2017 at 17:14, DeadZen <deadzen at deadzen.com> wrote:

> Yup I get it, I like the concept and the fun naming of inkers, pencillers
> and clerks as well. Is there a basho bench configuration? This reminds me a
> bit if fractal trees but with a focus on NoSQL operational semantics, how
> else do you see this adding improvements? It seems index requests could be
> cheaper with this backend configuration.
> On Tue, Feb 28, 2017 at 12:07 PM Martin Sumner <
> martin.sumner at adaptip.co.uk> wrote:
>> Have a look in https://github.com/martinsumner/leveled/blob/
>> master/docs/FUTURE.md and the "Riak Features Implemented"
>> I'm trying to avoid the need for Riak to Get n times for every request
>> (both a KV GET and a KV PUT), when really it only needs the body once, and
>> the n-1 times it doesn't need the body if the HEAD response confirms that
>> the vector clock is at the expected state.  As values get larger there's a
>> lot of unnecessary disk activity in those GETs.
>> To understand all the Journal/Inker stuff have a look at
>> https://github.com/martinsumner/leveled/blob/master/docs/DESIGN.md
>> The Inker is an actor that controls the Journal - the Journal is a
>> transaction log where all the values are stored permanently, made up of a
>> series of CDB files where everything is ordered by sequence number.  The
>> Keys and Metadata are stored in a merge tree called the Ledger - and the
>> Ledger is controlled by an actor called the Penciller.  So the HEAD request
>> should just need to check in the Ledger via the Penciller.  A GET request
>> must follow that same path, and then once it has the metadata, it has the
>> sequence number and so can use that to fetch the value from the Journal via
>> the Inker.
>> Does this make sense?
>> My apologies, as the naming may not be helpful.  Perhaps one of the
>> drawbacks of working in isolation on this.
>> Thanks
>> Martin
>> On 28 February 2017 at 16:51, DeadZen <deadzen at deadzen.com> wrote:
>> Cheers indeed!
>> You added HEAD requests so a full GET wouldn't always be required?
>> Did I read that right? *dives into code*
>> %% GET requests first follow the path of a HEAD request, and if an object
>> is
>> %% found, then fetch the value from the Journal via the Inker.
>> ... WHAT?
>> Very nice work, will be more than happy to provide feedback and patches
>> on this.
>> On Tue, Feb 28, 2017 at 9:04 AM, Martin Sumner
>> <martin.sumner at adaptip.co.uk> wrote:
>> >
>> > Over the past few months I've been working on an alternative pure-Erlang
>> > Key/Value store to act as a backend to Riak KV.  This is now open
>> source and
>> > available at
>> >
>> > https://github.com/martinsumner/leveled
>> >
>> > The store is a work-in-progress prototype, originally started to better
>> > understand the impact of different trade-offs in LSM-Tree design.  The
>> aim
>> > is to:
>> >
>> > - provide a fully-featured Riak backend (e.g. secondary index, object
>> expiry
>> > support etc)
>> > - provide stable throughput with larger object sizes (> 4KB)
>> > - provide a simpler and more flexible path to making Riak KV changes
>> > end-to-end
>> >
>> > The prime change in the store when compared to HanoiDB or eleveldb is
>> that
>> > storage is split with only Keys & Metadata being placed in the merge
>> tree,
>> > and the full object living to the side in a series of CDB-based
>> journals.
>> > The intention of this is to:
>> >
>> > - reduce write amplification and ease page-cache pollution issues on
>> > scanning events.
>> > - support faster HEAD requests than GET requests, and in parallel an
>> > alternative Riak KV branch has been produced to move from an n-GET
>> model to
>> > a n-HEAD 1-GET model of fetching data for both KV GET and KV PUT
>> operations
>> >
>> > The impact of this has been to improve throughput for larger object
>> sizes
>> > where disk I/O and not CPU is the current limit on throughput.  The
>> > advantage increases the greater the object size, and the tighter the
>> > constraint on disk.
>> >
>> > Please visit the github page, I've tried to write up as much about the
>> > project as I can.  There's the results of various volume tests,
>> information
>> > on the research which prompted the design, an overview of the design
>> itself
>> > and some hints as to what I expect to try next with leveled.
>> >
>> > Any feedback, please mail me, raise an issue on github, or ping me
>> @masleeds
>> >
>> > Cheers
>> >
>> > Martin
>> >
>> > _______________________________________________
>> > riak-users mailing list
>> > riak-users at lists.basho.com
>> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20170228/82d0f91a/attachment-0002.html>

More information about the riak-users mailing list