Leveled and Anti-Entropy

Martin Sumner martin.sumner at adaptip.co.uk
Fri Jul 21 10:06:30 EDT 2017


I've added some anti-entropy features to Leveled (the pure-Erlang KV store
designed as a Riak backend).  These features are in part an experiment in
how to approach both anti-entropy and full-sync multi-data-centre
replication in the future.

There's a long write-up, including some history of AAE in Riak:

https://github.com/martinsumner/leveled/blob/master/docs/ANTI_ENTROPY.md

In summary, Riak's current AAE is based on cryptographically strong Merkle
trees, and this experiment is based on removing that security strength, as
it isn't relevant to the context in which it is used.  Instead Leveled now
has Merkle trees that can be merged and can also be built incrementally
(i.e. built key by key even when the keys are not in segment order).

Using these new trees (coined TicTac trees to fit into Leveled's terrible
naming convention), we can build AAE trees in folds incrementally and hence
at a lower cost, but also merge trees across independent stores.  In the
future, trees can be built from folds using Riak coverage queries, across
either indexes or objects in the store, and compared between different
database clusters even where those clusters are partitioned differently
(e.g. with different ring sizes).
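
To make the cross-cluster comparison concrete, the following Python sketch
partitions the same key space two different ways, folds each partition into
its own tree, merges the per-partition trees, and compares the results.  It
is only a sketch under assumptions: segment hashes are combined by XOR,
crc32 stands in for the real hash functions, and fold_to_tree, cluster_tree
and the ring_size values are hypothetical.

```python
import zlib
from functools import reduce

SEGMENTS = 64  # hypothetical tree width


def fold_to_tree(items):
    # Build one tree from a fold over (key, value_hash) pairs.
    tree = [0] * SEGMENTS
    for key, vhash in items:
        tree[zlib.crc32(key) % SEGMENTS] ^= zlib.crc32(key + b"|" + vhash)
    return tree


def merge(a, b):
    return [x ^ y for x, y in zip(a, b)]


def cluster_tree(store, ring_size):
    # Spread keys over ring_size partitions, fold each, then merge.
    partitions = [[] for _ in range(ring_size)]
    for key, vhash in store.items():
        partitions[zlib.crc32(b"ring" + key) % ring_size].append((key, vhash))
    return reduce(merge, (fold_to_tree(p) for p in partitions))


store_a = {b"k%d" % i: b"v%d" % i for i in range(100)}
store_b = dict(store_a)
store_b[b"k42"] = b"stale"  # one object has diverged on cluster B

tree_a = cluster_tree(store_a, ring_size=8)
tree_b = cluster_tree(store_b, ring_size=16)
mismatched = [i for i in range(SEGMENTS) if tree_a[i] != tree_b[i]]
print(mismatched)  # segments that need a key-level comparison
```

Because the merged tree depends only on the set of keys and hashes, the
partitioning drops out of the comparison.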

The expectation is that there will be more flexibility in what we can
choose to compare at run time: not just whether the objects are consistent,
but also whether the indexes are consistent.  Also, once freed from
partition constraints, there will be more flexibility in what we can
compare between, e.g. making it easier to compare with a different database.

Coupled with this there's a demonstration of using temporary indexes in
Leveled (index entries that auto-expire at a TTL), and we've shown how
these can be used with tree-creating folds to compare recent changes
between stores at a lower cost than comparing the whole database state,
with the added advantage that the long-term footprint of the database is
not extended by maintaining a separate copy of all the keys and hashes.
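
A small Python sketch of the idea, under stated assumptions: TempIndex,
TTL_SECONDS and the entry layout are hypothetical and do not reflect
Leveled's actual index format.  Each change writes an index entry stamped
with its modification time, entries are dropped once they pass the TTL, and
a fold over a time range builds a tree covering only recent changes.

```python
import zlib

SEGMENTS = 64       # hypothetical tree width
TTL_SECONDS = 3600  # hypothetical expiry for temporary index entries


class TempIndex:
    def __init__(self):
        self.entries = []  # (modified_ts, key, value_hash)

    def put(self, ts, key, vhash):
        self.entries.append((ts, key, vhash))

    def expire(self, now):
        # Entries past their TTL vanish, so the footprint stays bounded.
        self.entries = [e for e in self.entries if e[0] + TTL_SECONDS > now]

    def fold_tree(self, start_ts, end_ts):
        # Tree over changes in [start_ts, end_ts) only, not the whole store.
        tree = [0] * SEGMENTS
        for ts, key, vhash in self.entries:
            if start_ts <= ts < end_ts:
                seg = zlib.crc32(key) % SEGMENTS
                tree[seg] ^= zlib.crc32(key + b"|" + vhash)
        return tree


# Usage: two stores that saw the same recent changes, in different orders.
a, b = TempIndex(), TempIndex()
a.put(100, b"k1", b"h1")
a.put(5000, b"k2", b"h2")
a.put(5100, b"k3", b"h3")
b.put(5100, b"k3", b"h3")
b.put(5000, b"k2", b"h2")
print(a.fold_tree(4000, 6000) == b.fold_tree(4000, 6000))
```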

Alongside this, we now have some other work ongoing in the space of
replication and anti-entropy:

- @russelldb is continuing to test and improve his open source real-time
replication solution (rabl) which uses RabbitMQ.  He's hoping to be able to
talk further on progress with this by the end of August.
- I'm working on implementing in riak_core a core_node_worker_pool, which
is intended to complement the core_vnode_worker_pool but allow for coverage
queries where snapshots are taken on a covering set of vnodes, but folds
are then scheduled to run one-at-a-time on each node.  This can then be
used to regulate the impact of anti-entropy folds.
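
The scheduling idea in the second item can be sketched in Python as below.
This is not the riak_core implementation or its API: NodeWorkerPool and
coverage_fold are hypothetical names, a single-threaded executor per node
stands in for the node-level pool, and copying a dict stands in for taking
a snapshot.

```python
from concurrent.futures import ThreadPoolExecutor


class NodeWorkerPool:
    """One worker per node, so folds queued on a node run one at a time."""

    def __init__(self, nodes):
        self._pools = {n: ThreadPoolExecutor(max_workers=1) for n in nodes}

    def submit(self, node, fold, snapshot):
        return self._pools[node].submit(fold, snapshot)

    def shutdown(self):
        for pool in self._pools.values():
            pool.shutdown(wait=True)


def coverage_fold(pool, vnodes, fold):
    # vnodes maps vnode_id -> (node, data).  Snapshot every covering vnode
    # up front, then queue the folds so each node works through them serially.
    snapshots = {v: (node, dict(data)) for v, (node, data) in vnodes.items()}
    futures = {v: pool.submit(node, fold, snap)
               for v, (node, snap) in snapshots.items()}
    return {v: f.result() for v, f in futures.items()}


# Usage: four vnodes on two nodes; at most one fold runs per node at a time.
pool = NodeWorkerPool(["node1", "node2"])
vnodes = {
    0: ("node1", {"a": 1}),
    1: ("node2", {"b": 2, "c": 3}),
    2: ("node1", {"d": 4}),
    3: ("node2", {}),
}
counts = coverage_fold(pool, vnodes, len)
pool.shutdown()
print(counts)
```

Taking the snapshots before queueing means every fold sees the same
point-in-time view even though the folds themselves are staggered.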

Our current target is to have a release candidate of open-source
replication (both real-time and full-sync) by the end of September.  This
will initially be focused only on replication between two Riak clusters.

Regards

Martin (@masleeds)

P.S. Hopefully next Friday we should also be able to report back on the
improvements and test enhancements that followed on from the work on
riak_core claim.