Active Anti Entropy with Bitcask Key Expiry

Ryan Zezeski rzezeski at
Thu Apr 18 11:08:29 EDT 2013


AAE should not resurrect keys when bitcask expiry is enabled.  However, a
non-trivial amount of work may be performed if a lot of keys expire all at
once.

You're correct that the layers above bitcask have no notion of expiry.
 When a key expires no notification is sent to Riak.  This means that
hashtrees (which I'll call trees from here on out) will continue storing an
entry for a key after it has expired.  As long as all trees agree that the
key is still there AAE will be none the wiser about expiry.  However, AAE
has its own notion of expiry.  Every tree has an expiration date at which
point it is discarded and rebuilt from scratch based on the data in the
backend.  By default trees expire after a week.  This means there could be
a window where the trees disagree because some were rebuilt and no longer
include the expired key.  At this point AAE will try to repair the data by
invoking a read-repair.  Since bitcask honors expiry on 'get' all N copies
will return not_found and thus read-repair will do nothing.  Then AAE will
send a 'rehash' request to all N replicas [1] [2].  The rehash will notice
the key is no longer present and delete it from the tree.
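
To make that sequence concrete, here is a toy Python model of it.  This is
only a sketch and not Riak's actual code: the Replica class, the exchange
function and the plain dicts standing in for the backend and the tree are
all made up for illustration.

```python
class Replica:
    """Toy replica: a backend with per-key expiry plus a tree of key hashes."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.backend = {}  # key -> (value, write_time)
        self.tree = {}     # key -> hash of value

    def put(self, key, value, now):
        self.backend[key] = (value, now)
        self.tree[key] = hash(value)

    def get(self, key, now):
        # Expiry is honored on read; the tree is never told about it.
        entry = self.backend.get(key)
        if entry is None or now - entry[1] >= self.ttl:
            return None  # stands in for not_found
        return entry[0]

    def rebuild_tree(self, now):
        # A rebuild walks the backend, so expired keys drop out of the tree.
        self.tree = {k: hash(v) for k, (v, t) in self.backend.items()
                     if now - t < self.ttl}

    def rehash(self, key, now):
        value = self.get(key, now)
        if value is None:
            self.tree.pop(key, None)  # key is gone, remove it from the tree
        else:
            self.tree[key] = hash(value)


def exchange(replicas, now):
    """Compare trees, then read-repair and rehash every key they disagree on.

    Returns the number of keys that needed repair work.
    """
    keys = set()
    for r in replicas:
        keys.update(r.tree)
    differing = [k for k in keys
                 if len({r.tree.get(k) for r in replicas}) > 1]
    for key in differing:
        values = [r.get(key, now) for r in replicas]
        live = [v for v in values if v is not None]
        if live:
            # Simplified read-repair: copy a live value to replicas missing it.
            for r, v in zip(replicas, values):
                if v is None:
                    r.put(key, live[0], now)
        # If every read was not_found, read-repair has nothing to write.
        for r in replicas:
            r.rehash(key, now)
    return len(differing)
```

With three replicas holding an expired key, exchange does no work while all
trees still agree.  Once one tree is rebuilt, the next exchange touches that
key once and removes it from every tree, and the key is not resurrected.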

So, keys should not be resurrected, but it could generate additional I/O
proportional to the number of keys expired.  For example:

1. bitcask expiry is set to 1 day
2. millions of keys are written in an hour's time span, thus every hour
millions of keys expire
3. the same key is never overwritten inside a week's time
4. AAE is using default tree expiry of a week
5. the trees for a given preflist are _not_ all expired at about the same
time

In this scenario, when a tree expires it may have millions of expired keys
to deal with.  This means millions of Riak 'get' calls plus millions of
'rehash' calls.  Now, since the rehash operation is sent to all replicas,
only 1 tree of a preflist needs to expire for all replica trees to be
repaired.  This means the maximum number of times you should take this hit
is Q / N where Q = ring size, N = n_val.
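
As a rough back-of-the-envelope sketch of that bound, the arithmetic looks
like this in Python.  The function names and the sample numbers (ring size
64, n_val 3, 2 million expired keys) are hypothetical and only illustrate
the formula above.

```python
def max_repair_hits(ring_size, n_val):
    """Upper bound on how many times the repair cost is paid: Q / N."""
    return ring_size / n_val


def calls_per_hit(expired_keys):
    """Per hit, each expired key costs one 'get' and one 'rehash' request."""
    return {"get": expired_keys, "rehash": expired_keys}


if __name__ == "__main__":
    # Hypothetical cluster: Q = 64, N = 3, 2 million expired keys per tree.
    print(max_repair_hits(64, 3))
    print(calls_per_hit(2_000_000))
```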

Points #3, #4, and #5 really are the key here.  There must be an overlap where
keys are expired and only a subset of a preflist's trees have been rebuilt.
 The more often keys are re-written and the more nodes you have the less
likely it will be to hit this window.




On Tue, Apr 16, 2013 at 11:07 AM, Ben Murphy <benmmurphy at> wrote:

> Does anyone know if these two play nice with each other? As far as I can
> see the higher layers sitting on top of bitcask are not aware that bitcask
> can expire keys. Would the anti-entropy code try to resurrect expired keys?