Node Recovery Questions

Martin Sumner martin.sumner at
Wed Aug 8 12:37:14 EDT 2018


Some partial answers to your questions.

I don't believe force-replace itself will sync anything up - it just
reassigns ownership (hence handoff happens very quickly).

Read repair would synchronise a portion of the data.  So if 10% of your data
is read regularly, this might explain some of what you see.
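As a toy illustration of why only the regularly-read portion of the data gets
fixed this way (made-up data model; real Riak resolves concurrent writes with
vector clocks, not integer versions):

```python
# Toy read repair: a GET consults every replica, picks the copy with the
# highest version, and writes it back to any replica that is stale or
# missing it.  Keys that are never read are never repaired.
replicas = [
    {"k1": (2, "new")},   # healthy replica
    {"k1": (2, "new")},   # healthy replica
    {},                   # recovered node; data not yet restored
]

def get_with_read_repair(key):
    copies = [r[key] for r in replicas if key in r]
    if not copies:
        return None
    winner = max(copies)               # highest (version, value) wins
    for r in replicas:                 # read repair: fix stale copies
        if r.get(key) != winner:
            r[key] = winner
    return winner

get_with_read_repair("k1")   # the recovered replica now holds k1 again
```

A key that is never read never takes this path, which is why read repair alone
only recovers the working set.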

AAE should also repair your data.  But if nothing has happened for 4 days,
then that doesn't seem to be the case.  It would be worth checking the
aae-status output (e.g. via riak-admin aae-status) to confirm exchanges are
happening.

I don't know if there are any minimum levels of data before bitcask will
perform compaction.  There's nothing obvious in the code to suggest a trigger
that wouldn't fire well before 90%.  I don't know if it will merge the active
file (the one currently being written to), but that file is capped at 2GB
(configured through bitcask.max_file_size).
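For reference, that cap is set in the newer riak.conf style as below (the 2GB
value is the one discussed here; the snippet is illustrative):

```
## riak.conf - bitcask backend settings (illustrative snippet)
storage_backend = bitcask
bitcask.max_file_size = 2GB
```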

When you say the size of the bitcask directory - is this the size summed
across all vnodes on the node?  If each vnode has a single active file of
<2GB, and there are multiple vnodes, the total could be unexpectedly large -
assuming bitcask does indeed not merge the file that is active for writing.
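To put rough numbers on that concern, a back-of-envelope sketch (the ring size
and node count below are made-up assumptions; only the 2GB active-file cap
comes from the discussion above):

```python
# Worst-case space held in never-merged bitcask active files per node.
# RING_SIZE and NODES are illustrative assumptions, not measured values.
RING_SIZE = 64          # assumed ring_creation_size
NODES = 5               # assumed number of nodes in the cluster
MAX_FILE_SIZE_GB = 2    # bitcask.max_file_size, as discussed above

# Vnodes are spread evenly, so a node hosts at most ceil(RING_SIZE / NODES).
vnodes_per_node = -(-RING_SIZE // NODES)   # ceiling division
worst_case_gb = vnodes_per_node * MAX_FILE_SIZE_GB

print(f"up to {vnodes_per_node} vnodes/node -> up to "
      f"{worst_case_gb} GB of un-merged active files per node")
```

So even if every data file except the active one were fully compacted, tens of
GB per node could still be sitting in active files under these assumptions.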

In terms of distribution around the cluster, if you have an n_val of 3 you
should normally expect to see a relatively even distribution of the data on
failure (certainly not all of it going to one node).  Worst case scenario is
that 3 nodes get all the load from that one failed node.

When a vnode is inaccessible, 3 (assuming n_val=3) fallback vnodes are
selected to handle the load for that one vnode (as that vnode would normally
appear in 3 preflists, and commonly a different node will be asked to start a
fallback vnode for each preflist).
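A toy sketch of that fallback selection (a simplification of riak_core's
actual preference-list logic, with a made-up 8-partition ring and node names
A-D; the real algorithm lives in riak_core_apl):

```python
from collections import Counter

N_VAL = 3
RING_SIZE = 8
# Toy ring: 8 partitions striped across four made-up nodes A-D.
owner = {i: "ABCD"[i % 4] for i in range(RING_SIZE)}

def preflist(first, down=frozenset()):
    """n_val consecutive partitions starting at `first`; a down primary's
    slot is filled by the next healthy node round the ring that is not
    already serving this preflist."""
    parts = [(first + k) % RING_SIZE for k in range(N_VAL)]
    used = {owner[p] for p in parts if owner[p] not in down}
    pl = []
    for p in parts:
        node = owner[p]
        if node in down:
            i = p + 1
            while owner[i % RING_SIZE] in down or owner[i % RING_SIZE] in used:
                i += 1
            node = owner[i % RING_SIZE]
            used.add(node)
        pl.append((p, node))
    return pl

# Partition 0 (owned by A) sits in the 3 preflists starting at 6, 7 and 0.
# With A down, each preflist picks its own fallback for partition 0:
fallbacks = Counter(dict(preflist(s, down={"A"}))[0] for s in (6, 7, 0))
print(fallbacks)
```

In this toy ring the three preflists pick three different fallback nodes (B,
C and D), so the failed vnode's load is spread rather than landing on one
node - which is the behaviour described above.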

I will try to dig into the bitcask merge/compaction code later, to see if I
spot anything else.


More information about the riak-users mailing list