Node Recovery Questions
sean.mcevoy at gmail.com
Wed Aug 8 06:23:56 EDT 2018
A few questions about the procedure here for recovering a failed node:
We lost a production Riak server when AWS decided to delete a node, and we
plan to follow this procedure to replace it with a newly built node. A
practice run in our QA environment has raised some questions.
- How can I tell when everything has synced up? I thought I could just
monitor the handoffs, but these completed within 5 minutes of committing the
cluster changes, while the data directories continued to grow rapidly in size
for at least an hour. I assume this was data being synced to the new node,
but how can I tell when it has completed from the user level? Or is it left
up to AAE to sync the data?
- The size of the bitcask directory on the 4 original nodes is ~10GB. On
the new node this directory climbed to 1GB within an hour but hasn't grown
much in the 4 days since. I know bitcask entries persist until the periodic
compaction, but can it be right that each original node is holding on to
roughly 9GB of dead data, i.e. 90% of the disk space it's using?
- Not directly related to the recovery procedure, but while one node of a
five-node cluster is down, how is the extra load distributed within the
cluster? It will still keep 3 copies of each entry, right? Are the copies
that would have been on the missing node all stored on the next node in the
ring, or distributed around the whole cluster?
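To make the third question concrete, here is a toy Python model of one way a Riak-style ring could cover for a missing node. It is a simplified sketch, not Riak's code: it assumes a 64-partition ring (Riak's default ring size), a hypothetical strict round-robin ownership of partitions by five hypothetical nodes n0..n4, and that each down primary is replaced by the next partition beyond the first N whose owner is up (a "fallback" copy).

```python
from collections import Counter

RING_SIZE = 64                          # assumption: default ring size
N_VAL = 3                               # three replicas per key
NODES = ["n0", "n1", "n2", "n3", "n4"]  # hypothetical node names

# Hypothetical strict round-robin ownership: partition i belongs to NODES[i % 5].
OWNERS = [NODES[i % len(NODES)] for i in range(RING_SIZE)]


def preflist(start, up_nodes, n=N_VAL, owners=OWNERS):
    """Return [(partition, node, role)] for a key that hashes to `start`."""
    ring = len(owners)
    # Walk the ring clockwise from the key's partition.
    successors = [((start + k) % ring, owners[(start + k) % ring])
                  for k in range(ring)]
    primaries, candidates = successors[:n], successors[n:]
    result, down = [], []
    for part, node in primaries:
        if node in up_nodes:
            result.append((part, node, "primary"))
        else:
            down.append(part)
    # Assumed fallback rule: each down primary is covered by the next
    # partition past the first N whose owner is up; that node stores a
    # fallback copy on behalf of the missing partition.
    for part in down:
        while candidates and candidates[0][1] not in up_nodes:
            candidates.pop(0)
        if candidates:
            _, fb_node = candidates.pop(0)
            result.append((part, fb_node, "fallback"))
    return result


# Simulate losing n4 and look at every possible key position on the ring.
up = set(NODES) - {"n4"}
lists = [preflist(p, up) for p in range(RING_SIZE)]
fallback_hosts = Counter(node for pl in lists
                         for _, node, role in pl if role == "fallback")
print(all(len(pl) == N_VAL for pl in lists))  # True
print(sorted(fallback_hosts.items()))  # [('n0', 12), ('n1', 12), ('n2', 12)]
```

In this toy layout every key still gets three copies, and the lost node's share is spread over several survivors rather than piled onto one neighbour, though not evenly (n3 takes none here). How a real cluster spreads it would depend on its actual partition claim.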
Thanks in advance,
More information about the riak-users