nodes with 100% HD usage

Mon Apr 13 07:16:15 EDT 2015

Result: failed writes, reduced AAE (active anti-entropy) availability, system errors, and probably other OS-level processes terminating.

100% disk usage is never good. However, our storage systems are append-only, which mitigates the risk of data corruption.

If the node becomes completely unavailable, the other nodes will also attempt to rebalance the data. With fewer nodes, each remaining node is responsible for more storage, which could potentially cause a cascading failure.
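
To put a rough number on the cascading risk, here is a back-of-the-envelope sketch. It assumes data is spread evenly across nodes and that the surviving nodes absorb the lost nodes' full share; real clusters will differ. The function name and parameters are made up for illustration and are not part of Riak.

```python
def projected_usage(used_fraction, nodes, failed):
    """Estimate per-node disk usage after `failed` nodes drop out.

    Assumption (not from Riak itself): data is evenly distributed and
    the total amount of stored data stays the same, so each survivor's
    share grows by a factor of nodes / (nodes - failed).
    """
    survivors = nodes - failed
    if survivors <= 0:
        raise ValueError("at least one node must survive")
    return used_fraction * nodes / survivors


# Hypothetical cluster: 5 nodes, each 70% full, one node lost.
after_one = projected_usage(0.70, nodes=5, failed=1)
print(f"after losing 1 of 5 nodes: {after_one:.1%} per node")

# Losing a second node pushes the estimate past 100%, i.e. the
# remaining nodes could not hold the data at all.
after_two = projected_usage(0.70, nodes=5, failed=2)
print(f"after losing 2 of 5 nodes: {after_two:.1%} per node")
```

Under these assumptions, a cluster that looks comfortable at 70% is one failure away from the danger zone and two failures away from being unable to hold its data.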

Moral of the story: monitor, and start sending SMS messages when disk use goes above 80%. This is a standard devops chore, applicable to any business-critical computer system.
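
As a minimal sketch of that kind of check, the snippet below uses Python's standard `shutil.disk_usage`. The 80% threshold comes from the advice above; the function names and the `notify` hook are placeholders of my own. In practice you would point `path` at the node's data directory and wire `notify` to whatever SMS or paging service you already use.

```python
import shutil

WARN_THRESHOLD = 0.80  # alert level suggested above


def disk_usage_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def check_disk(path, threshold=WARN_THRESHOLD, notify=print):
    """Call `notify` with a warning if usage exceeds `threshold`.

    `notify` is a hypothetical hook: it defaults to print, and would be
    replaced with a function that sends an SMS or page.
    Returns True when an alert was raised, False otherwise.
    """
    fraction = disk_usage_fraction(path)
    if fraction > threshold:
        notify(f"WARNING: disk holding {path} is {fraction:.0%} full")
        return True
    return False


if __name__ == "__main__":
    # "." stands in for the node's data directory (an assumption).
    check_disk(".")
```

Run it from cron or your monitoring agent every few minutes. Note that the used fraction computed here can differ slightly from what `df` reports, since `df` accounts for reserved blocks.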


> One theoretical question: what happens when a node (or more) hits 100% HD usage?
> Riak can easily scale horizontally by adding new nodes to the cluster, but what if one of them is full? Will the system have trouble? Will this node be used only for reading, with new items saved on the other nodes? Will the data rebalance onto newly added servers, freeing some space on the full node?
