LevelDB compaction and timeouts
ixmatus at gmail.com
Mon Jan 7 20:08:28 EST 2013
I've had a few situations arise where one or two nodes (all it takes is
one node) begin a heavy compaction cycle (determined by using gstat and
looking at the LevelDB LOG files), and ALL queries put through the cluster
return a timeout, no matter which node receives them.
I can fix this situation by killing the node(s) in question and then marking
them as down, but the minute I bring them back up they take everything down
again until they complete.
Is there a process by which I can keep queries from being issued to certain
nodes while leaving them live so they can finish their compaction? It is
highly inconvenient to have to pick some time at night to bring them back
up so that my users experience as few timeout errors as possible...