Riak 0.14 nodes crashing under light load

Anthony Molinaro anthonym at alumni.caltech.edu
Tue Feb 1 19:08:11 EST 2011


  I just set up a 4-node cluster with a mostly vanilla config, except
that I specified 1024 partitions and I'm using a multi-backend with one
entry for bitcask, which is also the default (I plan to deploy a cache
backend at some point).  I have one bucket which stores a pretty small
payload (key is 36 bytes, value is 36 bytes).

Things ran fine under light load (~400 gets, ~30 puts according to
riak-admin status, so I think that's per minute).  Suddenly several nodes
(3 of 4) all shut down within a few minutes of each other.  They all seem
to have errors like this:

=ERROR REPORT==== 1-Feb-2011::23:50:35 ===
Failed to open lock file /var/lib/riak/bitcask/1156070631091827503657211635254091060470024765440/bitcask.write.lock: emfile

followed by a state machine termination stack trace.
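
For context, emfile is how Erlang reports the POSIX EMFILE errno, meaning
the process has hit its limit on open file descriptors.  The sketch below
is a rough back-of-the-envelope check, not a diagnosis: it assumes the 1024
partitions are spread evenly over the 4 nodes, and the figure of 3 open
files per partition is a purely hypothetical placeholder, not a measured
bitcask number.  The function names are made up for illustration.

```python
import math
import resource  # Unix-only standard library module


def vnodes_per_node(ring_size: int, node_count: int) -> int:
    """Partitions one node owns, assuming an even split (rounded up)."""
    return math.ceil(ring_size / node_count)


def estimated_fds(ring_size: int, node_count: int, fds_per_vnode: int) -> int:
    """Rough descriptor count for one node; fds_per_vnode is an assumption."""
    return vnodes_per_node(ring_size, node_count) * fds_per_vnode


def open_file_limit() -> int:
    """Soft RLIMIT_NOFILE of the *current* process (not of the Riak node)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    # 1024 partitions and 4 nodes come from the setup described above;
    # 3 descriptors per partition is a hypothetical value for illustration.
    needed = estimated_fds(1024, 4, fds_per_vnode=3)
    limit = open_file_limit()
    if limit == resource.RLIM_INFINITY:
        print(f"estimated {needed} descriptors, no soft limit set")
    else:
        print(f"estimated {needed} descriptors, soft limit {limit}")
```

The limit printed here is the one in effect for the shell that runs the
script.  The value that matters is the one the Riak process itself was
started with, which may differ.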

I had run cluster_info for all of them a few minutes before and the only
machine which didn't crash was the one I ran cluster_info on.

Not sure if that was the cause or not.  Any ideas what could cause these?

I can send more info if it would help, but wanted to get the conversation
started before I head home.



