Riak Overload Behavior
sean at focuslab.io
Mon May 14 17:45:32 EDT 2012
I've been seeing some strange behavior from Riak and any help or feedback would be very welcome.
In particular, we've been seeing pseudo-random 400 errors from Riak when we run our test suite. Our test suite hits Riak pretty hard to test various queries and situations in our app, and the errors seem to happen a few seconds after the CPUs on the testing machine get saturated. I say pseudo-random because the errors tend to happen in a particularly Riak-heavy portion of our test suite, but the specific call to Riak that fails is often different (reads, writes, map/reduce jobs, 2i queries). The really baffling thing about these errors is that there doesn't seem to be any information attached other than the 400 status, and nothing shows up in any of the Riak logs.
If we introduce a 2-second delay between tests in our test suite, the problem goes away, but that obviously slows down our test suite quite a bit. Aside from that, I am concerned about this potentially happening in our production cluster.
Is this more or less the expected behavior of Riak when it gets overwhelmed or does it sound like there is something that should be investigated further here? I suspect that I can create a script to replicate the problem but if this is somewhat expected then I won't waste the time writing it.
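For reference, a replication script could look roughly like the sketch below. It is only an outline under assumptions, not something that has been run against the cluster: it assumes Riak's HTTP interface is listening on 127.0.0.1:8098, and the bucket name "test", the key names, and the helper names fetch_status and hammer are made up for illustration. It fires many concurrent GETs and counts the responses by status code, so unexpected 400s would show up in the tally.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for a GET on url (including 4xx/5xx)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is what we want to count.
        return err.code
    except OSError:
        # Connection refused, timeouts, and similar transport failures.
        return "connection-error"


def hammer(urls, workers=32, fetch=fetch_status):
    """Issue one request per URL concurrently and tally the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return Counter(pool.map(fetch, urls))


# Hypothetical usage (host, port, bucket, and keys are assumptions):
#   base = "http://127.0.0.1:8098/buckets/test/keys/"
#   print(dict(hammer([base + "key%d" % i for i in range(5000)])))
```

Raising the workers value should push the node harder. The fetch parameter is there so the counting logic can be checked with a stub before pointing the script at a real node.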