Riak Overload Behavior

Sean O'Connor sean at focuslab.io
Mon May 14 19:56:44 EDT 2012


Interesting. I am running Riak 1.1.2 and making the requests with version 1.4.0 of the Python client, using the HTTP transport class. I'll see if I can write a simple script that reproduces the problem and post it to the list later tonight or tomorrow.
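
To make the failure mode described in the quoted reply below concrete, here is a toy model of it in Python. It is only a sketch under stated assumptions: the function name `parse_keepalive_stream` and its `strict` flag are hypothetical, it is not Riak's actual HTTP parser, and it ignores request bodies. It shows how a stray CRLF left on a keep-alive connection can be read as a malformed request and answered with a 400.

```python
# Toy model only: this is NOT Riak's real HTTP parser. It ignores request
# bodies and only checks the shape of the request line.
def parse_keepalive_stream(data: bytes, strict: bool = True) -> list:
    """Return one status code per request seen on a keep-alive byte stream."""
    statuses = []
    while data:
        if not strict:
            # A lenient parser skips stray blank lines before a request line.
            data = data.lstrip(b"\r\n")
            if not data:
                break
        # Headers end at the first blank line.
        head, sep, rest = data.partition(b"\r\n\r\n")
        request_line = head.split(b"\r\n", 1)[0]
        parts = request_line.split(b" ")
        if sep and len(parts) == 3 and parts[2].startswith(b"HTTP/"):
            statuses.append(200)
            data = rest
        else:
            # Stray whitespace is treated as a (non-existent) bad request.
            statuses.append(400)
            break  # stop parsing after the first malformed request
    return statuses


request = b"GET /ping HTTP/1.1\r\nHost: localhost\r\n\r\n"
stray = b"\r\n"  # extra CRLF emitted by the client after its request

strict_result = parse_keepalive_stream(request + stray + request, strict=True)
lenient_result = parse_keepalive_stream(request + stray + request, strict=False)
pending_result = parse_keepalive_stream(request + stray, strict=True)
```

In this model the strict parser answers the stray CRLF with a 400 even when no second request has been sent yet (`pending_result`), which matches the description of the error already being on the wire when the next request is made. The lenient parser skips the blank line and serves both requests.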

-- 
Sean O'Connor
Co-Founder/CTO

FocusLab (http://www.focuslab.io)
(845)669-0883


On Monday, May 14, 2012 at 7:48 PM, Sean Cribbs wrote:

> Hi Sean,
> 
> Are you running your test suite against 1.1.x? We have seen 400 errors in the past where the client, on a keep-alive connection, emitted additional CRLFs or other whitespace beyond what the request required. The next time a request was made, the 400 error was already on the wire, because the HTTP parser had failed to recognize a (non-existent) request in the additional whitespace. This error should already be fixed in 1.1.x.
> 
> On Mon, May 14, 2012 at 5:45 PM, Sean O'Connor <sean at focuslab.io> wrote:
> > Hello, 
> > 
> > I've been seeing some strange behavior from Riak and any help or feedback would be very welcome.
> > 
> > In particular, we've been seeing pseudo-random 400 errors from Riak when we run our test suite. Our test suite hits Riak pretty hard to test various queries and situations in our app, and the errors seem to happen a few seconds after the CPUs on the testing machine get saturated. I say pseudo-random because the errors tend to happen in a particularly Riak-heavy portion of our test suite, but the specific call to Riak that fails is often different (reads, writes, map/reduce jobs, 2i queries). The really baffling thing about these errors is that there doesn't seem to be any information attached other than the 400 message, and nothing shows up in any of the Riak logs.
> > 
> > If we introduce a 2-second delay between tests in our test suite, the problem goes away, but that obviously slows down the test suite quite a bit. Aside from that, I am concerned about this potentially happening in our production cluster.
> > 
> > Is this more or less the expected behavior of Riak when it gets overwhelmed, or does it sound like there is something that should be investigated further here? I suspect that I can create a script to reproduce the problem, but if this is somewhat expected then I won't spend the time writing it.
> > 
> > Thanks! 
> > 
> > -- 
> > Sean O'Connor
> > Co-Founder/CTO
> > 
> > FocusLab (http://www.focuslab.io)
> > (845)669-0883
> > 
> > 
> 
> 
> 
> -- 
> Sean Cribbs <sean at basho.com>
> Software Engineer
> Basho Technologies, Inc.
> http://basho.com/
> 
