Riak Overload Behavior

Sean O'Connor sean at focuslab.io
Mon May 14 22:27:05 EDT 2012


I've spent a while trying to write a script to replicate this problem but so far I've been unsuccessful.  I'll try to keep an eye out to see if I can spot patterns for when this happens but otherwise I'll probably need to drop this problem for a little while.  

If anybody has ideas on additional things I can check to isolate the cause of these 400 errors, I'd be happy to give them a try.  In case it sparks ideas about possible causes, here is a bit more information about our test suite and environment:
* Heavy use of the LevelDB storage backend and secondary index (2i) queries.
* Both JavaScript and Erlang map/reduce functions are used.
* The error has been observed with Riak 1.1.2 on both OS X 10.7.3 (installed via Homebrew) and Ubuntu 11.04 (installed via the package-based Chef cookbook).
* The OS X system runs a single Riak instance; the Linux system is a 3-node cluster running on 3 EC2 instances.
* The app.config being used can be found at https://gist.github.com/83b59b51b94b38e451b0
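
To make the keep-alive explanation quoted further down in this thread concrete, here is a toy Python model of it. This is only a sketch under stated assumptions: read_line and statuses_for are hypothetical helper names, the parser is a deliberately tiny stand-in and not Riak's actual HTTP code, and the /riak/b/k requests are just sample bytes. It shows how one stray CRLF after a request body can be read as an empty request line, so that a 400 is produced before the next real request is even parsed.

```python
# Toy model (NOT Riak's real parser): how a stray CRLF on a keep-alive
# connection can turn into a 400 for a request that was never sent.

def read_line(buf: bytes, pos: int):
    """Return (line, new_pos) for the CRLF-terminated line starting at pos."""
    end = buf.find(b"\r\n", pos)
    if end == -1:
        return None, len(buf)
    return buf[pos:end], end + 2


def statuses_for(stream: bytes, skip_blank_lines: bool):
    """Parse back-to-back requests from one connection; return status codes.

    skip_blank_lines=False models a strict parser that rejects an empty
    request line; True models a lenient parser that ignores empty lines
    where a request line is expected.
    """
    statuses = []
    pos = 0
    while pos < len(stream):
        line, pos = read_line(stream, pos)
        if line is None:
            break
        if line == b"":
            if skip_blank_lines:
                continue
            statuses.append(400)  # empty "request" seen in the whitespace
            break                 # assumption: parser gives up after a 400
        parts = line.split(b" ")
        if len(parts) != 3 or not parts[2].startswith(b"HTTP/"):
            statuses.append(400)
            break
        # Read headers up to the blank line, remembering Content-Length.
        length = 0
        while True:
            header, pos = read_line(stream, pos)
            if header is None or header == b"":
                break
            name, _, value = header.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        pos += length  # skip the request body
        statuses.append(200)
    return statuses


# Sample bytes only; bucket and key names are made up for illustration.
PUT = b"PUT /riak/b/k HTTP/1.1\r\nContent-Length: 2\r\n\r\nhi"
GET = b"GET /riak/b/k HTTP/1.1\r\n\r\n"

if __name__ == "__main__":
    print(statuses_for(PUT + GET, skip_blank_lines=False))            # [200, 200]
    print(statuses_for(PUT + b"\r\n" + GET, skip_blank_lines=False))  # [200, 400]
    print(statuses_for(PUT + b"\r\n" + GET, skip_blank_lines=True))   # [200, 200]
```

With the extra CRLF, the strict variant reports a 400 in the slot where the next request would go, while the lenient variant (which skips blank lines where a request line is expected, as RFC 2616 section 4.1 recommends) does not.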

Thanks again for the quick response!


-- 
Sean O'Connor
Co-Founder/CTO

FocusLab (http://www.focuslab.io)
(845)669-0883


On Monday, May 14, 2012 at 7:56 PM, Sean O'Connor wrote:

> Interesting.  I am running Riak 1.1.2 and I am making the requests via version 1.4.0 of the python client using the HTTP transport class. I'll see if I can create a simple script that replicates the problem and post it to the list sometime later tonight or tomorrow.
> 
> -- 
> Sean O'Connor
> Co-Founder/CTO
> 
> FocusLab (http://www.focuslab.io)
> (845)669-0883
> 
> 
> On Monday, May 14, 2012 at 7:48 PM, Sean Cribbs wrote:
> 
> > Hi Sean,
> > 
> > Are you running your test suite against 1.1.x? We have seen 400 errors in the past where the client in a keep-alive connection emitted additional CRLFs or other whitespace beyond the ones required for the request. The next time a request was made, the 400 error was already on the wire because the HTTP parser failed to recognize the (non-existent) request in the additional whitespace. This error should be fixed already in 1.1.x. 
> > 
> > On Mon, May 14, 2012 at 5:45 PM, Sean O'Connor <sean at focuslab.io> wrote:
> > > Hello, 
> > > 
> > > I've been seeing some strange behavior from Riak and any help or feedback would be very welcome.
> > > 
> > > In particular, we've been seeing pseudo-random 400 errors from Riak when we run our test suite.  Our test suite hits Riak pretty hard to test various queries and situations in our app, and the errors seem to happen a few seconds after the CPUs on the testing machine get saturated. I say pseudo-random because the errors tend to happen in a particularly Riak-heavy portion of our test suite, but the specific call to Riak that errors is often different (reads, writes, map/reduce jobs, 2i queries).  The really baffling thing about these errors is that there doesn't seem to be any information attached other than the 400 message, and nothing shows up in any of the Riak logs. 
> > > 
> > > If we introduce a 2 second delay between tests in our test suite, the problem goes away but obviously slows down our test suite quite a bit.  Aside from that, I am concerned about this potentially happening in our production cluster. 
> > > 
> > > Is this more or less the expected behavior of Riak when it gets overwhelmed or does it sound like there is something that should be investigated further here?  I suspect that I can create a script to replicate the problem but if this is somewhat expected then I won't waste the time writing it. 
> > > 
> > > Thanks! 
> > > 
> > > -- 
> > > Sean O'Connor
> > > Co-Founder/CTO
> > > 
> > > FocusLab (http://www.focuslab.io)
> > > (845)669-0883
> > > 
> > > 
> > > _______________________________________________
> > > riak-users mailing list
> > > riak-users at lists.basho.com
> > > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> > > 
> > 
> > 
> > 
> > -- 
> > Sean Cribbs <sean at basho.com>
> > Software Engineer
> > Basho Technologies, Inc.
> > http://basho.com/
> > 
> 
