bug after ~10K get/store requests

Nico Meyer nico.meyer at adition.com
Wed Feb 2 04:41:43 EST 2011


Hi,

The problem was not primarily that both processes ran on the same host,
but that the per-process limits were too low for your load.
You would observe the same behaviour with Apache instead of Riak (the
Apache startup scripts in most distros may well increase the filehandle
limit, but let's just assume that is not the case :-) ).
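A minimal Python sketch of what raising the per-process limit means. It assumes a Linux or other Unix host and a harness written in Python rather than the PHP used in this thread. `raise_nofile_soft_limit` is a hypothetical helper. It uses the standard `resource` module to raise the soft `RLIMIT_NOFILE` (the value `ulimit -n` reports) as far as the hard limit allows. Limits are per process and inherited by child processes, so this only affects the harness itself. Riak's own limit has to be raised in the environment that starts Riak.

```python
import resource


def raise_nofile_soft_limit(target: int) -> tuple[int, int]:
    """Raise this process's soft RLIMIT_NOFILE toward `target`.

    Hypothetical helper: never lowers the soft limit, caps the request at
    the hard limit, and returns the (soft, hard) pair in effect afterwards.
    Processes spawned later inherit the new soft limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unprivileged process may raise its soft limit only up to the hard limit.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # An unlimited soft limit cannot be raised any further.
    if soft != resource.RLIM_INFINITY and target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

For example, calling `raise_nofile_soft_limit(4096)` at the start of the harness asks for 4096 handles. If the hard limit is lower, the call quietly settles for the hard limit instead.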

Of course, if you want realistic results from your load test, you
should in any case run the clients on hosts other than the one Riak is
running on.

Cheers,
Nico 

On Tuesday, 1 February 2011, at 17:05 -0800, Paco NATHAN wrote:
> Thank you, Nico.
> Much appreciated. I'm embarrassed that we ran a load test
> process on the same host as the data store and didn't realize we were
> spamming ourselves :)
> 
> 
> On Fri, Jan 28, 2011 at 04:44, Nico Meyer <nico.meyer at adition.com> wrote:
> > Have you tried raising the limit for open filehandles (ulimit -n)?
> > Each TCP connection also uses one filehandle, and by default they will
> > linger around for 60 seconds after the connection is closed.
> >
> > The PHP client uses the REST interface afaik, so I assume it will open a
> > new TCP connection for each request. What does /proc/sys/fs/file-nr show
> > while you are running your test? What does the first column show when the
> > requests start to fail? I would guess about 2048 if you run Riak and your
> > test script on the same machine, or about 1024 if they are on different
> > machines.
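One way to follow that suggestion is to sample /proc/sys/fs/file-nr while the test runs. This is a minimal Python sketch, assuming Linux. `FileNr`, `read_file_nr` and `watch_file_nr` are hypothetical names. The three columns are:

- the number of allocated file handles
- the number of allocated but unused handles
- the system-wide maximum

These counters are system-wide, unlike the per-process `ulimit -n`. That is consistent with the guess of roughly 2048 when two processes, each capped at 1024, share one machine.

```python
import time
from pathlib import Path
from typing import NamedTuple

FILE_NR = Path("/proc/sys/fs/file-nr")  # Linux-only, system-wide counters


class FileNr(NamedTuple):
    allocated: int  # 1st column: file handles currently allocated
    unused: int     # 2nd column: allocated but unused (reported as 0 on Linux 2.6+)
    maximum: int    # 3rd column: system-wide maximum (same as file-max)


def parse_file_nr(text: str) -> FileNr:
    """Parse the three whitespace-separated numbers in file-nr."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return FileNr(allocated, unused, maximum)


def read_file_nr(path: Path = FILE_NR) -> FileNr:
    return parse_file_nr(path.read_text())


def watch_file_nr(samples: int = 30, interval_s: float = 1.0) -> None:
    """Print the allocated-handle count once per interval while a test runs."""
    for _ in range(samples):
        nr = read_file_nr()
        print(f"allocated={nr.allocated} max={nr.maximum}", flush=True)
        time.sleep(interval_s)
```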
> >
> > You could also try to lower /proc/sys/net/ipv4/tcp_fin_timeout to, say,
> > 10, which should be more than enough on a LAN. This means closed TCP
> > connections only linger for 10 seconds, so unless you do more than about
> > 10000 requests/s the default filehandle limit should be enough.
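The estimate above rests on a steady-state relation, Little's law. Suppose every request opens a new TCP connection, and each closed connection keeps its filehandle for the linger time. Then handles in use ≈ request rate × linger time. The following is a small Python sketch of that arithmetic under those stated assumptions. The function names are hypothetical, and the model ignores handles held by still-open connections and by other files.

```python
def handles_in_use(rate_per_s: float, linger_s: float) -> float:
    """Steady-state handles held by closed-but-lingering connections.

    Little's law: items in the system = arrival rate * time each item stays.
    Assumes one new TCP connection (one filehandle) per request.
    """
    return rate_per_s * linger_s


def max_sustained_rate(fd_limit: int, linger_s: float) -> float:
    """Highest request rate before lingering connections alone reach fd_limit."""
    if linger_s <= 0:
        raise ValueError("linger_s must be positive")
    return fd_limit / linger_s


for linger in (60, 10):
    rate = max_sustained_rate(1024, linger)
    print(f"linger={linger:>2}s -> about {rate:.0f} requests/s with 1024 handles")
```

Under this model, a 1024-handle limit supports about 17 requests/s at a 60 s linger and about 102 requests/s at 10 s. Sustaining 10000 requests/s with a 10 s linger would need on the order of 100000 handles. Which ceiling applies (the per-process `ulimit -n` or the system-wide maximum) depends on the setup.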
> >
> > Of course on a production system you should read up on these settings
> > and set them according to your load and usage patterns.
> >
> > Cheers,
> > Nico
> >
> > On Thursday, 27 January 2011, at 13:57 -0800, Paco NATHAN wrote:
> >> We're seeing a repeatable error while running some tests with Riak
> >> 0.14.0-1 on Ubuntu, with PHP 5.3.3-1ubuntu9.3.
> >>
> >> We have a load test, using the PHP client, which stores then gets 1MM
> >> objects. This is being run against both Riak and MySQL as part of a
> >> performance evaluation suite.  The MySQL side runs fine, so the test
> >> script seems fine.
> >>
> >> On the Riak side, we generally get an exception from
> >> RiakObject::populate(), which throws 'Could not contact Riak Server..'
> >> after between 10K and 20K store/get pairs, each with different
> >> keys.  It doesn't always happen.
> >>
> >> There's nothing else running on this system, and we see it both on our
> >> VMware instances and on an EC2 m1.xlarge.
> >>
> >> I get the sense that our Riak client is causing a temporary DoS on our
> >> Riak server...
> >> Has anyone else seen this?
> >>
> >> Maybe adding some retry logic to the client would resolve it?
> >>
> >> Thanks,
> >>
> >> Paco
> >> IMVU
> >>
> >> _______________________________________________
> >> riak-users mailing list
> >> riak-users at lists.basho.com
> >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com