Re: Slow inserts

Jens Rantil jens.rantil at telavox.se
Tue Apr 12 07:03:35 EDT 2011


Kresten,

Thank you for your response. It is good to have a reference point.

As for the Python client: I first rewrote the benchmark in Java. That version still seems to be using an HTTP client, and I see similar write performance (260-330 writes/second). Later, I read in the Python client documentation that Protocol Buffers is recommended for production systems. Modifying the Python client to use Protocol Buffers did indeed yield a significant performance boost, now up to ~2900 writes/second. This looks very much like what I expected, and it does show that HTTP was the overhead.

All benchmarks mentioned above consisted of 11 threads writing a total of 53000 key-values, using round robin over the nodes.
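
For anyone wanting to reproduce this setup, here is a minimal sketch of the benchmark shape described above (11 threads, round robin over the nodes). It is not the pastebin script. The function name `run_benchmark` and the `put(node, key, value)` callable are hypothetical placeholders; you would plug in whatever client call (HTTP or Protocol Buffers) you actually want to measure.

```python
import itertools
import threading
import time


def run_benchmark(nodes, items, put, num_threads=11):
    """Write `items` using `num_threads` workers, each cycling over `nodes`.

    `put(node, key, value)` is a caller-supplied function (hypothetical here)
    that performs one write against one node.
    Returns (writes_per_second, writes_per_node).
    """
    # Give every worker an interleaved slice of the key-values.
    chunks = [items[i::num_threads] for i in range(num_threads)]
    counts = {node: 0 for node in nodes}
    lock = threading.Lock()

    def worker(chunk, offset):
        # Start each worker at a different node so the load is spread evenly.
        round_robin = itertools.islice(
            itertools.cycle(nodes), offset % len(nodes), None
        )
        for (key, value), node in zip(chunk, round_robin):
            put(node, key, value)
            with lock:
                counts[node] += 1

    threads = [
        threading.Thread(target=worker, args=(chunk, i))
        for i, chunk in enumerate(chunks)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return len(items) / elapsed, counts
```

Timing the whole run and dividing by the number of writes gives the same kind of writes/second figure quoted in this thread.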

A follow-up question: I am considering using Java in production, and currently my Java benchmark (see above) is very slow. You mention that the Java client is able to stream requests. Does this mean making multiple requests over the same HTTP connection? I have set maxConnections to 50 for every node in the Java benchmark without any significant performance boost. Is there anything else I should set? I stumbled across the Java Protocol Buffers client, and I guess that is the better alternative.
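
To make clear what I mean by "multiple requests over the same HTTP connection", here is a minimal Python sketch using only the standard library. It only illustrates the idea and is not how the Java client is implemented. The helper name `put_all`, the bucket name `bench`, and the `/riak/<bucket>/<key>` path are assumptions made for the example.

```python
import http.client


def put_all(host, port, items, bucket="bench"):
    """PUT every (key, value) pair over ONE persistent HTTP/1.1 connection.

    Returns the list of HTTP status codes, one per request.
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    statuses = []
    try:
        for key, value in items:
            # Assumed URL layout: /riak/<bucket>/<key>
            conn.request(
                "PUT",
                f"/riak/{bucket}/{key}",
                body=value,
                headers={"Content-Type": "text/plain"},
            )
            response = conn.getresponse()
            response.read()  # drain the body so the socket can be reused
            statuses.append(response.status)
    finally:
        conn.close()
    return statuses
```

Each worker thread would hold its own connection like this instead of opening a new one per write, which saves a TCP handshake on every request.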

Thanks,
Jens

From: Kresten Krab Thorup [mailto:krab at trifork.com]
Sent: 11 April 2011 23:34
To: Jens Rantil
Cc: riak-users at lists.basho.com
Subject: Re: Slow inserts

Jens,

Just for reference: in our dev lab we quite consistently get ~3000 puts/second with N=3, W=2 on a 3-node Mac mini cluster running 0.14.1 with bitcask as the backend. Those are small machines with 2.6 GHz dual-core Core 2 CPUs and 8 GB of RAM.

We do use lots of threads in the client, and a load balancer (or you can just have the threads connect to the different Riak nodes). If all requests funnel through one machine, it may become overloaded. We are also using the Java client, which is able to stream requests without having to open new connections all the time; I don't know how the Python client behaves in this regard. We also have a proper managed gigabit switch between the Mac minis, but I think it is unlikely that networking is the limiting factor: 3000 puts/second at 1 kilobyte each is only about 3 megabytes per second (roughly 24 megabits per second). By my observations, the chatter among the clustered machines is roughly N-fold the chatter from the client to the cluster.
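
As a back-of-the-envelope check of the bandwidth estimate, using the figures in the paragraph above (3000 puts/second, about 1 kilobyte per put, N=3): payload traffic from the client comes to about 3 megabytes per second, which is roughly 24 megabits per second. The sketch below counts payload only and ignores protocol overhead; the variable names are just for illustration.

```python
puts_per_second = 3000
value_bytes = 1000          # "1 kilobyte" per put, as assumed in the message
n_val = 3                   # replication factor N
link_mbit_per_s = 1000      # gigabit switch

# Client-to-cluster payload traffic.
client_bytes_per_s = puts_per_second * value_bytes
client_mbit_per_s = client_bytes_per_s * 8 / 1e6

# Inter-node traffic, taken as roughly N-fold the client traffic.
cluster_mbit_per_s = client_mbit_per_s * n_val

# Share of a single gigabit link consumed by the inter-node traffic.
link_share = cluster_mbit_per_s / link_mbit_per_s

print(f"client -> cluster: {client_mbit_per_s:.0f} Mbit/s")
print(f"between nodes:     {cluster_mbit_per_s:.0f} Mbit/s")
print(f"share of gigabit:  {link_share:.1%}")
```

Even with the N-fold chatter between nodes, this stays well below what a gigabit link can carry, which supports the view that the network is unlikely to be the bottleneck.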

Have you tried to run some simple CPU/IO monitor on the 3 machines you're using to see if they are CPU or I/O bound?  ... If they are not really loaded, you may need to add clients/threads.  You should not be satisfied until the server machines are saturated.

Kresten


On Apr 11, 2011, at 18:00, Jens Rantil wrote:


Hi,

I have set up a 5-node test environment to give Riak a test run. I wrote a Python script (http://pastebin.com/geQ00Ngb) that put 53780 key-values into my cluster. Using round robin, 11 threads inserted these values in ~70 seconds. This means an average of ~750 key-values/second. Is this the expected speed of inserts? It seems quite slow. The Mozilla benchmark (http://blog.mozilla.com/data/2010/08/16/benchmarking-riak-for-the-mozilla-test-pilot-project/) reached > 1500 ops/second for significantly bigger values than my benchmark.

Additional information:
* Four of the nodes were not doing anything else.
* n_val=3
* For every insert, w=1 and dw=0.
* All nodes are using riak_0.14.1-1_amd64.deb and I have not changed any of the default settings except the Erlang -cookie and -name.

Thanks,
Jens Rantil