Re: Slow inserts
russell.brown at me.com
Tue Apr 12 08:14:27 EDT 2011
Oops, and to the list this time.
On 12 Apr 2011, at 12:03, Jens Rantil wrote:
> Thank you for your response. It is good to have a reference. Thank you for that.
> As for the Python client, I first rewrote the benchmark in Java. It still seems to be using an HTTP client and I see similar write performance (260-330 writes/second). Later, I read in the Python client documentation that protocol buffers are recommended for production systems. Modifying the Python client to use protocol buffers indeed yielded a significant performance boost, now up to ~2900 writes/second. This looks very much like what I expected, and it does indeed show that HTTP was the overhead.
> All benchmarks mentioned above have consisted of 11 threads writing in total 53000 key-values using round robin over the nodes.
> A follow-up question: I am considering using Java for production and currently my Java benchmark (see above) is very slow. You mention using a Java stream interface? Do you mean this involves making multiple requests through the same http connection?
Both the HTTP client and the protocol buffers client reuse connections. The HTTP client holds a pool of connections, while the pb client creates one connection per thread and reuses it (unless it has been inactive for over a second). So if you can set up your client to have a number of threads busily pumping data, you can get some good throughput.
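The connection-per-thread pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `FakeConnection` is a hypothetical stand-in that counts writes, not a class from the Riak Java client, and the thread and write counts are arbitrary.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class PerThreadConnectionSketch {
    /** Hypothetical stand-in for a client connection; NOT a Riak class. */
    static final class FakeConnection {
        void store(String key, String value) {
            // A real client would write to the socket here.
        }
    }

    static final AtomicInteger connectionsOpened = new AtomicInteger();
    static final AtomicInteger totalWrites = new AtomicInteger();

    // Each worker thread lazily opens one connection and then keeps reusing it.
    static final ThreadLocal<FakeConnection> CONNECTION = ThreadLocal.withInitial(() -> {
        connectionsOpened.incrementAndGet();
        return new FakeConnection();
    });

    /** Runs `writes` stores on `threads` workers; returns {connections opened, writes done}. */
    static int[] run(int threads, int writes) {
        connectionsOpened.set(0);
        totalWrites.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < writes; i++) {
            final int n = i;
            pool.execute(() -> {
                CONNECTION.get().store("key" + n, "value" + n);
                totalWrites.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        }
        return new int[] { connectionsOpened.get(), totalWrites.get() };
    }

    public static void main(String[] args) {
        int[] result = run(3, 1400);
        System.out.println("connections opened: " + result[0] + ", writes: " + result[1]);
    }
}
```

The point of the sketch is that the number of connections tracks the number of worker threads rather than the number of writes, so keeping a fixed set of threads busy avoids reconnecting for every request.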
> I have set maxConnections to 50 for every node in the (Java) benchmark without any significant performance boost. Is there anything else I should set?
What version of the Java client were you using for HTTP? Version 0.14 (and earlier) only really allowed a maximum of 2 concurrent connections over HTTP. That is fixed in the new 0.14.1 release that went out yesterday.
> I stumbled across the java protocol buffers client, and I guess that's the better alternative.
Without a doubt you want to be using the protobuffers client.
Using the protobuffers client you have a couple of options about how to use it for best write performance. Either use
public ByteString store(RiakObject values, RequestMeta meta) from some threads (if you can batch your objects up) or use
public RiakObject store(RiakObject value, IRequestMeta meta) from some threads.
If you use IRequestMeta.returnBody(true) the former will be faster, as it reads the responses in whilst still writing out the requests.
If you're just pumping data in then don't set returnBody to true (or use public void store(RiakObject) ).
Setting the *same* client id across the threads (whilst conceptually iffy) also yields a performance increase, I've noticed, but to do that you will also need the latest release (0.14.1).
On my dual core MBP running the client and a single Riak node, the 1400 writes from the Riak Fast Track google.csv file take ~900 ms with the protobuffs client from 3 threads with the client id set, and ~1300 ms without the client id set.
HTTP takes ~2800 ms from 3 threads with the client id set and ~3200 ms without.
(Times include reading the file into RiakObjects)
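For comparison with the writes/second figures quoted elsewhere in this thread, the timings above can be converted to throughput. A minimal sketch, assuming all four figures are elapsed milliseconds for the same 1400 writes (the class and method names are made up for illustration):

```java
final class WriteThroughput {
    /** Converts a write count and an elapsed time in milliseconds to writes per second. */
    static double writesPerSecond(int writes, long elapsedMillis) {
        return writes * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        int writes = 1400; // rows in the Fast Track google.csv file, per the message above
        String[] labels = { "pb, client id set", "pb, no client id",
                            "HTTP, client id set", "HTTP, no client id" };
        long[] millis = { 900, 1300, 2800, 3200 }; // approximate timings from the message
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-20s %7.0f writes/s%n",
                    labels[i], writesPerSecond(writes, millis[i]));
        }
    }
}
```

Since the times include reading the file into RiakObjects, these rates are a lower bound on what the client itself achieved.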
> From: Kresten Krab Thorup [mailto:krab at trifork.com]
> Sent: 11 April 2011 23:34
> To: Jens Rantil
> Cc: riak-users at lists.basho.com
> Subject: Re: Slow inserts
> Just for reference ...in our dev lab ... we quite consistently get ~3000 puts/second with N=3, W=2 on a 3-node macmini cluster running 0.14.1 w/bitcask as the backend. That's small machines with 2.6GHz Dual Core 2, and 8GB ram.
> We do use lots of threads in the client, and a load balancer [or you can just have the threads connect to the different Riak nodes]. If all requests funnel through one machine it may become loaded, ... and we're using the Java client, which is able to stream requests without having to open new connections all the time .. dunno how the Python client is in this regard. We also have a proper gigabit managed switch between the Mac minis, but I think it is unlikely that networking is the limiting factor ... 3000 x 1 kilobyte per second corresponds to just 3 megabytes per second (roughly 24 megabits per second) ... the chatter among the clustered machines is roughly N-fold the chatter from the client to the cluster by my observations.
> Have you tried to run some simple CPU/IO monitor on the 3 machines you're using to see if they are CPU or I/O bound? ... If they are not really loaded, you may need to add clients/threads. You should not be satisfied until the server machines are saturated.
> On Apr 11, 2011, at 18:00 , Jens Rantil wrote:
> I have set up a 5-node test environment to give Riak a test run. I wrote a Python script (http://pastebin.com/geQ00Ngb) that put 53780 key-values into my cluster. Using round robin, 11 threads inserted these values in ~70 seconds. This means an average of ~750 key-values/second. Is this the expected speed of inserts? It seems quite slow. The Mozilla benchmark (http://blog.mozilla.com/data/2010/08/16/benchmarking-riak-for-the-mozilla-test-pilot-project/) reached > 1500 ops/second for significantly bigger values than my benchmark.
> Additional information:
> * Four of the nodes were not doing anything else.
> * n_val=3
> * For every insert, w=1 and dw=0.
> * All nodes are using riak_0.14.1-1_amd64.deb and I have not changed any of the default settings except Erlang -cookie and -name.
> Jens Rantil
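
The round-robin distribution over nodes used in the benchmark above can be sketched as follows. This is a minimal illustration, not the pastebin script or any Riak client code: the class name and node names are hypothetical, and an atomic counter is assumed so that several writer threads can share one selector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class RoundRobinNodes {
    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinNodes(List<String> nodes) {
        this.nodes = new ArrayList<>(nodes); // defensive copy
    }

    /** Returns the next node in rotation; safe to call from several threads. */
    String nextNode() {
        // floorMod keeps the index non-negative even if the counter overflows.
        int i = Math.floorMod(next.getAndIncrement(), nodes.size());
        return nodes.get(i);
    }

    public static void main(String[] args) {
        // Hypothetical host names standing in for a 5-node test cluster.
        RoundRobinNodes rr = new RoundRobinNodes(
                Arrays.asList("riak1", "riak2", "riak3", "riak4", "riak5"));
        for (int i = 0; i < 7; i++) {
            System.out.println("write " + i + " goes to " + rr.nextNode());
        }
    }
}
```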