Slow performance using linkwalk, help wanted
ksmith at basho.com
Tue Nov 9 08:58:00 EST 2010
On Nov 9, 2010, at 5:01 AM, Karsten Thygesen wrote:
> OK, we will use a larger ringsize next time and will consider a data reload.
> Regarding the metrics: the servers are dedicated to Riak and are not used for anything else. They are new HP servers with 8 cores each and 4x146GB 10K RPM SAS disks in a concatenated mirror setup. We use Solaris with ZFS as the filesystem, and I have turned off atime updates on the data partition.
> The pool is built as such:
>   pool: pool01
>  state: ONLINE
>  scrub: scrub completed after 0h0m with 0 errors on Tue Oct 26 21:25:05 2010
>         NAME          STATE     READ WRITE CKSUM
>         pool01        ONLINE       0     0     0
>           mirror-0    ONLINE       0     0     0
>             c0t0d0s7  ONLINE       0     0     0
>             c0t1d0s7  ONLINE       0     0     0
>           mirror-1    ONLINE       0     0     0
>             c0t2d0    ONLINE       0     0     0
>             c0t3d0    ONLINE       0     0     0
> errors: No known data errors
> so it is as fast as possible.
> However, we use the ZFS default block size, which is 128 KB. Is that optimal with Bitcask as the backend? It seems rather large, so what block size works best with Bitcask?
I don't have much experience tuning Solaris or ZFS for Riak. This is a question best asked of Ryan and I will make sure he sees this.
> The cluster is 4 servers with gigabit connections, located in the same datacenter on the same switch. The load balancer is a Zeus ZTM, which does quite a few HTTP optimizations, including extended reuse of HTTP connections, and we usually see far better response times through the load balancer than when using a node directly.
Hmmm. Can you share what the performance times are like for direct cluster access?
> When we run the test, each Riak node shows only about 100% CPU load (which on Solaris means it is using only one of the 8 cores). We have seen spikes around 160%, but anything below 800% is not CPU bound. So all in all, the CPU load is between 5 and 10%.
Can you send me the code you're using for the performance test? I'd like to run the exact code on my test hardware and see if that reveals anything.
Also, low CPU usage might indicate you are IO bound. Do you know if Riak processes are spending much time waiting for IO to complete?
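As a rough check on the CPU figures quoted above, the following minimal Python sketch converts a per-core percentage (where 100% means one fully busy core, as described in the quoted message) into a share of the whole machine. The function name `machine_utilization` is hypothetical and exists only for illustration; the core count of 8 and the 100% and 160% readings come from the thread.

```python
def machine_utilization(per_core_percent: float, cores: int) -> float:
    """Convert a per-core CPU percentage (100% == one full core)
    into a percentage of total machine capacity."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return per_core_percent / cores


CORES = 8  # from the thread: each server has 8 cores

# Readings reported in the thread: typical load and observed spikes.
for label, pct in [("typical", 100.0), ("spike", 160.0)]:
    share = machine_utilization(pct, CORES)
    print(f"{label}: {pct:.0f}% per-core scale = {share:.1f}% of the machine")
```

Under these assumptions the typical load works out to 12.5% of the machine and the spikes to 20%, which is still far from CPU bound and consistent with checking IO wait next.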