slow 2 node cluster

Jeremiah Peschka jeremiah.peschka at gmail.com
Sun Nov 20 16:21:02 EST 2011


Kyle's recommendations are solid. 

If you top out at 300 inserts per second regardless of the number of nodes, then there's a fundamental infrastructure problem. 

Eliminate every possible bottleneck one at a time:

You can eliminate the disk as a bottleneck by testing with the in-memory backend.
You can eliminate the network as a bottleneck by running the test on the Riak host itself.
You can eliminate your client as a bottleneck by running basho_bench.
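
If basho_bench isn't handy, a rough way to check the client is to time the same loop against a sink that throws the data away. The sketch below is only an illustration: measure_rate, make_payload and noop_store are made-up names, and the 1 KB payload is an assumed object size, not something from this thread. If the no-op rate isn't far above 300/s, the client can't produce data fast enough and the cluster isn't the problem.

```python
import os
import time


def make_payload(size=1024):
    """Generate a hypothetical test value; size is an assumed object size."""
    return os.urandom(size)


def noop_store(key, value):
    """Stand-in sink that discards the write; measures pure client overhead."""
    return None


def measure_rate(store, count=1000, payload_size=1024):
    """Call store(key, value) `count` times and return calls per second."""
    start = time.perf_counter()
    for i in range(count):
        store("key-%d" % i, make_payload(payload_size))
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very coarse clocks.
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    baseline = measure_rate(noop_store)
    print("client-only ceiling: %.0f ops/sec" % baseline)
    # Replace noop_store with a function that writes to your cluster,
    # then compare the two rates.
```

Pass your real write function (whatever wraps your client's store call) as the first argument to get the second number for comparison.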

As someone who has spent a lot of time making RDBMSes go really fast, I can tell you that any waiting on disk is too much waiting on disk ;)
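
To put a number on that waiting without iostat, here is a minimal sketch that assumes a Linux host, where the first line of /proc/stat carries cumulative CPU counters in the order user, nice, system, idle, iowait, irq, softirq, steal. The function names are mine, for illustration only.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of ints."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples.

    Assumed column order: user nice system idle iowait irq softirq steal.
    """
    deltas = [b - a for a, b in zip(before, after)]
    # Sum the first eight columns only; guest time is already
    # included in user and nice.
    total = sum(deltas[:8])
    return 100.0 * deltas[4] / total if total > 0 else 0.0
```

Read the first line of /proc/stat once before and once after the insert test, feed both through parse_cpu_line, and pass the results to iowait_percent to see how much of the run was spent waiting on disk.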
---
Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
Microsoft SQL Server MVP

On Nov 20, 2011, at 12:56 PM, Aphyr wrote:

> On 11/20/2011 12:14 PM, Catalin Constantin wrote:
>> I am 100% sure the transfer rate is 10MBytes / second. This is not the
>> problem.
> 
> In ten years of network administration I have never encountered an ethernet device with a wire rate of 10 MBps. I have, however, encountered frequent confusion over units. Perhaps you understand my suspicion here. :)
> 
>> IOWAIT is also pretty low. iostat shows: iowait 4.69%
> 
> If you read my email, I suggested that even values as low as 2.6% may suggest contention. It depends on your CPU arch and utilization. I would investigate your disks more closely. Are they spinning or solid-state? Median seek time? 95/99 seek times? Disk cache? Does the riak process spend disproportionate time in IO_WAIT relative to USER? Filesystem atime/relatime/noatime? FS block size properly aligned for your disk? Insufficient filesystem or leveldb cache? hdparm options? Is riak on an independent disk or competing with other processes, e.g. syslog, file servers, etc.? Does strace show an unusual amount of time spent in certain system calls?
> 
> It might just be leveldb, too. I only have experience with bitcask in production.
> 
>> I have retried the test with one node and a newly created bucket where
>> I have set n to 1.
>> bucket.set_n_val(1)
>> 
>> Results are the same. Less than 300 inserts / second.
> 
> This is good; it rules out replication.
> 
>> Any idea why Riak is so slow at inserting data?
> 
> Disk, disk disk disk, disk disk? Disk!
> 
> You could also look at the client. Are you writing to a local node or a remote one? Is your client's threading model getting in the way? Can you actually produce data fast enough to insert it? Is your client fighting for the same resources as the riak process? Presuming you've ruled these out, it's almost certainly network or disk.
> 
> --Kyle
> 