Exploring Riak, need to confirm throughput

Reid Draper reiddraper at gmail.com
Wed Apr 3 10:20:09 EDT 2013


On Apr 2, 2013, at 6:48 PM, Matthew MacClary <macclary at lifetime.oregonstate.edu> wrote:

> Hi all, I am new to this list. Thanks for taking the time to read my questions! I just want to know if the data throughput I am seeing is expected for the bitcask backend or if it is too low.
> I am doing the preliminary feasibility study to decide if we should implement a Riak data store. Our application involves rendering chunks of data that range in size from about 1MB-9MB or so. This rendering work is CPU intensive so it is spread over a bunch of compute nodes which write the output into a data store.

Riak is not intended to store objects of this size, at least not at the moment. Riak CS [1], on the other hand, can store files up to several TB. That being said, Riak CS may or may not have the other qualities you desire. It's a known issue [2] that the Riak object size limitations should be better documented.

> After rendering, a second process consumes the data chunks from the data store at a rate of about 480MB/s in a streaming configuration so there is > 480MB/s of new data coming in at the same time the data is being read.

Is this over a single socket, or is there some concurrency here?

> My testing so far involves a one node cluster on a dev box. What I wanted to show is that Riak writes were limited by the hard disk throughput. So far I haven't seen writes to localhost come anywhere close to the hard disk throughput:
> $ MYFILE=/tmp/output.png
> $ dd if=/dev/zero of=$MYFILE bs=8k count=256k
> 262144+0 records in
> 262144+0 records out
> 2147483648 bytes (2.1 GB) copied, 4.48906 seconds, 478 MB/s
> $ rm $MYFILE
> So the hard disk throughput is around 478MB/s for this simple write test.
> The next test I did was to load a 39MB binary file into my one node cluster. I used a script to do 12 POSTs with curl and 12 POSTs with wget.
> curl --tcp-nodelay -XPOST http://${IP}:${PORT}/riak/test/file3 \
>     -H "Content-Type:application/octet-stream" \
>     --data-binary @${UPLOAD_FILE} \
>     --write-out "%{speed_upload}\n"
> wget --post-file ${UPLOAD_FILE}
> What I found was that I could get only about 26MB/s with this command line testing. Does this seem about right? Should I see an 18x slowdown over the write speed of the disk drive?
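The two figures quoted above can be checked with a little arithmetic. This is only a sketch: the byte count, elapsed time, and 26MB/s rate are copied from the quoted text, and the variable names are made up for illustration.

```shell
# Figures copied from the quoted dd output and the curl/wget test.
BYTES=2147483648   # bytes written by dd (8k * 256k)
ELAPSED=4.48906    # seconds reported by dd
RIAK_MBPS=26       # approximate upload rate seen through Riak

# Disk throughput in decimal MB/s (1 MB = 1,000,000 bytes, as dd reports it).
DISK_MBPS=$(awk -v b="$BYTES" -v s="$ELAPSED" 'BEGIN { printf "%.0f", b / s / 1000000 }')

# Ratio of the dd write rate to the rate observed through Riak.
SLOWDOWN=$(awk -v d="$DISK_MBPS" -v r="$RIAK_MBPS" 'BEGIN { printf "%.1f", d / r }')

echo "disk: ${DISK_MBPS} MB/s, slowdown: ${SLOWDOWN}x"
```

This reproduces the 478 MB/s figure and gives a ratio of roughly 18x, matching the numbers in the question.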

Was this running the 24 (12 * 2) uploads in serial or in parallel? With a single-threaded workload, you're unlikely to get Riak to saturate a disk. Furthermore, there are design decisions in Riak at the moment that make it less than optimal for single objects of 39MB. Single-object high throughput (measured in MB/s) is more in the wheelhouse of Riak CS than of Riak on its own, which is primarily designed for low latency and high throughput (measured in ops/sec). One of the ways that Riak CS achieves this on top of Riak is by introducing concurrency between the end-user and Riak.
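
To make the serial-versus-parallel distinction concrete, here is a minimal sketch of running the same uploads concurrently from the shell. The helper names `run_parallel` and `upload_one` are hypothetical, the per-upload key names (`file1`, `file2`, ...) are an assumption, and `IP`, `PORT`, and `UPLOAD_FILE` are expected to be set as in the quoted test.

```shell
# Hypothetical helper (not part of Riak or curl): start N copies of a command
# in the background, passing each its index, then wait for all of them.
run_parallel() {
    n=$1
    shift
    i=1
    while [ "$i" -le "$n" ]; do
        "$@" "$i" &
        i=$((i + 1))
    done
    wait
}

# One upload, reusing the curl flags from the quoted test. Each upload
# writes to its own key (an assumption; the original reused file3).
upload_one() {
    curl --silent --output /dev/null -XPOST "http://${IP}:${PORT}/riak/test/file$1" \
        -H "Content-Type:application/octet-stream" \
        --data-binary @"${UPLOAD_FILE}" \
        --write-out "%{speed_upload}\n"
}

# Example (needs a running node): 24 concurrent uploads.
# run_parallel 24 upload_one
```

Note that `%{speed_upload}` reports the rate of each individual transfer. For an aggregate figure, divide the total bytes uploaded by the wall-clock time of the whole run.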

> Thanks for your comments on my application and test approach!

Hope this helps,

[1] http://docs.basho.com/riakcs/latest/
[2] https://github.com/basho/basho_docs/issues/256

> -Matt
> -----------------------------------------------
> Dev Environment Details:
> dev box running RHEL6.2, 12 cores, 48GB, 6Gb/s SAS 15k HD
> Riak 1.2.1 from http://downloads.basho.com.s3-website-us-east-1.amazonaws.com/riak/1.2/1.2.1/rhel/5/riak-1.2.1-1.el5.x86_64.rpm
> n_val=1
> r=1
> w=1
> backend=bitcask
> Deploy Environment Details:
>  Node to node bandwidth > 40Gb/s
>  similar config for node servers
>  n_val=3
>  r=1
>  w=1
>  backend=?
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
