Exploring Riak, need to confirm throughput

Reid Draper reiddraper at gmail.com
Thu Apr 4 17:45:13 EDT 2013


On Apr 4, 2013, at 4:14 PM, Matthew MacClary <macclary at lifetime.oregonstate.edu> wrote:

> Thanks for the feedback. I made two changes to my test setup and saw better throughput:
> 
> 1) Don't write to the same key over and over. Updating a key appears to be a lot slower than creating a new key.
> 
> 2) I used parallel PUTs
> 
> The throughput I was measuring before was about 26MB/s on localhost. With these changes it went to around 200MB/s on a disk that can write at about 480MB/s. That is more the type of performance I need for the data store we have in mind. I am going to proceed with testing on 8 nodes with RAID0 drives.

How are you measuring throughput? HTTP throughput, or disk throughput with something like iostat?
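
For the HTTP side, one rough way to get a number is to divide the total bytes sent by the wall-clock time of the whole run. Below is a minimal sketch, assuming a hypothetical helper named throughput_mbs (not part of Riak or of any tool mentioned in this thread) and decimal megabytes. It is fed the figures from the dd run quoted further down, which dd itself reported as 478 MB/s:

```shell
#!/bin/sh
# throughput_mbs BYTES SECONDS
# Hypothetical helper: prints throughput in MB/s (1 MB = 1,000,000 bytes),
# rounded to one decimal place. awk does the floating-point division.
throughput_mbs() {
    awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}

# The dd run from the thread: 2147483648 bytes in 4.48906 seconds.
throughput_mbs 2147483648 4.48906
```

For a PUT test, BYTES would be the combined size of the uploaded files and SECONDS the "real" time printed by time. That figure reflects what the clients see; how many bytes actually reach the disk is a separate measurement, for example with iostat.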

> 
> Here are some details of the testing I did if it will help others. I tried the test with 1MB, 10MB, and 20MB binary data. I didn't notice a big signal with regard to larger objects slowing things down.

The issues with larger objects will likely only present themselves when you have more than one node.

> 
> wget http://downloads.basho.com.s3-website-us-east-1.amazonaws.com/riak/1.2/1.2.1/rhel/5/riak-1.2.1-1.el5.x86_64.rpm
> 
> sudo rpm -Uvh riak-1.2.1-1.el5.x86_64.rpm
> /usr/sbin/riak start
> mkdir data-dir && cd data-dir
> seq -w 0 100 | parallel dd if=/dev/zero of={}.10meg bs=8k count=1280
> http_proxy=   # don’t contact proxy
> time find . -name \*.10meg | parallel -j8 -n1 wget --post-file {} http://127.0.0.1:8098/riak/test1/{}
> 
> During these tests I saw beam.smp jump to 350-550 in the %CPU column of top. When I was seeing slower throughput, beam.smp was using much less CPU.
> 
> Kind regards,
> 
> -Matt
> 
> On Wed, Apr 3, 2013 at 7:20 AM, Reid Draper <reiddraper at gmail.com> wrote:
> inline:
> 
> 
> On Apr 2, 2013, at 6:48 PM, Matthew MacClary <macclary at lifetime.oregonstate.edu> wrote:
> 
>> Hi all, I am new to this list. Thanks for taking the time to read my questions! I just want to know if the data throughput I am seeing is expected for the bitcask backend or if it is too low.
>> 
>> I am doing the preliminary feasibility study to decide if we should implement a Riak data store. Our application involves rendering chunks of data that range in size from about 1MB-9MB or so. This rendering work is CPU intensive so it is spread over a bunch of compute nodes which write the output into a data store.
> 
> Riak is not intended to store objects of this size, not at the moment anyway. Riak CS [1], on the other hand, can store files up to several TB. That being said, Riak CS may or may not have other qualities you desire. It's a known issue [2] that the Riak object size limitations should be better documented.
> 
>> 
>> After rendering, a second process consumes those data chunks from the data store at a rate of about 480MB/s in a streaming configuration, so there is > 480MB/s of new data coming in at the same time the data is being read.
> 
> Is this a single-socket, or is there some concurrency here?
> 
>> 
>> My testing so far involves a one node cluster on a dev box. What I wanted to show is that Riak writes were limited by the hard disk throughput. So far I haven't seen writes to localhost come anywhere close to the hard disk throughput:
>> 
>> $ MYFILE=/tmp/output.png
>> $ dd if=/dev/zero of=$MYFILE bs=8k count=256k
>> 262144+0 records in
>> 262144+0 records out
>> 2147483648 bytes (2.1 GB) copied, 4.48906 seconds, 478 MB/s
>> $ rm $MYFILE
>> 
>> So the hard disk throughput is around 478MB/s for this simple write test.
>> 
>> The next test I did was to load a 39MB binary file into my one node cluster. I used a script to do 12 POSTs with curl and 12 POSTs with wget.
>> 
>> curl --tcp-nodelay -XPOST http://${IP}:${PORT}/riak/test/file3 \
>>     -H "Content-Type:application/octet-stream" \
>>     --data-binary @${UPLOAD_FILE} \
>>     --write-out "%{speed_upload}\n"
>> 
>> wget --post-file ${UPLOAD_FILE} http://127.0.0.1:8098/riak/test/file1
>> 
>> What I found was that I could get only about 26MB/s with this command line testing. Does this seem about right? Should I see an 18x slowdown relative to the write speed of the disk drive?
> 
> Was this running the 24 (12 * 2) uploads in serial or parallel? With a single-threaded workload, you're unlikely to get Riak to be able to saturate a disk. Furthermore, there are design decisions in Riak at the moment that make it less than optimal for single objects of 39MB. Single-object high throughput (measured in MB) is more in the wheelhouse of Riak CS than Riak on its own, which is primarily designed for low-latency and high-throughput (measured in ops/sec). One of the ways that Riak CS achieves this on top of Riak is by introducing concurrency between the end-user and Riak.
> 
>> 
>> Thanks for your comments on my application and test approach!
> 
> Hope this helps,
> Reid
> 
> [1] http://docs.basho.com/riakcs/latest/
> [2] https://github.com/basho/basho_docs/issues/256
> 
> 
>> 
>> -Matt
>> 
>> -----------------------------------------------
>> Dev Environment Details:
>> dev box running RHEL6.2, 12 cores, 48GB, 6Gb/s SAS 15k HD
>> Riak 1.2.1 from http://downloads.basho.com.s3-website-us-east-1.amazonaws.com/riak/1.2/1.2.1/rhel/5/riak-1.2.1-1.el5.x86_64.rpm
>> n_val=1
>> r=1
>> w=1
>> backend=bitcask
>> 
>> Deploy Environment Details:
>>  Node to node bandwidth > 40Gb/s
>>  similar config for node servers
>>  n_val=3
>>  r=1
>>  w=1
>>  backend=?
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
> 
