Exploring Riak, need to confirm throughput

Shuhao shuhao at shuhaowu.com
Thu Apr 4 16:45:57 EDT 2013


Just as a side note, you might want to retry the test with PBC (Riak's
Protocol Buffers interface). While I have only tested with documents
under 10 KB, my tests indicate that PBC is twice as fast as HTTP in
almost all cases.

Shuhao

On 13-04-04 04:14 PM, Matthew MacClary wrote:
> Thanks for the feedback. I made two changes to my test setup and saw better
> throughput:
>
> 1) Don't write to the same key over and over. Updating a key appears to be
> a lot slower than creating a new key
>
> 2) I used parallel PUTs
>
> The throughput I was measuring before was about 26MB/s on localhost. With
> these changes it went to around 200MB/s on a disk that can write at about
> 480MB/s. That is more the type of performance I need for the data store we
> have in mind. I am going to proceed with testing on 8 nodes with RAID0
> drives.
>
> Here are some details of the testing I did if it will help others. I tried
> the test with 1MB, 10MB, and 20MB binary data. I didn't notice a big signal
> with regard to larger objects slowing things down.
>
> wget http://downloads.basho.com.s3-website-us-east-1.amazonaws.com/riak/1.2/1.2.1/rhel/5/riak-1.2.1-1.el5.x86_64.rpm
>
> sudo rpm -Uvh riak-1.2.1-1.el5.x86_64.rpm
> /usr/sbin/riak start
> mkdir data-dir && cd data-dir
> seq -w 0 100 | parallel dd if=/dev/zero of={}.10meg bs=8k count=1280
> http_proxy=   # don’t contact proxy
> time find . -name \*.10meg | parallel -j8 -n1 wget --post-file {} \
>     http://127.0.0.1:8098/riak/test1/{}
>
> During these tests I saw beam.smp jump to 350-550 in the %CPU column of
> top. When I was seeing slower throughput, beam.smp was using much less
> CPU.
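
For anyone who wants to repeat the parallel test above with curl instead of wget, here is a rough sketch, not a tested benchmark. As far as I know, wget --post-file sends the body as application/x-www-form-urlencoded, so this version sets the octet-stream header used in the original curl test. The names RIAK_URL, list_keys and upload_all are made up for illustration, the endpoint is assumed to be the same local node and bucket, and xargs -P assumes GNU xargs.

```shell
# Assumed endpoint: the same local node and bucket as the wget test above.
RIAK_URL="${RIAK_URL:-http://127.0.0.1:8098/riak/test1}"

# Hypothetical helper: list the bare file names in directory $1, one per
# line, so each name can be used directly as a distinct Riak key.
list_keys() {
    for f in "$1"/*.10meg; do
        if [ -e "$f" ]; then
            basename "$f"
        fi
    done
}

# Hypothetical helper: POST every *.10meg file in directory $1 using $2
# concurrent curl processes (default 8). Set DRY_RUN=1 to print the curl
# commands instead of running them.
upload_all() {
    dir=$1
    njobs=${2:-8}
    list_keys "$dir" | (
        cd "$dir" || exit 1
        xargs -P "$njobs" -I{} ${DRY_RUN:+echo} curl --silent --output /dev/null \
            -XPOST -H "Content-Type: application/octet-stream" \
            --data-binary @{} "$RIAK_URL/{}"
    )
}
```

Running "time upload_all . 8" from data-dir should be roughly comparable to the parallel wget line, and "DRY_RUN=1 upload_all . 8" shows what would be sent without touching the node.
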
>
> Kind regards,
>
> -Matt
>
> On Wed, Apr 3, 2013 at 7:20 AM, Reid Draper <reiddraper at gmail.com> wrote:
>
>> inline:
>>
>>
>> On Apr 2, 2013, at 6:48 PM, Matthew MacClary <
>> macclary at lifetime.oregonstate.edu> wrote:
>>
>> Hi all, I am new to this list. Thanks for taking the time to read my
>> questions! I just want to know if the data throughput I am seeing is
>> expected for the bitcask backend or if it is too low.
>>
>> I am doing the preliminary feasibility study to decide if we should
>> implement a Riak data store. Our application involves rendering chunks of
>> data that range in size from about 1MB-9MB or so. This rendering work is
>> CPU intensive so it is spread over a bunch of compute nodes which write the
>> output into a data store.
>>
>>
>> Riak is not intended to store objects of this size, not at the moment
>> anyway. Riak CS [1], on the other hand, can store files up to several TB.
>> That being said, Riak CS may or may not have other qualities you desire.
>> It's a known issue [2] that the Riak object size limitations should be
>> better documented.
>>
>>
>> After rendering, a second process consumes the data chunks from the data
>> store at a rate of about 480MB/s in a streaming configuration, so more than
>> 480MB/s of new data is coming in at the same time the data is being read.
>>
>>
>> Is this a single-socket, or is there some concurrency here?
>>
>>
>> My testing so far involves a one node cluster on a dev box. What I wanted
>> to show is that Riak writes were limited by the hard disk throughput. So
>> far I haven't seen writes to localhost come anywhere close to the hard disk
>> throughput:
>>
>> $ MYFILE=/tmp/output.png
>> $ dd if=/dev/zero of=$MYFILE bs=8k count=256k
>> 262144+0 records in
>> 262144+0 records out
>> 2147483648 bytes (2.1 GB) copied, 4.48906 seconds, 478 MB/s
>> $ rm $MYFILE
>>
>> So the hard disk throughput is around 478MB/s for this simple write test.
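
To make the dd figure easy to check, here is a small sketch of the arithmetic. The helper name mb_per_sec is just illustrative, and it assumes decimal megabytes (10^6 bytes), which is the unit dd reports.

```shell
# Hypothetical helper: throughput in decimal MB/s from a byte count and
# an elapsed time in seconds.
mb_per_sec() {
    awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.0f\n", bytes / secs / 1000000 }'
}

# bs=8k count=256k means 8192 * 262144 bytes were written.
total_bytes=$((8192 * 262144))
echo "$total_bytes"                  # 2147483648
mb_per_sec "$total_bytes" 4.48906    # 478
```
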
>>
>> The next test I did was to load a 39MB binary file into my one node
>> cluster. I used a script to do 12 POSTs with curl and 12 POSTs with wget.
>>
>> curl --tcp-nodelay -XPOST http://${IP}:${PORT}/riak/test/file3 \
>>      -H "Content-Type:application/octet-stream" \
>>      --data-binary @${UPLOAD_FILE} \
>>      --write-out "%{speed_upload}\n"
>>
>> wget --post-file ${UPLOAD_FILE} http://127.0.0.1:8098/riak/test/file1
>>
>> What I found was that I could get only about 26MB/s with this command line
>> testing. Does this seem about right? Should I see an 18x slowdown over the
>> write speed of the disk drive?
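
The 18x figure follows from the two numbers quoted above (478 MB/s for dd, about 26 MB/s through Riak). A quick sketch of the division, with a made-up helper name:

```shell
# Hypothetical helper: how many times slower rate $2 is than rate $1,
# to one decimal place.
slowdown() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.1f\n", fast / slow }'
}

slowdown 478 26    # 18.4
```
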
>>
>>
>> Was this running the 24 (12 * 2) uploads in serial or parallel? With a
>> single-threaded workload, you're unlikely to get Riak to be able to
>> saturate a disk. Furthermore, there are design decisions in Riak at the
>> moment that make it less than optimal for single objects of 39MB.
>> Single-object high throughput (measured in MB) is more in the wheelhouse of
>> Riak CS than Riak on its own, which is primarily designed for low-latency
>> and high-throughput (measured in ops/sec). One of the ways that Riak CS
>> achieves this on top of Riak is by introducing concurrency between the
>> end-user and Riak.
>>
>>
>> Thanks for your comments on my application and test approach!
>>
>>
>> Hope this helps,
>> Reid
>>
>> [1] http://docs.basho.com/riakcs/latest/
>> [2] https://github.com/basho/basho_docs/issues/256
>>
>>
>>
>> -Matt
>>
>> -----------------------------------------------
>> Dev Environment Details:
>> dev box running RHEL6.2, 12 cores, 48GB, 6Gb/s SAS 15k HD
>> Riak 1.2.1 from
>> http://downloads.basho.com.s3-website-us-east-1.amazonaws.com/riak/1.2/1.2.1/rhel/5/riak-1.2.1-1.el5.x86_64.rpm
>> n_val=1
>> r=1
>> w=1
>> backend=bitcask
>>
>> Deploy Environment Details:
>>   Node to node bandwidth > 40Gb/s
>>   similar config for node servers
>>   n_val=3
>>   r=1
>>   w=1
>>   backend=?
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>
>>



