Exploring Riak, need to confirm throughput

Matthew MacClary macclary at lifetime.oregonstate.edu
Thu Apr 4 18:22:44 EDT 2013


PBC is certainly something I have on my list of things to explore.
Conceptually I am not sure if the speed gains from this protocol will be
apparent with large binary payloads. I thought the main speed gains were
from 1) more compact binary representation and 2) lower interpretation
overhead. In my situation I already have a largish binary payload that does
not need to be parsed. I could be wrong and may find that out as I explore
this further.
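
A rough way to sanity-check this reasoning: if each request carries a
fixed protocol overhead (headers, framing), its share of the bytes on the
wire shrinks as the payload grows. The sketch below assumes 500 bytes of
per-request overhead, which is a made-up illustrative figure and not a
measurement of Riak's HTTP or PBC interface; overhead_pct is a hypothetical
helper name. It only covers bytes on the wire, not parsing or
interpretation cost.

```shell
# overhead_pct OVERHEAD_BYTES PAYLOAD_BYTES
# Prints the fixed per-request overhead as a percentage of all bytes sent.
# The 500-byte overhead used below is an assumption for illustration only.
overhead_pct() {
    awk -v h="$1" -v p="$2" 'BEGIN { printf "%.4f\n", 100 * h / (h + p) }'
}

overhead_pct 500 10240       # ~10 KB document  -> 4.6555
overhead_pct 500 10485760    # 10 MB object     -> 0.0048
```

Under that assumption the overhead is a few percent of a 10 KB document
but well under a hundredth of a percent of a 10 MB object, which fits the
expectation that a more compact protocol matters less for large binary
payloads. Real numbers would have to come from rerunning the test over PBC.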

-Matt


On Thu, Apr 4, 2013 at 1:45 PM, Shuhao <shuhao at shuhaowu.com> wrote:

> Just as a side note, you might want to retry the test with PBC. While I
> have only tested with < 10 KB documents, my tests indicate that PBC is
> twice as fast as HTTP in almost all cases.
>
> Shuhao
>
>
> On 13-04-04 04:14 PM, Matthew MacClary wrote:
>
>> Thanks for the feedback. I made two changes to my test setup and saw
>> better throughput:
>>
>> 1) Don't write to the same key over and over. Updating a key appears to be
>> a lot slower than creating a new key.
>>
>> 2) I used parallel PUTs
>>
>> The throughput I was measuring before was about 26MB/s on localhost. With
>> these changes it went to around 200MB/s on a disk that can write at about
>> 480MB/s. That is more the type of performance I need for the data store we
>> have in mind. I am going to proceed with testing on 8 nodes with RAID0
>> drives.
>>
>> Here are some details of the testing I did, in case they help others. I
>> tried the test with 1MB, 10MB, and 20MB binary data. I didn't notice a big
>> signal with regard to larger objects slowing things down.
>>
>> wget http://downloads.basho.com.s3-website-us-east-1.amazonaws.com/riak/1.2/1.2.1/rhel/5/riak-1.2.1-1.el5.x86_64.rpm
>>
>> sudo rpm -Uvh riak-1.2.1-1.el5.x86_64.rpm
>> /usr/sbin/riak start
>> mkdir data-dir && cd data-dir
>> seq -w 0 100 | parallel dd if=/dev/zero of={}.10meg bs=8k count=1280
>> http_proxy=   # don’t contact proxy
>> time find . -name \*.10meg | parallel -j8 -n1 wget --post-file {} \
>>     http://127.0.0.1:8098/riak/test1/{}
>>
>> During these tests I saw beam.smp jump to 350-550% in the %CPU column of
>> top. When I was seeing slower throughput, beam.smp was using much less
>> CPU.
>>
>> Kind regards,
>>
>> -Matt
>>
>> On Wed, Apr 3, 2013 at 7:20 AM, Reid Draper <reiddraper at gmail.com> wrote:
>>
>>  inline:
>>>
>>>
>>> On Apr 2, 2013, at 6:48 PM, Matthew MacClary <macclary at lifetime.oregonstate.edu> wrote:
>>>
>>> Hi all, I am new to this list. Thanks for taking the time to read my
>>> questions! I just want to know if the data throughput I am seeing is
>>> expected for the bitcask backend or if it is too low.
>>>
>>> I am doing the preliminary feasibility study to decide if we should
>>> implement a Riak data store. Our application involves rendering chunks of
>>> data that range in size from about 1MB-9MB or so. This rendering work is
>>> CPU intensive so it is spread over a bunch of compute nodes which write
>>> the output into a data store.
>>>
>>>
>>> Riak is not intended to store objects of this size, not at the moment
>>> anyway. Riak CS [1], on the other hand, can store files up to several TB.
>>> That being said, Riak CS may or may not have other qualities you desire.
>>> It's a known issue [2] that the Riak object size limitations should be
>>> better documented.
>>>
>>>
>>> After rendering, a second process consumes the data chunks from the data
>>> store at a rate of about 480MB/s in a streaming configuration, so more
>>> than 480MB/s of new data is coming in at the same time the data is being
>>> read.
>>>
>>>
>>> Is this a single-socket, or is there some concurrency here?
>>>
>>>
>>> My testing so far involves a one node cluster on a dev box. What I wanted
>>> to show is that Riak writes were limited by the hard disk throughput. So
>>> far I haven't seen writes to localhost come anywhere close to the hard
>>> disk throughput:
>>>
>>> $ MYFILE=/tmp/output.png
>>> $ dd if=/dev/zero of=$MYFILE bs=8k count=256k
>>> 262144+0 records in
>>> 262144+0 records out
>>> 2147483648 bytes (2.1 GB) copied, 4.48906 seconds, 478 MB/s
>>> $ rm $MYFILE
>>>
>>> So the hard disk throughput is around 478MB/s for this simple write test.
>>>
>>> The next test I did was to load a 39MB binary file into my one node
>>> cluster. I used a script to do 12 POSTs with curl and 12 POSTs with wget.
>>>
>>> curl --tcp-nodelay -XPOST http://${IP}:${PORT}/riak/test/file3 \
>>>      -H "Content-Type:application/octet-stream" \
>>>      --data-binary @${UPLOAD_FILE} \
>>>      --write-out "%{speed_upload}\n"
>>>
>>> wget --post-file ${UPLOAD_FILE} http://127.0.0.1:8098/riak/test/file1
>>>
>>> What I found was that I could get only about 26MB/s with this command
>>> line testing. Does this seem about right? Should I see an 18x slowdown
>>> over the write speed of the disk drive?
>>>
>>>
>>> Was this running the 24 (12 * 2) uploads in serial or parallel? With a
>>> single-threaded workload, you're unlikely to get Riak to be able to
>>> saturate a disk. Furthermore, there are design decisions in Riak at the
>>> moment that make it less than optimal for single objects of 39MB.
>>> Single-object high throughput (measured in MB) is more in the wheelhouse
>>> of Riak CS than Riak on its own, which is primarily designed for
>>> low latency and high throughput (measured in ops/sec). One of the ways
>>> that Riak CS achieves this on top of Riak is by introducing concurrency
>>> between the end-user and Riak.
>>>
>>>
>>> Thanks for your comments on my application and test approach!
>>>
>>>
>>> Hope this helps,
>>> Reid
>>>
>>> [1] http://docs.basho.com/riakcs/latest/
>>> [2] https://github.com/basho/basho_docs/issues/256
>>>
>>>
>>>
>>> -Matt
>>>
>>> -----------------------------------------------
>>> Dev Environment Details:
>>> dev box running RHEL6.2, 12 cores, 48GB, 6Gb/s SAS 15k HD
>>> Riak 1.2.1 from
>>> http://downloads.basho.com.s3-website-us-east-1.amazonaws.com/riak/1.2/1.2.1/rhel/5/riak-1.2.1-1.el5.x86_64.rpm
>>> n_val=1
>>> r=1
>>> w=1
>>> backend=bitcask
>>>
>>> Deploy Environment Details:
>>>   Node to node bandwidth > 40Gb/s
>>>   similar config for node servers
>>>   n_val=3
>>>   r=1
>>>   w=1
>>>   backend=?
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>>>
>>>
>>>
>>
>>