Slow write performance for Riak CS

Toby Corkindale toby at dryft.net
Thu Jul 3 21:08:12 EDT 2014


On 4 July 2014 10:20, Matthew MacClary <macclary at gmail.com> wrote:
> Hi all, a Riak CS user named Toby started this discussion about write
> performance. I am seeing the exact same behavior in terms of idle CPUs,
> network, and disks, but low throughput. Toby, do you happen to have any
> follow up about settings to improve the raw Riak throughput and/or Riak CS
> throughput?
>
> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-May/012177.html

Hi Matt,
I'm still running the settings I mentioned a year ago; I also set up
Riak CS to talk to a per-machine haproxy that proxies out to the Riak
CS port on all machines, rather than just talking to 127.0.0.1.

Performance isn't great per client for putting data into the store,
but it's fast enough for our purposes, and it does parallelize OK.
(Ours is a read-heavy workload, and writes are naturally split over a
bunch of clients writing smaller files, rather than a few clients
writing large files.)
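
For reference, here is a minimal sketch of how the values quoted further
down (a concurrency of 4 and a buffer factor of 8) would sit in the
`riak_cs` section of app.config. This assumes those are the settings in
question; the other entries in the section are elided, and the layout of
your own file may differ:

```erlang
%% app.config (fragment): sketch only, other riak_cs settings omitted
{riak_cs, [
    %% number of workers Riak CS uses to write blocks to Riak
    {put_concurrency, 4},
    %% number of blocks buffered in memory before Riak CS slows
    %% down reading from the HTTP client
    {put_buffer_factor, 8}
]}
```

The same values can presumably be tried without a restart from the shell
opened by `riak-cs attach`, assuming Riak CS reads them from the `riak_cs`
application environment at the time of each PUT:

```erlang
%% run inside `riak-cs attach`; not persisted across restarts
application:set_env(riak_cs, put_concurrency, 4).
application:set_env(riak_cs, put_buffer_factor, 8).
```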

Toby

> On 29/05/13 06:12, Reid Draper wrote:
>> Hey Toby,
>>
>> Another option is to explore some of the Riak CS PUT configuration
>> parameters, like the internal write concurrency and the buffer size. These
>> can either be changed by editing the app.config, or changing them at
>> run-time in an Erlang shell (via riak-cs attach). The two parameters are
>> `put_concurrency` and `put_buffer_factor`.
>>
>> They both default to 1. The `put_concurrency` controls the number of
>> threads inside of Riak CS that are used to write blocks to Riak. The
>> `put_buffer_factor` controls the number of blocks that will be buffered
>> in-memory in Riak CS before it starts to slow down reading from the HTTP
>> client. I suggest trying to raise these values to get higher
>> single-client throughput. If you wish to edit the app.config, add lines like
>> these in the `riak_cs` section:
>>
>> {put_concurrency, 8},
>> {put_buffer_factor, 16},
>
>
> Ah, thanks -- that's interesting. Bumping those up did improve
> things a bit -- I went for a conservative concurrency of 4 and a
> buffer_factor of 8, and that took speeds on 500 MB files from 8-9 MB/s
> to 12-13 MB/s.
>
> Might play with other values when I next get time, but for now that's a
> good cheap win.
>
>> In increasing these values, you might also find it useful to run a load
>> balancer between the Riak CS and Riak nodes, instead of having Riak CS just
>> communicate with the local Riak node. We intend to have this behavior built
>> into Riak CS in the future, but for now a load balancer will suffice.
>
> Ah, ta. Can do.
>
>> Furthermore, please make sure you've looked through the Linux Tuning page
>> for Riak [1]. And as for +zdbbl, you can try even higher, like 128MB: +zdbbl
>> 131072.
>
> Thanks.
> I'd been through there already, and applied things like filesystem and
> scheduler options.
> I haven't touched the networking sysctls since they came with a warning
> that they should only be messed with if networking *was* the bottleneck.
>
> Given what I've mentioned so far, do you think it's likely?
> I guess it can't hurt to adjust them and see..
>
>> [1]
>> http://docs.basho.com/riak/1.3.1/cookbooks/Linux-Performance-Tuning/#Linux-Tuning
>
>
> Thanks for your advice,
> Toby
>
>
>>
>>
>> Reid
>>
>> On May 27, 2013, at 11:50 PM, Toby Corkindale <toby.corkindale at
>> strategicdata.com.au> wrote:
>>
>>> On 28/05/13 13:11, Jared Morrow wrote:
>>>> Toby,
>>>>
>>>> Can you try putting data with a second client simultaneously?  When
>>>> people have slow benchmarking, lots of times just using multiple
>>>> worker/clients helps.  Also, what client library are you using?
>>>
>>> Running up three S3 clients (on separate machines) simultaneously saw
>>> them return 8, 8, 6 MB/sec. Interesting to note that the performance hasn't
>>> dropped threefold, but still, it'd be really nice if an individual transfer
>>> would run faster, given the performance of the underlying hardware.
>>>
>>> The nodes hardly get utilised during a write operation. I've uploaded a
>>> total of 1500M from the three nodes, and yet CPUs are 90% idle, and there's
>>> no real disk activity going on, apart from a couple of times when several
>>> hundred get flushed out over the course of a second.
>>>
>>> I feel like something isn't quite right. What is the system *waiting*
>>> for? There's plenty of CPU and IO to go around.
>>>
>>>
>>> In the case of the current benchmarking, I'm using either s3cmd or curl.
>>>
>>>
>>>
>>>> Also I meant to mention in my first reply, but Boundary
>>>> http://boundary.com/ worked wonders for us being able to see how much
>>>> data was really moving around.  They have a free trial as far as I know.
>>>>   It might be worth it to see if there are any obvious bottlenecks.
>>>
>>> Thanks, I'll have a look and see if the effort of setting it all up looks
>>> worthwhile.
>>>
>>> Cheers,
>>> Toby
>>>
>>>
>>>> On Mon, May 27, 2013 at 8:46 PM, Toby Corkindale
>>>> <toby.corkindale at strategicdata.com.au> wrote:
>>>>
>>>>     On 28/05/13 01:41, Jared Morrow wrote:
>>>>
>>>>         Toby,
>>>>
>>>>         If you write with multiple clients does it still stick to
>>>>         9mb/s or does it increase?  What is the network link between
>>>>         your client and the Riak CS cluster?  On our internal CS
>>>>         cluster we were seeing around 2gb/s read+write at the network
>>>>         level so I know CS can take the speeds, so my gut thinks your
>>>>         single client might have a slow link.  That is just a guess.
>>>>
>>>>
>>>>     The network links are all Ethernet, and appear to be functioning OK.
>>>>     iperf reports:
>>>>     bandwidth from client to loadbalancer: 2.20 gbit/sec
>>>>     bandwidth from loadbalancer to a riak node: 941 mbit/sec
>>>>     bandwidth from one riak node to another node: 942 mbit/s
>>>>
>>>>     I've tested going direct from a client to a riak node rather than
>>>>     via the loadbalancer, but it doesn't seem to make any difference.
>>>>
>>>>     Having tested a bit further now, I'd guess that the problem lies
>>>>     with Riak rather than Riak CS.
>>>>
>>>>     I've noticed that if I try to push files directly into Riak, they go
>>>>     fairly slowly too - around 10-20mbyte/sec.
>>>>
>>>>     I've tried 3, 10 and 50 MB files, against bitcask, leveldb and even
>>>>     memory backends, and in all cases I get fairly consistent transfer
>>>>     rates in that range. (just using curl for testing here)
>>>>
>>>>     I've tried reducing n_val to 1, there was a small but not
>>>>     significant improvement.
>>>>
>>>>     I'm a bit stumped.. However I do note that the log files seem to
>>>>     have a lot of "monitor busy_dist_port" messages in them.. I'm
>>>>     wondering if that might be related somehow?
>>>>
>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>



-- 
Turning and turning in the widening gyre
The falcon cannot hear the falconer
Things fall apart; the center cannot hold
Mere anarchy is loosed upon the world



