Riak 1.4.2 10G Ethernet Performance Problems

Chris Read chris.read at gmail.com
Fri Jun 20 13:53:18 EDT 2014


We still have this problem (we're on riak 1.4.9) and it's very frustrating!

Our average object size right now is ~250k. We're running with:

+zdbbl 2097151
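
For anyone checking the same setting: +zdbbl is specified in kilobytes, and a running node reports the resulting limit in bytes. A minimal sketch of reading it back from `riak attach`, assuming the runtime exposes the dist_buf_busy_limit item of erlang:system_info/1 (the variable names are only illustrative):

```erlang
%% Distribution buffer busy limit as the running VM sees it, in bytes.
LimitBytes = erlang:system_info(dist_buf_busy_limit).
%% Convert back to kilobytes to compare with the +zdbbl value in vm.args.
LimitKB = LimitBytes div 1024.
```

If LimitKB does not match the vm.args value, the node was probably not restarted after the change.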


I've tried the settings Evan suggested (quoted below) on a 5-node test
cluster: no improvement.

I then bumped both buffers (sndbuf and recbuf) up to 1048576 on all nodes - no improvement.

Finally I tried putting the buffers up to 4194304 - still no improvement.
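
One way to confirm that buffer changes like these actually reached the sockets is to read the values back on each node. A minimal sketch, assuming every entry returned by erlang:system_info(dist_ctrl) is a port as in the snippet quoted below (Buffers is just an illustrative name):

```erlang
%% For each distribution connection on this node, ask the socket for its
%% current send and receive buffer sizes.
Buffers = [{Node, inet:getopts(Port, [sndbuf, recbuf])}
           || {Node, Port} <- erlang:system_info(dist_ctrl), is_port(Port)].
```

On success each element has the form {Node, {ok, [{sndbuf, S}, {recbuf, R}]}}. The kernel may report values that differ from what was requested, and requests above net.core.wmem_max or net.core.rmem_max are capped, so it is worth comparing the result against the sysctl settings.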

For the record my kernel is Ubuntu 3.13.0-27, with the following
network settings:

net.core.netdev_max_backlog = 10000
net.core.rmem_default = 8388608
net.core.rmem_max = 104857600
net.core.somaxconn = 4000
net.core.wmem_default = 8388608
net.core.wmem_max = 104857600
net.ipv4.tcp_congestion_control = cubic
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_low_latency = 0
net.ipv4.tcp_max_syn_backlog = 40000
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_tw_reuse = 1

Chris

On Wed, Jun 18, 2014 at 7:32 PM, Evan Vigil-McClanahan
<emcclanahan at basho.com> wrote:
> Hi Earl,
>
> There are some known internode bottlenecks in riak 1.4.x.  We've
> addressed some of them in 2.0, but others likely remain.  If you're
> willing to run some code at the console (from `riak attach`), the
> following should tell you whether or not the 2.0 changes are likely
> to help you.  I am not sure when 2.0-ready versions of CS are slated
> for, however.
>
> -----
> [inet:setopts(Port, [{sndbuf, 393216}, {recbuf, 786432}])
>   || {_Node, Port} <- erlang:system_info(dist_ctrl)].
>
> or to run this on all nodes (which you'll have to do to see if it helps):
>
> FF = fun() ->
>          [inet:setopts(Port, [{sndbuf, 393216}, {recbuf, 786432}])
>           || {_Node, Port} <- erlang:system_info(dist_ctrl)]
>      end.
> rpc:multicall(erlang, apply, [FF, []]).
>
> You should not run any of this on production machines without
> extensive testing first.  Also if you have huge objects, like in a CS
> cluster, it may help to increase the buffer sizes somewhat.
>
> Note that increasing +zdbbl in your vm.args can also help somewhat, if
> it isn't already prohibitively large.
>
> Hope that this helps.  Let us know what you find.
>
> Evan
>
> On Wed, Jun 18, 2014 at 4:57 PM, Earl Ruby <earl_ruby at xyratex.com> wrote:
>> Chris Read:
>>
>> Back in 2013 you reported a performance problem with Riak 1.4.2 running on a
>> 10GbE network where Riak would never hit speeds faster than 2.5Gbps on the
>> network.
>>
>> I'm seeing the same thing with Riak 1.4.2 and RiakCS. I've followed all of
>> the tuning suggestions, my MTU is set to 9000 on the ethernet interfaces, I
>> have one 10GbE network just for the backend inter-node data and one 10GbE
>> "public" network where RiakCS listens for connections and which basho_bench
>> uses to generate the load. I have 1-4 client systems on the public side
>> running basho_bench and no matter how much traffic I generate with
>> basho_bench I never see more than 3Gbits/s on the network. (It doesn't seem
>> to matter whether I run 1 or 4 clients, each with 200 concurrent sessions;
>> the network data rate is about the same.) I'm running jnettop in two different
>> windows during the tests to watch the aggregate network traffic on the
>> private inter-node data network and the "public" basho_bench
>> traffic-generating network.
>>
>> I've tested the network with iperf3 and it shows 9.92Gbits/s throughput with
>> a TCP maximum segment size of 9000.
>>
>> I've tested the filesystems on each of the 6 Riak nodes using fio, and I can
>> write to the filesystems at ~12.8Gbits/s, so the filesystem is not the
>> bottleneck. Each node has 128GB RAM and is running the bitcask backend. The
>> servers are mostly idle.
>>
>> I tried Sean's solution of increasing these values to:
>>
>> {riak_core, [
>>     {handoff_batch_threshold, 4194304},
>>     {handoff_concurrency, 10} ]}
>>
>> ... as described in
>> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-October/013787.html,
>> but that had no effect.
>>
>> With my current hardware I'd expect that the 10GbE network would be the
>> bottleneck, and I'd expect write speeds to top out at the top end of the
>> network speed.
>>
>> There was no follow-up message on the mailing list to indicate how or if
>> you'd solved the problem. Did you find a solution?
>>
>> (Please direct replies to the mailing list.)
>>
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>



