Riak 1.4.2 10G Ethernet Performance Problems

Evan Vigil-McClanahan emcclanahan at basho.com
Fri Jun 20 14:09:06 EDT 2014


In my testing with large, though smaller, binaries (median 40k), I found
that the buffer settings quoted below gave a noticeable bump
(8000 -> 10000 ops/s), but only as far as the disk (and the disk cache,
of course) could keep up.  Typically, for larger objects, you're going
to be disk limited most of the time.  Remember that Riak is basically
doing a bunch of random, mid-sized (to the disk, at least) reads and
writes here, so it is going to be really hard to get near your disk's
theoretical maximums.
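
As a rough sanity check, the ops/s figures above can be converted to
payload bandwidth.  This is only a back-of-the-envelope sketch, not part
of the original measurements: it assumes "40k" means 40 KiB, treats
every object as exactly the median size, and ignores replication
traffic between nodes and protocol overhead.  The function and constant
names are made up for illustration.

```python
# Back-of-the-envelope conversion from request rate to payload bandwidth.
# Assumptions (not from the test itself): "40k" means 40 KiB, and every
# object is exactly the median size.

def payload_gbits_per_sec(ops_per_sec: float, object_bytes: float) -> float:
    """Payload bandwidth in Gbit/s (10**9 bits/s) for a given request rate."""
    return ops_per_sec * object_bytes * 8 / 1e9

MEDIAN_OBJECT_BYTES = 40 * 1024  # assumed reading of "median 40k"

before = payload_gbits_per_sec(8000, MEDIAN_OBJECT_BYTES)
after = payload_gbits_per_sec(10000, MEDIAN_OBJECT_BYTES)
print(f"{before:.2f} Gbit/s -> {after:.2f} Gbit/s")
```

Under those assumptions the bump corresponds to roughly 2.6 -> 3.3
Gbit/s of object payload, which is the same order of magnitude as the
ceilings reported further down the thread.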

On Fri, Jun 20, 2014 at 10:53 AM, Chris Read <chris.read at gmail.com> wrote:
> We still have this problem (we're on riak 1.4.9) and it's very frustrating!
>
> Our average object size right now is ~250k. We're running with:
>
> +zdbbl 2097151
>
>
> I've tried the settings above on a 5 node test cluster, no improvement.
>
> I then bumped both buffers up to 1048576 on all nodes - no improvement.
>
> Finally I tried putting the buffers up to 4194304 - still no improvement.
>
> For the record my kernel is Ubuntu 3.13.0-27, with the following
> network settings:
>
> net.core.netdev_max_backlog = 10000
> net.core.rmem_default = 8388608
> net.core.rmem_max = 104857600
> net.core.somaxconn = 4000
> net.core.wmem_default = 8388608
> net.core.wmem_max = 104857600
> net.ipv4.tcp_congestion_control = cubic
> net.ipv4.tcp_fin_timeout = 15
> net.ipv4.tcp_low_latency = 0
> net.ipv4.tcp_max_syn_backlog = 40000
> net.ipv4.tcp_slow_start_after_idle = 0
> net.ipv4.tcp_tw_reuse = 1
>
> Chris
>
> On Wed, Jun 18, 2014 at 7:32 PM, Evan Vigil-McClanahan
> <emcclanahan at basho.com> wrote:
>> Hi Earl,
>>
>> There are some known internode bottlenecks in Riak 1.4.x.  We've
>> addressed some of them in 2.0, but others likely remain.  If you're
>> willing to run some code at the console (from `riak attach`), the
>> following should tell you whether or not the 2.0 changes are likely
>> to help you.  I am not sure when 2.0-ready versions of CS are slated
>> for release, however.
>>
>> -----
>> [inet:setopts(Port, [{sndbuf, 393216}, {recbuf, 786432}])
>>   || {_Node, Port} <- erlang:system_info(dist_ctrl)].
>>
>> or to run this on all nodes (which you'll have to do to see if it helps):
>>
>> FF = fun() ->
>>                   [inet:setopts(Port, [{sndbuf, 393216}, {recbuf, 786432}])
>>                     || {_Node, Port} <- erlang:system_info(dist_ctrl)]
>>         end.
>> rpc:multicall(erlang, apply, [FF, []]).
>>
>> You should not run any of this on production machines without
>> extensive testing first.  Also if you have huge objects, like in a CS
>> cluster, it may help to increase the buffer sizes somewhat.
>>
>> Note that increasing +zdbbl in your vm.args can also help somewhat, if
>> it isn't already prohibitively large.
>>
>> Hope that this helps.  Let us know what you find.
>>
>> Evan
>>
>> On Wed, Jun 18, 2014 at 4:57 PM, Earl Ruby <earl_ruby at xyratex.com> wrote:
>>> Chris Read:
>>>
>>> Back in 2013 you reported a performance problem with Riak 1.4.2 running on a
>>> 10GbE network where Riak would never hit speeds faster than 2.5Gbps on the
>>> network.
>>>
>>> I'm seeing the same thing with Riak 1.4.2 and RiakCS. I've followed all of
>>> the tuning suggestions, my MTU is set to 9000 on the ethernet interfaces, I
>>> have one 10GbE network just for the backend inter-node data and one 10GbE
>>> "public" network where RiakCS listens for connections and which basho_bench
>>> uses to generate the load. I have 1-4 client systems on the public side
>>> running basho_bench and no matter how much traffic I generate with
>>> basho_bench I never see more than 3Gbits/s on the network. (It doesn't seem
>>> to matter if I run 1 or 4 clients, each with 200 concurrent sessions, the
>>> network data rate is about the same.) I'm running jnettop in two different
>>> windows during the tests to watch the aggregate network traffic on the
>>> private inter-node data network and the "public" basho_bench
>>> traffic-generating network.
>>>
>>> I've tested the network with iperf3 and it shows 9.92Gbits/s throughput with
>>> a TCP maximum segment size of 9000.
>>>
>>> I've tested the filesystems on each of the 6 Riak nodes using fio, and I can
>>> write to the filesystems at ~12.8Gbits/s, so the filesystem is not the
>>> bottleneck. Each node has 128GB RAM and is running the bitcask backend. The
>>> servers are mostly idle.
>>>
>>> I tried Sean's solution of increasing these values to:
>>>
>>> {riak_core, [
>>>     {handoff_batch_threshold, 4194304},
>>>     {handoff_concurrency, 10} ]}
>>>
>>> ... as described in
>>> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-October/013787.html,
>>> but that had no effect.
>>>
>>> With my current hardware I'd expect that the 10GbE network would be the
>>> bottleneck, and I'd expect write speeds to top out at the top end of the
>>> network speed.
>>>
>>> There was no follow-up message on the mailing list to indicate how or if
>>> you'd solved the problem. Did you find a solution?
>>>
>>> (Please direct replies to the mailing list.)
>>>
>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>