RiakCS poor s3 upload speeds 2MB/s

Kota Uenishi kota at basho.com
Wed Jan 21 21:39:37 EST 2015


Toby,

Glad to hear you gained some speed.

For the disk usage gap, I'd recommend a smaller leeway_seconds, like an
hour or even less, depending on the average object size and read
concurrency. leeway_seconds only exists to ensure that a long-running
download is not interrupted by a concurrent overwrite or deletion and
the GC that follows it. Increasing gc_max_workers in app.config might
also help the speed of reclaiming disk space, eating some disk IOPS
for more parallel deletion; the default concurrency is 5 in the 1.5
series. delete_concurrency might help as well - it multiplies that
concurrency, and its default is 1.
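
For reference, those settings all live in the riak_cs section of
app.config. The values below are only illustrative placeholders, not
recommendations - treat this as a sketch and tune against your own
workload:

    {riak_cs, [
        {leeway_seconds, 3600},     %% GC grace period (default is much longer)
        {gc_max_workers, 10},       %% parallel GC workers (default 5 in 1.5)
        {delete_concurrency, 2}     %% per-worker delete concurrency (default 1)
        %% ... other riak_cs settings unchanged ...
    ]}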

Another possibility is a large number of siblings. If you have
frequent, concurrent uploads against the same key, they can create a
fair number of siblings (and parallel manifests within a single
sibling), which means a rather large Riak object in LevelDB. This can
also hurt performance, either when the sibling count grows large
(which CS logs) or when the objects are small. When objects are small,
the overhead of fetching all the related data (buckets, users,
metadata in the manifest) cannot be ignored, and that would cut into
byte throughput.

As you may see on GitHub, we're working on CS 2.0, which works with
Riak 2.0, so please stay tuned!



On Wed, Jan 21, 2015 at 10:07 AM, Toby Corkindale <toby at dryft.net> wrote:
> Hi Kota,
> I had a bit of an off-list chat about this a while ago and continued
> to investigate locally, eventually achieving some faster write speeds
> of around 15MByte/sec.
> Things that were changed:
>  * Adjusted Riak CS GC to be spread out over the cluster much more.
>  * Tweaked up the put buffers and concurrency further
>  * Moved most of the files out of CS and into Amazon S3+Glacier
>  * Switched from nginx to haproxy
>  * Simplified firewalling for internal clients
>
> Each of those changes made a small to modest improvement, but
> combined they added up to quite a noticeable one.
>
> I did notice something odd though -- despite moving most of the data
> out of the cluster, the disk-space-in-use by Riak is still very large
> compared to the amount stored. I mean, we moved more than 90% of the
> data out of the cluster, yet the actual disk space used only halved.
> For every gigabyte of file stored in CS, dozens of gigabytes are
> actually on disk!
>
> Either the garbage collection algorithm is very, very lazy, or somehow
> something has gone a bit wrong in the past, which might have explained
> part of the performance problems.
>
> We're going to look at redeploying a new, fresh cluster based on Riak
> 2 in the not too distant future, once Riak CS looks like it's approved
> for use there, and maybe that'll clear all of this up.
>
> Toby
>
> On 21 January 2015 at 11:07, Kota Uenishi <kota at basho.com> wrote:
>> Toby and David,
>>
>> Thank you for working on Riak CS, and I apologize for being a late responder.
>>
>> I believe the reason for the slowdown is different between Toby's
>> and David's clusters.
>>
>> Toby's slowdown looks like it's simply due to the amount of data
>> increasing. How much data per vnode do you have in your cluster,
>> Toby? Do you have deletions in your workload?
>> Riak CS's garbage collection, deleting block keys in Riak, and merging
>> Bitcask files generate more load than the exact amount of data
>> visible via CS would suggest (even taking the replication factor into
>> account). Also, if you turn on AAE, building the AAE trees scans all
>> the data stored in Riak to fix unexpected bit rot or partial
>> replication. I'd like you to check the background load on the
>> underlying storage. If the performance decrease is *not* due to such
>> background load, maybe the same dragon is lurking under the water as
>> in David's cluster.
>>
>> One thing I can point out from David's app.config is that SSL is
>> turned on - Riak CS uses Erlang's built-in SSL library for the https
>> scheme, which is said to have not-so-good performance. I wonder
>> whether those benchmarks were done over https or just http.
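>>
>> (Side note for anyone comparing: if I remember correctly, https in
>> Riak CS is enabled simply by the presence of an ssl entry in the
>> riak_cs section of app.config, roughly like the sketch below - the
>> certificate paths are just placeholders. Removing that entry and
>> restarting makes it easy to benchmark the same upload over plain
>> http.
>>
>>     {riak_cs, [
>>         %% presence of this tuple turns on the https listener
>>         {ssl, [
>>             {certfile, "./etc/cert.pem"},
>>             {keyfile,  "./etc/key.pem"}
>>         ]}
>>         %% ... other riak_cs settings ...
>>     ]}
>> )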
>>
>> In my own testing of Riak CS, whether local or cluster-wide, I have
>> never seen performance as bad as under 10MB/s on such a fresh cluster.
>> Something must be wrong, either in the setup or in the software.
>> Would you mind sending us the results of the riak-debug and
>> riak-cs-debug commands, if you can still reproduce the situation?
>> They pack up as much of the environment info as they can.
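>>
>> Roughly, and assuming the default packaging, you just run them on each
>> node; each command writes a tarball of logs and configs into the
>> current directory that you can send along:
>>
>>     riak-debug        # on every Riak node
>>     riak-cs-debug     # on every Riak CS node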
>>
>> Thanks,
>> Kota
>>
>> On Thu, Nov 27, 2014 at 10:36 AM, Toby Corkindale <toby at dryft.net> wrote:
>>> Thanks, that's interesting to hear.
>>> How have you been finding the stability and reliability of LeoFS
>>> over time?
>>>
>>>
>>> I still wish I could just get our Riak CS cluster performing better;
>>> it just seems so unreasonably slow at the moment that I suspect
>>> there's *something* holding it back. I can build a test cluster on my
>>> desktop, and even with five virtual Riak nodes on the one machine, I
>>> still see 20-40x the performance, so it seems bizarre that dedicated
>>> bare-metal servers would be so slow. (Although obviously there's much
>>> more network latency between real machines than in a virtual cluster
>>> on one desktop; and they have a lot more data in their Bitcask
>>> databases.)
>>>
>>>
>>> However I've tried fiddling with all the Riak and Riak CS options..
>>> ethtool offloads.. mount options.. sysctls.. MTU sizes.. even dropping
>>> single nodes out of the cluster one at a time in case they were
>>> somehow at fault.. and it seems like the only performance changes I
>>> can make are negative.
>>>
>>> We're still double the speed of the original poster in this thread,
>>> but.. that isn't saying much.
>>>
>>> Toby
>>>
>>>
>>> On 26 November 2014 at 06:44, Heinz Nikolaus Gies <heinz at licenser.net> wrote:
>>>> If you're evaluating RiakCS vs. Ceph, you might want to toss LeoFS[1]
>>>> into the mix and give it a run. Just like RiakCS, it is a
>>>> Dynamo-inspired system built in Erlang and comes with the same
>>>> advantages and disadvantages. But unlike RiakCS it is pretty much
>>>> exclusively an object store, so it can make a few optimizations for
>>>> this kind of workload that might not be possible in a general-purpose
>>>> database like Riak (this is my personal guess, not a research-founded
>>>> conclusion). The team is (much) smaller than Basho's (obviously), but
>>>> they're a very nice and responsive bunch. I ended up using it as an S3
>>>> backend for Project-FiFo due to its performance characteristics. With
>>>> current releases I manage to get a single-file upload speed of
>>>> ~1.2GB/s using gof3r[2] (this might be a client limitation but I
>>>> haven't had time to investigate the details).
>>>>
>>>> [1] http://leo-project.net/leofs/
>>>> [2] https://github.com/rlmcpherson/s3gof3r/tree/master/gof3r
>>>> ---
>>>> Cheers,
>>>> Heinz Nikolaus Gies
>>>> heinz at licenser.net
>>>>
>>>>
>>>>
>>>> On Nov 25, 2014, at 6:08, Toby Corkindale <toby at dryft.net> wrote:
>>>>
>>>> Hi,
>>>> I wondered if you managed to significantly improve your Riak CS
>>>> performance, or not?
>>>>
>>>> I just ask as we've been getting not-dissimilar performance out of
>>>> Riak CS too (4-5 MByte/sec max per client, on bare-metal hardware)
>>>> for quite a long time. (I swear it was faster originally, when there
>>>> was a lot less data in the whole system.)
>>>> This is after applying all the tweaks available -- networking stack,
>>>> filesystem mount options, assorted Erlang vm.args, and increased put
>>>> concurrency/buffer options.
>>>>
>>>> We put up with it because it's been just about sufficient for our
>>>> needs and Riak CS has been reliable and easy to administer -- but
>>>> it's becoming more of an issue, so I'm curious to know whether other
>>>> people *do* manage to achieve *good* per-client speeds out of Riak CS,
>>>> or if this is just how things always are.
>>>> And if we're way off the mark, maybe we can find out why..
>>>>
>>>> Details of our setup:
>>>> 6-node cluster. Ring size of 64.
>>>> Riak 1.4.10
>>>> Riak CS 1.5.2
>>>> (installed from official Basho repos)
>>>>
>>>> Tests were conducted using both multi-part and non-multi-part upload
>>>> modes; performance is similar with both. Testing was done against the
>>>> cluster when it was very lightly loaded.
>>>> For the sake of testing, a 100MB file containing random
>>>> (hard-to-compress) data is being used.
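>>>>
>>>> (For anyone wanting to reproduce this, the test is roughly equivalent
>>>> to the following - the bucket name is just a placeholder:
>>>>
>>>>     dd if=/dev/urandom of=test100M bs=1M count=100
>>>>     s3cmd put test100M s3://test-bucket/test100M
>>>> )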
>>>>
>>>> Cheers,
>>>> Toby
>>>>
>>>> On 8 November 2014 at 01:41, David Meekin <David.Meekin at autotrader.co.uk>
>>>> wrote:
>>>>
>>>> Hi,
>>>> I've set up a test 4-node RiakCS cluster on HP BL460c hardware and I
>>>> can't seem to get S3 upload speeds above 2MB/s.
>>>> I'm connecting directly to RiakCS on one of the nodes, so there is no
>>>> load-balancing software in place.
>>>> I have also installed s3cmd locally onto one of the nodes, and the
>>>> speeds locally are the same.
>>>> These 4 nodes also run a test CEPH cluster with RadosGW, and S3
>>>> uploads to CEPH achieve 125MB/s.
>>>> Any help would be appreciated as I'm currently evaluating both CEPH
>>>> and RiakCS.
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Turning and turning in the widening gyre
>>> The falcon cannot hear the falconer
>>> Things fall apart; the center cannot hold
>>> Mere anarchy is loosed upon the world
>>>
>>
>>
>>
>> --
>> Kota UENISHI / @kuenishi
>> Basho Japan KK
>
>
>
> --
> Turning and turning in the widening gyre
> The falcon cannot hear the falconer
> Things fall apart; the center cannot hold
> Mere anarchy is loosed upon the world



-- 
Kota UENISHI / @kuenishi
Basho Japan KK



