RiakCS poor s3 upload speeds 2MB/s

Toby Corkindale toby at dryft.net
Wed Jan 28 21:53:53 EST 2015


I turned the triggers and thresholds down, to:

             {max_file_size,            805306368}, %% 768 MB
             {dead_bytes_merge_trigger, 134217728}, %% dead bytes > 128 MB
             {dead_bytes_threshold,      33554432}  %% dead bytes > 32 MB
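As a rough illustration of how these two dead-bytes settings interact, here is a minimal sketch. It assumes the documented meaning of the settings: a merge is triggered when some data file exceeds `dead_bytes_merge_trigger`, and files above `dead_bytes_threshold` are then included in it. The `DataFile` class and `plan_merge` function are hypothetical names for illustration, not part of Bitcask, and the model ignores the fragmentation triggers and the merge window.

```python
from dataclasses import dataclass

MB = 1024 * 1024

# Values taken from the settings above.
DEAD_BYTES_MERGE_TRIGGER = 128 * MB
DEAD_BYTES_THRESHOLD = 32 * MB


@dataclass
class DataFile:
    name: str        # hypothetical file name, for illustration only
    dead_bytes: int  # bytes occupied by overwritten or deleted keys


def plan_merge(files):
    """Return the files a merge would include, or [] if none is triggered.

    Simplified model: only the two dead-bytes settings are considered.
    """
    triggered = any(f.dead_bytes > DEAD_BYTES_MERGE_TRIGGER for f in files)
    if not triggered:
        return []
    return [f for f in files if f.dead_bytes > DEAD_BYTES_THRESHOLD]


# One file crosses the trigger, so every file above the threshold is merged.
mixed = [DataFile("a", 200 * MB), DataFile("b", 40 * MB), DataFile("c", 10 * MB)]
print([f.name for f in plan_merge(mixed)])  # ['a', 'b']

# No single file crosses the trigger, so nothing is merged in this model,
# even though 200 MB of dead data is present in total.
below_trigger = [DataFile("x", 100 * MB), DataFile("y", 100 * MB)]
print(plan_merge(below_trigger))  # []
```

Under this simplified model, dead data spread thinly across many files, each below the trigger, would never be reclaimed. Whether that matches what is happening on this cluster is not established here.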

I then restarted the nodes; however, after 24 hours, disk utilisation remains the same:
for about 6 GB of files in CS, 600 GB is on disk.
(Previously, when we had more than 100 GB in CS, we had terabytes on disk.)
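To put those figures in perspective, here is a small back-of-the-envelope calculation. It assumes both numbers are cluster-wide totals and that the data is stored with Riak's default replication factor of 3 (`n_val`). Neither assumption is confirmed in this thread.

```python
# Figures quoted above, in gigabytes.
stored_gb = 6
on_disk_gb = 600

# Assumption: Riak's default replication factor; the actual n_val
# used by this cluster is not stated in the thread.
N_VAL = 3

ratio = on_disk_gb / stored_gb           # on-disk bytes per logical byte
expected_gb = stored_gb * N_VAL          # disk usage from replication alone
unexplained_gb = on_disk_gb - expected_gb

print(f"amplification: {ratio:.0f}x")                  # amplification: 100x
print(f"expected from replication: {expected_gb} GB")  # expected from replication: 18 GB
print(f"unexplained: {unexplained_gb} GB")             # unexplained: 582 GB
```

Even allowing for replication and some per-object overhead, most of the 600 GB would be unaccounted for under these assumptions.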

I do wonder if we hit some kind of issue with Riak CS earlier in the
cluster's life, and have somehow ended up with a lot of "dead" bytes
in there.
This is just data stored by Riak CS, so it should take responsibility
for siblings and their resolution, yes?

If I write something to download the files and then re-upload them
to the same paths, would that cause Riak CS to fix up any issues around
siblings or duplicated data?

Cheers,
Toby

On 22 January 2015 at 03:20, Luke Bakken <lbakken at basho.com> wrote:
> You should be able to help with disk usage by "turning down" the
> trigger and threshold values described here:
>
> http://docs.basho.com/riak/latest/ops/advanced/backends/bitcask/
>
> Your cluster will merge more data which should help with disk usage.
> If your typical use is to create and delete objects frequently, this
> will help.
>
> --
> Luke Bakken
> Engineer
> lbakken at basho.com
>
> On Wed, Jan 21, 2015 at 4:40 AM, Toby Corkindale <toby at dryft.net> wrote:
>> On 21 January 2015 at 15:22, Luke Bakken <lbakken at basho.com> wrote:
>>> Hi Toby -
>>>
>>> Are you using the stock bitcask configuration for merging?
>>
>> Hi Luke,
>> Yes, pretty much.
>>
>>> On Tue, Jan 20, 2015 at 5:07 PM, Toby Corkindale <toby at dryft.net> wrote:
>>>> Hi Kota,
>>>> I had a bit of an off-list chat about this a while ago, plus continued
>>>> to investigate locally, and eventually achieved some faster speeds,
>>>> around 15MByte/sec writes.
>>>> Things that were changed:
>>>>  * Adjusted Riak CS GC to be spread out over the cluster much more.
>>>>  * Tweaked up the put buffers and concurrency further
>>>>  * Moved most of the files out of CS and into Amazon S3+Glacier
>>>>  * Switched from nginx to haproxy
>>>>  * Simplified firewalling for internal clients
>>>>
>>>> Each one of those changes made a small to modest improvement, but
>>>> overall combined to make a quite noticeable improvement.
>>>>
>>>> I did notice something odd though -- despite moving most of the data
>>>> out of the cluster, the disk-space-in-use by Riak is still very large
>>>> compared to the amount stored. I mean, we moved more than 90% of the
>>>> data out of the cluster, yet the actual disk space used only halved.
>>>> For every gigabyte of file stored in CS, dozens of gigabytes are
>>>> actually on disk!
>>>>
>>>> Either the garbage collection algorithm is very, very lazy, or somehow
>>>> something has gone a bit wrong in the past, which might have explained
>>>> part of the performance problems.
>>>>
>>>> We're going to look at redeploying a new, fresh cluster based on Riak
>>>> 2 in the not too distant future, once Riak CS looks like it's approved
>>>> for use there, and maybe that'll clear all of this up.
>>>>
>>>> Toby



-- 
Turning and turning in the widening gyre
The falcon cannot hear the falconer
Things fall apart; the center cannot hold
Mere anarchy is loosed upon the world



