Slow s3cmd ls queries + HAProxy 504 timeouts

Kelly McLaughlin kelly at
Mon Aug 18 14:27:53 EDT 2014


Could you share your Riak CS app.config file with me? I'd like to look 
over what you have for a few settings that could affect the bucket 
listing performance.
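For reference, the main setting I have in mind is the fold-based bucket 
listing option added in the Riak CS 1.4 series. A minimal sketch of the 
relevant riak_cs section of app.config (surrounding entries elided; please 
check the option against your release's notes before enabling it):

  {riak_cs, [
             %% ... existing riak_cs settings ...

             %% Serve GET Bucket (key listing) requests with a single
             %% fold over the objects rather than per-key reads, which
             %% can substantially reduce bucket listing times on 1.4.x.
             {fold_objects_for_list_keys, true}
            ]}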


On 08/18/2014 11:22 AM, Alex Millar wrote:
> Hey Kelly,
> Thanks for reaching out! We’re using the following versions of RiakCS 
> and Riak:
> # Download RiakCS
> # Version: 1.4.5
> # OS: Ubuntu 12.04 (Precise) AMD 64
> curl -O 
> # Download Riak
> # Version: 1.4.8
> # OS: Ubuntu 12.04 (Precise) AMD 64
> curl -O 
> The intent behind having performant ls operations was so that we could 
> connect via Transmit <> to view and navigate the contents of the 
> bucket, similar to how you can access the contents of your S3 buckets 
> in their web UI. That being said, our keys are akin to a folder 
> structure, for example...
> /organizations/OrganizationID-[OrganizationID]/documents/proposals/ProposalID-[ProposalID]/DocumentSlotID-[DocumentSlotID]
> S3 must be doing some sort of secondary indexing to allow for fast 
> lookups here, because the bucket in question that has the performance 
> issues only has 2 “folders” under 
> s3://bonfirehub-resources-can-east-doc-conversion, yet it takes the 
> longest to s3cmd ls since Riak is clearly traversing all the keys to 
> fulfill this request.
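> For what it’s worth, the “folders” s3cmd shows come from the prefix and 
> delimiter parameters of the GET Bucket request rather than a separate 
> index, so the server may still walk many keys to compute them. A 
> minimal sketch of that request with boto (the credentials and endpoint 
> here are placeholders):
>
> import boto
> from boto.s3.connection import OrdinaryCallingFormat
>
> conn = boto.connect_s3(
>     aws_access_key_id='ACCESS-KEY',
>     aws_secret_access_key='SECRET-KEY',
>     host='riak-cs.example.com', port=8080, is_secure=False,
>     calling_format=OrdinaryCallingFormat())
> bucket = conn.get_bucket('bonfirehub-resources-can-east-doc-conversion')
>
> # delimiter='/' asks the server to group keys into CommonPrefixes
> # ("folders") instead of returning every key under the prefix.
> for entry in bucket.list(prefix='organizations/', delimiter='/'):
>     print(entry.name)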
> Short story: this is not a requirement for us in order to use RiakCS. 
> However, going forward it would be desirable if RiakCS could maintain 
> this form of secondary index (and potentially offer a web UI) to 
> better match the use cases of clients who are used to S3.
> *Alex Millar*, CTO
> Office: 1-800-354-8010 ext. 704
> Mobile: 519-729-2539
> *GoBonfire*.com
> From: Kelly McLaughlin <kelly at>
> Reply: Kelly McLaughlin <kelly at>
> Date: August 15, 2014 at 7:03:47 PM
> To: Alex Millar <alex at>, riak-users at
> Subject: Re: Slow s3cmd ls queries + HAProxy 504 timeouts
>> Hello Alex. Would you mind sharing what version of Riak and Riak CS 
>> you are using? Also, if you can post the contents of your Riak CS 
>> app.config file, it might help give a better idea of what might be 
>> going on.
>> Generally, listing the contents of a bucket is more expensive than a 
>> normal download or upload request, but there have been performance 
>> improvements in recent versions of Riak CS, and there are settings 
>> that can be adjusted depending on the version you are using. The time 
>> required to list the contents of the entire bucket is definitely 
>> related to the number of objects in that bucket, so the time will 
>> continue to increase as the number of objects grows, but we do 
>> continue to work to make the process as efficient as possible.
>> Depending on why you need to list the contents of the bucket, the 
>> max-keys query parameter available with the bucket listing operation 
>> may be useful. By default the limit is 1000 keys, but s3cmd does not 
>> expose this as far as I'm aware, and instead buffers all the results 
>> until the end of the listing is reached. If you need to list the 
>> contents for the purpose of some processing step, it may work better 
>> for you to break up the process into smaller chunks using max-keys, 
>> as in the sketch below.
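>> A rough sketch of that chunked approach with the boto library (any 
>> client that exposes max-keys and marker will do; the credentials and 
>> endpoint below are placeholders):
>>
>> import boto
>> from boto.s3.connection import OrdinaryCallingFormat
>>
>> conn = boto.connect_s3(
>>     aws_access_key_id='ACCESS-KEY',
>>     aws_secret_access_key='SECRET-KEY',
>>     host='riak-cs.example.com', port=8080, is_secure=False,
>>     calling_format=OrdinaryCallingFormat())
>> bucket = conn.get_bucket('bonfirehub-resources-can-east-doc-conversion')
>>
>> def process(key):
>>     print(key.name)   # stand-in for your real processing step
>>
>> marker = ''
>> while True:
>>     # One GET Bucket request per iteration, bounded by max-keys.
>>     page = bucket.get_all_keys(max_keys=1000, marker=marker)
>>     for key in page:
>>         process(key)
>>     if not page.is_truncated:
>>         break
>>     # Resume the next request just after the last key we received.
>>     marker = page[-1].name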
>> Kelly
>> On 08/15/2014 06:39 AM, Alex Millar wrote:
>>> So the issue we’re having is only with bucket listing.
>>> alxndrmlr at alxndrmlr-mbp $ time s3cmd -c .s3cfg-riakcs-admin ls 
>>> s3://bonfirehub-resources-can-east-doc-conversion
>>>            DIR 
>>> s3://bonfirehub-resources-can-east-doc-conversion/organizations/
>>> real 2m0.747s
>>> user 0m0.076s
>>> sys 0m0.030s
>>> whereas…
>>> alxndrmlr at alxndrmlr-mbp $ time s3cmd -c .s3cfg-riakcs-admin ls 
>>> s3://bonfirehub-resources-can-east-doc-conversion/organizations/OrganizationID-1/documents/proposals
>>>            DIR 
>>> s3://bonfirehub-resources-can-east-doc-conversion/organizations/OrganizationID-1/documents/proposals/
>>> real 0m10.262s
>>> user 0m0.075s
>>> sys 0m0.028s
>>> This bucket contains a lot of very small files (basically, for each 
>>> PDF we receive, I split it into one .JPG per page and store them 
>>> here). Based on my latest counts, it looks like we have around 
>>> *170,000* .JPG files in that bucket.
>>> Now I’ve had a hunch this is just a fundamentally expensive 
>>> operation which exceeds the 5000ms response-time threshold set in 
>>> our HAProxy config (which I raised during the video to illustrate 
>>> what’s going on). After reading <> and <>, I’m feeling like this is 
>>> just a fundamental issue with the data structure in Riak.
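>>> For completeness, the knob in question is HAProxy’s server-side 
>>> timeout. A sketch of the relevant haproxy.cfg fragment (the backend 
>>> name and the new value are placeholders for our actual config):
>>>
>>> backend riak_cs
>>>     # Was "timeout server 5000" (5s); full-bucket listings take far
>>>     # longer, so HAProxy was returning 504s before Riak CS answered.
>>>     timeout server 120s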
>>> Based on this, I’m thinking the cost of this type of query is only 
>>> going to get worse over time as we add more keys to this bucket 
>>> (unless secondary indexes can be added). Or am I totally out to 
>>> lunch here, and there’s some other underlying problem?
>>> *Alex Millar*, CTO
>>> Office: 1-800-354-8010 ext. 704
>>> Mobile: 519-729-2539
>>> *GoBonfire*.com

