Slow s3cmd ls queries + HAProxy 504 timeouts

Kelly McLaughlin kelly at basho.com
Mon Aug 18 14:27:53 EDT 2014


Alex,

Could you share your Riak CS app.config file with me? I'd like to look 
over what you have for a few settings that could affect the bucket 
listing performance.


Kelly


On 08/18/2014 11:22 AM, Alex Millar wrote:
> Hey Kelly,
>
> Thanks for reaching out! We’re using the following versions of Riak CS 
> and Riak:
>
> # Download RiakCS
> # Version: 1.4.5
> # OS: Ubuntu 12.04 (Precise) AMD 64
> curl -O http://s3.amazonaws.com/downloads.basho.com/riak-cs/1.4/1.4.5/ubuntu/precise/riak-cs_1.4.5-1_amd64.deb
>
> # Download Riak
> # Version: 1.4.8
> # OS: Ubuntu 12.04 (Precise) AMD 64
> curl -O http://s3.amazonaws.com/downloads.basho.com/riak/1.4/1.4.8/ubuntu/precise/riak_1.4.8-1_amd64.deb
>
> The intent of having performant ls operations was so that we could 
> connect via Transmit <http://panic.com/transmit/> to view and 
> navigate the contents of the bucket, similar to how you can access 
> the contents of your S3 buckets in their web UI. That being said, our 
> keys are akin to a folder structure, for example...
>
> /organizations/OrganizationID-[OrganizationID]/documents/proposals/ProposalID-[ProposalID]/DocumentSlotID-[DocumentSlotID]
>
> S3 must be doing some sort of secondary indexing to allow for fast 
> lookups here, because the bucket in question that has the performance 
> issues only has 2 “folders” under 
> s3://bonfirehub-resources-can-east-doc-conversion yet it takes the 
> longest to s3cmd ls since Riak is clearly traversing all the keys to 
> fulfill this request.
>
> In short, this is not a requirement for us in order to use RiakCS; 
> however, going forward it would be desirable if RiakCS could maintain 
> this form of secondary index (and potentially have a web UI) to 
> better match some use cases that exist for clients who are used to 
> using S3.
>
> *Alex Millar*, CTO
> Office: 1-800-354-8010 ext. 704 <tel:+18003548010>
> Mobile: 519-729-2539 <tel:+15197292539>
> *GoBonfire*.com <http://GoBonfire.com>
>
>
> From: Kelly McLaughlin <kelly at basho.com>
> Reply: Kelly McLaughlin <kelly at basho.com>
> Date: August 15, 2014 at 7:03:47 PM
> To: Alex Millar <alex at gobonfire.com>, riak-users at lists.basho.com
> Subject: Re: Slow s3cmd ls queries + HAProxy 504 timeouts
>
>> Hello Alex. Would you mind sharing what version of Riak and Riak CS 
>> you are using? Also, if you can post the contents of your Riak CS 
>> app.config file, it might help give a better idea of what might be 
>> going on.
>>
>> Generally listing the contents of a bucket is more expensive than a 
>> normal download or upload request, but there have been performance 
>> improvements in recent
>> versions of Riak CS and there are settings that can be adjusted 
>> depending on the version you are using. The time required to list the 
>> contents of the entire bucket
>> is definitely related to the number of objects in that bucket so the 
>> time will continue to increase as the number of objects increases, 
>> but we do continue to work to
>> make the process as efficient as possible.
>>
>> Depending on why you need to list the contents of the bucket the 
>> max-keys query parameter available with the bucket listing operation 
>> may be useful. By default this
>> limit is 1000 keys, but s3cmd does not expose this that I'm aware of 
>> and instead buffers all the results until the end of the contents is 
>> reached. But if you need
>> to list the contents for the purpose of some processing step, it may 
>> work better for you to break up this process into smaller chunks 
>> using max-keys.
>>
>> Kelly
>>
>> On 08/15/2014 06:39 AM, Alex Millar wrote:
>>> So the issue we’re having is only with bucket listing.
>>>
>>> alxndrmlr at alxndrmlr-mbp $ time s3cmd -c .s3cfg-riakcs-admin ls s3://bonfirehub-resources-can-east-doc-conversion
>>>            DIR s3://bonfirehub-resources-can-east-doc-conversion/organizations/
>>>
>>> real 2m0.747s
>>> user 0m0.076s
>>> sys 0m0.030s
>>>
>>> whereas…
>>>
>>> alxndrmlr at alxndrmlr-mbp $ time s3cmd -c .s3cfg-riakcs-admin ls s3://bonfirehub-resources-can-east-doc-conversion/organizations/OrganizationID-1/documents/proposals
>>>            DIR s3://bonfirehub-resources-can-east-doc-conversion/organizations/OrganizationID-1/documents/proposals/
>>>
>>> real 0m10.262s
>>> user 0m0.075s
>>> sys 0m0.028s
>>>
>>> This bucket contains a lot of very small files (basically, for 
>>> each PDF we receive I split it into one .JPG per page and store 
>>> them here). Based on my latest counts it looks like we have 
>>> around *170,000* .JPG files in that bucket.
>>>
>>> Now I’ve had a hunch this is just a fundamentally expensive 
>>> operation which exceeds the 5000ms response time threshold set in 
>>> our HAProxy config (which I raised during the video to illustrate 
>>> what’s going on). After reading 
>>> http://www.quora.com/Riak/Is-it-really-expensive-for-Riak-to-list-all-buckets-Why and 
>>> http://www.paperplanes.de/2011/12/13/list-all-of-the-riak-keys.html I’m 
>>> feeling like this is just a fundamental issue with the data 
>>> structure in Riak.
>>>
>>> Based on this I’m thinking that the cost of this type of query is 
>>> only going to get worse over time as we add more keys to this 
>>> bucket (unless secondary indexes can be added). Or am I totally out 
>>> to lunch here and there’s some other underlying problem?
>>>
>>> *Alex Millar*, CTO
>>> Office: 1-800-354-8010 ext. 704 <tel:+18003548010>
>>> Mobile: 519-729-2539 <tel:+15197292539>
>>> *GoBonfire*.com <http://GoBonfire.com>
>>>
>>>
>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
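
For anyone who wants to try the max-keys suggestion quoted above, here is a minimal sketch of processing a listing one page at a time instead of buffering the whole result. It is only an illustration under stated assumptions: list_in_chunks, fetch_page, make_fake_bucket and the page dictionary shape are hypothetical names invented for this example, and the fake bucket is an in-memory stand-in so the code runs without a Riak CS endpoint. In a real client, fetch_page would wrap a bucket listing request that passes the prefix, marker and max-keys query parameters.

```python
from typing import Callable, Dict, Iterator, List


def list_in_chunks(fetch_page: Callable[..., Dict], prefix: str = "",
                   max_keys: int = 1000) -> Iterator[List[str]]:
    """Yield one page of keys at a time, resuming after the last key seen."""
    marker = ""
    while True:
        # fetch_page is a hypothetical callable standing in for one listing request.
        page = fetch_page(prefix=prefix, marker=marker, max_keys=max_keys)
        keys = page["keys"]
        if keys:
            yield keys
        if not page["is_truncated"] or not keys:
            return
        marker = keys[-1]  # the next page starts strictly after this key


def make_fake_bucket(all_keys: List[str]) -> Callable[..., Dict]:
    """Build an in-memory stand-in for a bucket (assumption: keys list in sorted order)."""
    ordered = sorted(all_keys)

    def fetch_page(prefix: str, marker: str, max_keys: int) -> Dict:
        matching = [k for k in ordered if k.startswith(prefix) and k > marker]
        return {
            "keys": matching[:max_keys],
            "is_truncated": len(matching) > max_keys,
        }

    return fetch_page


if __name__ == "__main__":
    # Made-up keys that only imitate the folder-like layout described in the thread.
    keys = [
        "organizations/OrganizationID-1/documents/proposals/"
        "ProposalID-%d/DocumentSlotID-%d" % (p, s)
        for p in range(3) for s in range(4)
    ]
    fetch = make_fake_bucket(keys)
    for chunk in list_in_chunks(fetch, prefix="organizations/", max_keys=5):
        print(len(chunk), "keys, first:", chunk[0])
```

Each page can be handled and discarded before the next one is requested, so no single request has to cover the whole bucket. Whether this helps with the proxy timeout depends on how long one page takes on the server side, which this sketch does not measure.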
