Slow s3cmd ls queries + HAProxy 504 timeouts

Kelly McLaughlin kelly at basho.com
Fri Aug 15 19:02:18 EDT 2014


Hello Alex. Would you mind sharing what version of Riak and Riak CS you 
are using? Also, if you can post the contents of your Riak CS 
app.config file, it might help give a better idea of what might be 
going on.

Generally, listing the contents of a bucket is more expensive than a 
normal download or upload request, but there have been performance 
improvements in recent versions of Riak CS, and there are settings 
that can be adjusted depending on the version you are using. The time 
required to list the contents of an entire bucket is directly related 
to the number of objects in that bucket, so it will continue to grow 
as the number of objects increases, but we do continue to work to make 
the process as efficient as possible.

Depending on why you need to list the contents of the bucket, the 
max-keys query parameter available with the bucket listing operation 
may be useful. By default the limit is 1000 keys per request. As far 
as I'm aware, s3cmd does not expose this parameter and instead buffers 
all the results until the end of the listing is reached. But if you 
need to list the contents for the purpose of some processing step, it 
may work better to break the process into smaller chunks using 
max-keys.
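
For example, here's a rough sketch of that chunked approach using the 
Python boto library rather than s3cmd (the endpoint, credentials, and 
the per-key processing step below are placeholders for illustration, 
not anything taken from your setup):

    # Page through the bucket 1000 keys at a time using the S3
    # max-keys/marker parameters, instead of buffering the entire
    # listing the way s3cmd does.
    import boto
    from boto.s3.connection import OrdinaryCallingFormat

    conn = boto.connect_s3(
        aws_access_key_id='ADMIN-KEY',         # placeholder
        aws_secret_access_key='ADMIN-SECRET',  # placeholder
        host='riak-cs.example.com',            # placeholder CS endpoint
        port=8080,
        is_secure=False,
        calling_format=OrdinaryCallingFormat(),
    )

    bucket = conn.get_bucket('bonfirehub-resources-can-east-doc-conversion')
    marker = ''
    while True:
        batch = bucket.get_all_keys(max_keys=1000, marker=marker)
        for key in batch:
            print(key.name)  # stand-in for your real processing step
        if not batch.is_truncated:
            break
        marker = batch[-1].name  # resume after the last key seen

Each request then stays small enough to finish well inside a proxy 
timeout, instead of one long-running listing that has to complete 
before anything is returned.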

Kelly

On 08/15/2014 06:39 AM, Alex Millar wrote:
> So the issue we’re having is only with bucket listing.
>
> alxndrmlr at alxndrmlr-mbp $ time s3cmd -c .s3cfg-riakcs-admin ls 
> s3://bonfirehub-resources-can-east-doc-conversion
>  DIR s3://bonfirehub-resources-can-east-doc-conversion/organizations/
>
> real 2m0.747s
> user 0m0.076s
> sys 0m0.030s
>
> whereas…
>
> alxndrmlr at alxndrmlr-mbp $ time s3cmd -c .s3cfg-riakcs-admin ls 
> s3://bonfirehub-resources-can-east-doc-conversion/organizations/OrganizationID-1/documents/proposals
>  DIR 
> s3://bonfirehub-resources-can-east-doc-conversion/organizations/OrganizationID-1/documents/proposals/
>
> real 0m10.262s
> user 0m0.075s
> sys 0m0.028s
>
> This bucket contains a lot of very small files (basically, for each 
> PDF we receive, I split it into one .JPG per page and store them 
> here). Based on my latest counts it looks like we have around 
> *170,000* .JPG files in that bucket.
>
> Now I’ve had a hunch this is just a fundamentally expensive operation 
> which exceeds the 5000ms response time threshold set in our HAProxy 
> config (which I raised during the video to illustrate what’s going 
> on). After reading 
> http://www.quora.com/Riak/Is-it-really-expensive-for-Riak-to-list-all-buckets-Why and 
> http://www.paperplanes.de/2011/12/13/list-all-of-the-riak-keys.html I’m feeling 
> like this is just a fundamental issue with the data structure in Riak.
>
> Based on this I’m thinking the cost of this type of query is only 
> going to get worse over time as we add more keys to this bucket 
> (unless secondary indexes can be added). Or am I totally out to lunch 
> here and there’s some other underlying problem?
>
> *Alex Millar*, CTO
> Office: 1-800-354-8010 ext. 704
> Mobile: 519-729-2539
> *GoBonfire*.com <http://GoBonfire.com>
