Slow s3cmd ls queries + HAProxy 504 timeouts
alex at gobonfire.com
Fri Aug 15 08:39:17 EDT 2014
So the issue we’re having is only with bucket listing.
alxndrmlr at alxndrmlr-mbp $ time s3cmd -c .s3cfg-riakcs-admin ls s3://bonfirehub-resources-can-east-doc-conversion
alxndrmlr at alxndrmlr-mbp $ time s3cmd -c .s3cfg-riakcs-admin ls s3://bonfirehub-resources-can-east-doc-conversion/organizations/OrganizationID-1/documents/proposals
This bucket contains a lot of very small files: for each PDF we receive, I split it into one .JPG per page and store those here. Based on my latest counts, we have around 170,000 .JPG files in that bucket.
Now, I have a hunch this is just a fundamentally expensive operation, one that exceeds the 5000ms response-time threshold set in our HAProxy config (which I raised during the video to illustrate what's going on). After reading http://www.quora.com/Riak/Is-it-really-expensive-for-Riak-to-list-all-buckets-Why and http://www.paperplanes.de/2011/12/13/list-all-of-the-riak-keys.html, I'm feeling like this is just a fundamental consequence of how keys are stored in Riak.
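For context, the knob I raised is HAProxy's server-side timeout. A minimal sketch of the relevant bit (the 5000ms value is from our config; the surrounding section is illustrative, not our full haproxy.cfg):

```
# haproxy.cfg (illustrative excerpt)
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    # How long Riak CS gets to answer before HAProxy returns a 504:
    timeout server  5000ms
```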
Based on this, I'm thinking the cost of this type of query will only get worse over time as we add more keys to this bucket (unless secondary indexes can be added). Or am I totally out to lunch here, and there's some other underlying problem?
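For what it's worth, listing in the S3 API (which Riak CS implements) is marker-paginated: the server returns at most max-keys results per request plus a resume marker, so a single `s3cmd ls` over 170,000 keys turns into many sequential round trips before my hunch about Riak's per-fold cost even enters the picture. A toy model of that pagination loop, just to show the shape (the `list_page`/`list_all` helpers are hypothetical stand-ins, not real s3cmd or Riak CS calls):

```python
# Toy model of S3-style ListObjects pagination: at most max_keys results
# per call, resumed via a marker. Listing N keys takes ceil(N / max_keys)
# sequential requests.

def list_page(all_keys, marker="", max_keys=1000):
    """Hypothetical server side: one page of sorted keys after `marker`."""
    keys = sorted(k for k in all_keys if k > marker)
    return keys[:max_keys], len(keys) > max_keys  # (page, is_truncated)

def list_all(all_keys, max_keys=1000):
    """Client side: walk pages until the server reports no truncation."""
    results, marker, requests = [], "", 0
    while True:
        page, truncated = list_page(all_keys, marker, max_keys)
        requests += 1
        results.extend(page)
        if not truncated:
            return results, requests
        marker = page[-1]

# 170,000 keys at 1,000 per page -> 170 sequential requests.
keys = {"doc/%06d.jpg" % i for i in range(170_000)}
listing, requests = list_all(keys)
print(requests)
```

So even before the timeout, latency grows linearly with key count, which matches what we're seeing as the bucket fills up.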
Alex Millar, CTO
Office: 1-800-354-8010 ext. 704