Unable to use hadoop distcp with Riak

psterk jazzfan159 at gmail.com
Fri Apr 22 18:13:52 EDT 2016


Hi all,

I am trying to copy a file out of Riak to HDFS using the s3 protocol.  I
created the following file, /etc/hadoop/conf/jets3t.properties:

s3service.s3-endpoint=myhost
s3service.s3-endpoint-http-port=8080
s3service.disable-dns-buckets=true
s3service.s3-endpoint-virtual-path=/

s3service.max-thread-count=10
threaded-service.max-thread-count=10
s3service.https-only=false
httpclient.proxy-autodetect=false
httpclient.proxy-host=myhost
httpclient.proxy-port=8080
httpclient.retry-max=11
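
To make the intent of the endpoint settings concrete, here is a minimal Python sketch of the request URL they are meant to produce. It is an illustration only, not JetS3t's actual code: the helper names `parse_properties` and `request_url`, the object key `some-key`, and the plain-HTTP assumption are all hypothetical, and the reading of `s3service.disable-dns-buckets=true` as "put the bucket in the path instead of the hostname" is my understanding of the setting.

```python
CONFIG = """
s3service.s3-endpoint=myhost
s3service.s3-endpoint-http-port=8080
s3service.disable-dns-buckets=true
s3service.s3-endpoint-virtual-path=/
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def request_url(props, bucket, key):
    """Build the HTTP URL a client would be expected to request (assumed semantics)."""
    host = props.get("s3service.s3-endpoint", "s3.amazonaws.com")
    port = props.get("s3service.s3-endpoint-http-port", "80")
    # A virtual path of "/" adds nothing; strip the trailing slash to avoid "//".
    prefix = props.get("s3service.s3-endpoint-virtual-path", "").rstrip("/")
    path_style = props.get("s3service.disable-dns-buckets", "false") == "true"
    if path_style:
        # Bucket appears in the path: http://host:port/bucket/key
        return f"http://{host}:{port}{prefix}/{bucket}/{key}"
    # Bucket appears in the hostname: http://bucket.host:port/key
    return f"http://{bucket}.{host}:{port}{prefix}/{key}"


props = parse_properties(CONFIG)
print(request_url(props, "test", "some-key"))
```

With the four endpoint properties commented out, the same sketch falls back to a bucket-in-hostname URL on the default endpoint, which would mean the request never reaches the Riak host at all.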


When I run:

hadoop distcp s3://<access key>:<secret key>@test/test hdfs://localhost/tmp/test

I get this stack trace:

org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Request Error. -- ResponseCode: 404, ResponseStatus: Object Not Found
	at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:175)
	at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:221)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy25.retrieveINode(Unknown Source)
	at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:340)
	at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
	at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
	at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1655)
	at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
	at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
	at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:382)
	at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:181)
	at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153)
	at org.apache.hadoop.tools.DistCp.run(DistCp.java:126)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
Caused by: org.jets3t.service.S3ServiceException: Request Error. -- ResponseCode: 404, ResponseStatus: Object Not Found
	at org.jets3t.service.S3Service.getObject(S3Service.java:1379)
	at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:163)
	... 20 more
Caused by: org.jets3t.service.impl.rest.HttpException
	at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:519)
	at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:281)
	at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestGet(RestStorageService.java:981)
	at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2150)
	at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2087)
	at org.jets3t.service.StorageService.getObject(StorageService.java:1140)
	at org.jets3t.service.S3Service.getObject(S3Service.java:2583)
	at org.jets3t.service.S3Service.getObject(S3Service.java:84)
	at org.jets3t.service.StorageService.getObject(StorageService.java:525)
	at org.jets3t.service.S3Service.getObject(S3Service.java:1377)

However, with a local .s3cfg file that points to a Riak cluster, I can do
this:

[hdfs@dsg01 ~]$ s3cmd ls s3://test
                       DIR   s3://test/home/
                       DIR   s3://test/setup/
                       DIR   s3://test/test/
                       DIR   s3://test/tmp/

So, s3://test/test does exist and is in Riak, not AWS.


Now, if I comment out s3service.s3-endpoint-virtual-path and run:

hadoop distcp s3://<access key>:<secret key>@test/test hdfs://localhost/tmp/test

I see:

java.io.IOException: /test doesn't exist
	at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:170)
	at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:221)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy25.retrieveINode(Unknown Source)
	at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:340)
	at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
	at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
	at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1655)
	at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
	at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
	at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:382)
	at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:181)
	at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153)
	at org.apache.hadoop.tools.DistCp.run(DistCp.java:126)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)

Using @test/test/ produces the same exception as above.

Using the bucket alone as the source:

hadoop distcp s3://<access key>:<secret key>@test hdfs://localhost/tmp/test

I get:

java.io.IOException: /user/hdfs doesn't exist
	at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:170)
	at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:221)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy25.retrieveINode(Unknown Source)
	at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:340)
	at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
	at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
	at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1655)
	at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
	at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
	at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:382)
	at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:181)
	at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153)
	at org.apache.hadoop.tools.DistCp.run(DistCp.java:126)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)

I am the user 'hdfs'.
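
One plausible reading of "/user/hdfs" in that message is that a source URI with no path component gets resolved against a default working directory of /user/<username>. The Python sketch below only models that guess; the function name `effective_path` is hypothetical, and the fallback rule is an assumption inferred from the error message, not something confirmed from the Hadoop source.

```python
import posixpath
from urllib.parse import urlsplit


def effective_path(uri, user):
    """Return the path a filesystem lookup would use for this URI (assumed rule)."""
    path = urlsplit(uri).path
    if not path:
        # Assumption: an empty path falls back to the user's home directory.
        return posixpath.join("/user", user)
    return path


# The bucket-only URI has no path, so the lookup target becomes /user/hdfs.
print(effective_path("s3://test", "hdfs"))
# With an explicit path, the lookup target is the path itself.
print(effective_path("s3://test/test", "hdfs"))
```

If that reading is right, the bucket-only form never looks at the bucket root at all, which would explain why it fails differently from the @test/test form.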

If I comment out these properties

#s3service.s3-endpoint=myhost
#s3service.s3-endpoint-http-port=8080
#s3service.disable-dns-buckets=true
#s3service.s3-endpoint-virtual-path=/

and run:

hadoop distcp s3://<access key>:<secret key>@test/test hdfs://localhost/tmp/test

I get a different exception:

16/04/22 21:53:34 ERROR tools.DistCp: Exception encountered
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 Error Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access Denied</Message><Resource>/%2Ftest</Resource><RequestId></RequestId></Error>
	at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:175)
	at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:221)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy25.retrieveINode(Unknown Source)
	at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:340)
	at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
	at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
	at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1655)
	at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
	at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
	at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:382)
	at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:181)
	at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153)
	at org.apache.hadoop.tools.DistCp.run(DistCp.java:126)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
Caused by: org.jets3t.service.S3ServiceException: S3 Error Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access Denied</Message><Resource>/%2Ftest</Resource><RequestId></RequestId></Error>
	at org.jets3t.service.S3Service.getObject(S3Service.java:1379)
	at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:163)
	... 20 more

It's odd to see "/%2Ftest" in the Resource element; %2F is the URL encoding
of '/'.  Why is that there?
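
For reference, the string in the Resource element is what you get if the object key is literally "/test" (leading slash included) and the whole key is percent-encoded before being appended to the request path. The Python sketch below shows just that encoding step; `encoded_resource` is a hypothetical helper, and the idea that the client uses "/test" as the key is an assumption based on the error output, not something I have confirmed.

```python
from urllib.parse import quote


def encoded_resource(object_key):
    """Percent-encode every reserved character in the key, then prefix '/'."""
    # safe="" means '/' inside the key is encoded as %2F rather than left alone.
    return "/" + quote(object_key, safe="")


# Assumed key with a leading slash, as the error message suggests.
print(encoded_resource("/test"))
# Same key without the leading slash, for comparison.
print(encoded_resource("test"))
```

So a key of "test" would show up as "/test", while a key of "/test" shows up as "/%2Ftest", which matches the 403 response above.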

Note: 'myhost' is just a placeholder for the actual hostname which does
resolve.

What am I missing?  


