I'm trying to load data from a Hadoop cluster into Riak CS using distcp.  Distcp supports the S3 API, but I'm running into issues.
Has anyone tested or had success with this process?  Any help is appreciated!  Details below.


Here's my setup:
Hadoop cluster with a small text file in HDFS.  Config file set up to use a proxy host.
Proxy host running Varnish, which at this point basically serves as a load balancer.  All caching is currently disabled.
Riak-CS/Riak running on a 6 server cluster.

Here's the scenario:
I'm running this command...

$ hadoop distcp -libjars ./jets3t-config.jar hdfs://hadoop.node.address/user/dan/test.txt s3n://riak-user-key:riak-secret@testing/

I see many requests and responses in the varnishlog output, so I know communication is succeeding.  However, the distcp process throws an exception, and I see empty files and directories left on my Riak system.

The exception looks like this:

13/07/08 15:52:42 INFO tools.DistCp: sourcePathsCount=1
13/07/08 15:52:42 INFO tools.DistCp: filesToCopyCount=1
13/07/08 15:52:42 INFO tools.DistCp: bytesToCopyCount=93.0
13/07/08 15:52:42 INFO mapred.JobClient: Running job: job_201307031542_0023
13/07/08 15:52:43 INFO mapred.JobClient:  map 0% reduce 0%
13/07/08 15:52:50 INFO mapred.JobClient: Job complete: job_201307031542_0023
13/07/08 15:52:50 INFO mapred.JobClient: Counters: 6
13/07/08 15:52:50 INFO mapred.JobClient:   Job Counters
13/07/08 15:52:50 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6980
13/07/08 15:52:50 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/08 15:52:50 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/08 15:52:50 INFO mapred.JobClient:     Launched map tasks=1
13/07/08 15:52:50 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/07/08 15:52:50 INFO mapred.JobClient:     Failed map tasks=1
13/07/08 15:52:50 INFO mapred.JobClient: Job Failed: NA
With failures, global counters are inaccurate; consider running with -i
Copy failed: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(

And Riak is left like this:

$ s3cmd ls s3://testing
                       DIR   s3://testing/_distcp_logs_7bhq3z/
                       DIR   s3://testing/_distcp_logs_j9re3f/
2013-07-09 03:54         0   s3://testing/_distcp_logs_7bhq3z_$folder$
2013-07-09 00:03         0   s3://testing/_distcp_logs_j9re3f_$folder$
2013-07-08 19:52         0   s3://testing/test.txt
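
As a possible variation on the distcp command above, the credentials can be passed as Hadoop properties (fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey) instead of being embedded in the s3n:// URI. This is only a sketch: the key, secret, and paths are the placeholder values from the command above, the helper name build_distcp_cmd is made up for illustration, and whether this changes the failure shown here is untested. The script only prints the command so it can be reviewed before running it.

```shell
#!/bin/sh
# Placeholder values copied from the original command; substitute real ones.
ACCESS_KEY="riak-user-key"
SECRET_KEY="riak-secret"
SRC="hdfs://hadoop.node.address/user/dan/test.txt"
DEST="s3n://testing/"

# Hypothetical helper: print the distcp invocation with the credentials
# passed as -D properties. Generic options (-D, -libjars) are placed
# before the source and destination arguments.
build_distcp_cmd() {
    printf '%s\n' "hadoop distcp -D fs.s3n.awsAccessKeyId=$ACCESS_KEY -D fs.s3n.awsSecretAccessKey=$SECRET_KEY -libjars ./jets3t-config.jar $SRC $DEST"
}

# Show the command; copy and run it on a Hadoop client node after review.
build_distcp_cmd
```

Keeping the secret out of the URI avoids URI-parsing problems when a secret key contains characters such as "/".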

