Using Hadoop distcp to load data into Riak-CS

Kelly McLaughlin kelly at basho.com
Wed Jul 10 11:28:25 EDT 2013


Hi Dan. I do not know much about distcp, but if it does use a PUT (copy)
operation to transfer data, then distcp will not currently work with RiakCS.
Support for that operation is on our roadmap, but unfortunately it is not
done yet.
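The sketch below shows why a missing PUT (copy) breaks this kind of job. It assumes, as Dan describes below, that distcp writes each file under a temporary name and then moves it to its final name. An S3-style store has no native rename, so this sketch assumes the client emulates rename as a PUT (copy) followed by a DELETE of the temporary object. `ToyObjectStore`, `rename`, `CopyNotSupported` and the `tmp/` key prefix are hypothetical names chosen for illustration. None of this is the actual Hadoop, JetS3t, or Riak CS code.

```python
# Hypothetical illustration only: not Hadoop, JetS3t, or Riak CS code.


class CopyNotSupported(Exception):
    """Raised when the toy store is configured without PUT Object - Copy."""


class ToyObjectStore:
    """In-memory stand-in for one S3-compatible bucket."""

    def __init__(self, supports_copy: bool) -> None:
        self.objects: dict[str, bytes] = {}
        self.supports_copy = supports_copy

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def copy(self, src: str, dst: str) -> None:
        # Over HTTP, S3's PUT Object - Copy is a PUT to the destination key
        # with an "x-amz-copy-source: /bucket/src" header and no request body.
        if not self.supports_copy:
            raise CopyNotSupported(f"PUT (copy) {src} -> {dst} not implemented")
        self.objects[dst] = self.objects[src]

    def delete(self, key: str) -> None:
        self.objects.pop(key, None)


def rename(store: ToyObjectStore, src: str, dst: str) -> None:
    """Emulate rename on an object store: copy to the new key, then delete the old one."""
    store.copy(src, dst)
    store.delete(src)


if __name__ == "__main__":
    # Hypothetical temp-key layout; the real distcp temporary path is not shown here.
    riak_cs = ToyObjectStore(supports_copy=False)
    riak_cs.put("tmp/test.txt", b"hello")
    try:
        rename(riak_cs, "tmp/test.txt", "test.txt")
    except CopyNotSupported as err:
        print("rename failed:", err)
    # The temporary object is left behind and the final key never receives the data.
    print(sorted(riak_cs.objects))
```

With `supports_copy=True`, the same `rename` call succeeds and leaves only `test.txt`. The toy does not try to reproduce the exact zero-byte objects and `_$folder$` markers in Dan's listing. It only shows that the final step depends on the copy operation.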

Kelly


On Wed, Jul 10, 2013 at 6:20 AM, Sajner, Daniel G <dsajner at cas.org> wrote:

> Hi.
>
> Sorry about the “fake sender” tag in the subject of the original message.
> Our mail security system is funny like that…
>
> Anyhow, we discovered that distcp puts the file in place under a temporary
> name and then tries a PUT (copy) to copy it to the permanent name. According
> to the documentation, that operation doesn’t appear to be supported by Riak:
> http://docs.basho.com/riakcs/latest/references/apis/storage/RiakCS-PUT-Object-Copy/
>
> I would still like to hear whether anyone else has had success with distcp.
> Maybe there is another version out there that works differently.
>
> Thanks,
>
> Dan
>
> *From:* riak-users [mailto:riak-users-bounces at lists.basho.com] *On Behalf
> Of *Sajner, Daniel G
> *Sent:* Tuesday, July 09, 2013 7:56 AM
> *To:* 'riak-users at lists.basho.com'
> *Subject:* [PMX:FAKE_SENDER] Using Hadoop distcp to load data into Riak-CS
>
> Hi.
>
> I’m trying to load data from a Hadoop cluster using distcp. Distcp
> supports the S3 API, but I’m running into issues.
>
> Has anyone tested this process or had success with it? Any help is
> appreciated! Details below…
>
> Thanks,
>
> Dan
>
>
> Here’s my setup:
>
> - A Hadoop cluster with a small text file in HDFS.
> - A jets3t.properties file configured to use a proxy host.
> - A proxy host running Varnish, which for now serves only as a load
>   balancer. All caching is currently disabled.
> - Riak-CS/Riak running on a 6-server cluster.
>
> Here’s the scenario. I’m running this command:
>
>     hadoop distcp -libjars ./jets3t-config.jar \
>         hdfs://hadoop.node.address/user/dan/test.txt \
>         s3n://riak-user-key:riak-secret@testing/
>
> I see many requests and responses in varnishlog, so I know communication
> is succeeding. However, the distcp process throws an exception, and empty
> files and directories are left behind on my Riak system.
>
> The exception looks like this:
>
>     13/07/08 15:52:42 INFO tools.DistCp: sourcePathsCount=1
>     13/07/08 15:52:42 INFO tools.DistCp: filesToCopyCount=1
>     13/07/08 15:52:42 INFO tools.DistCp: bytesToCopyCount=93.0
>     13/07/08 15:52:42 INFO mapred.JobClient: Running job: job_201307031542_0023
>     13/07/08 15:52:43 INFO mapred.JobClient:  map 0% reduce 0%
>     13/07/08 15:52:50 INFO mapred.JobClient: Job complete: job_201307031542_0023
>     13/07/08 15:52:50 INFO mapred.JobClient: Counters: 6
>     13/07/08 15:52:50 INFO mapred.JobClient:   Job Counters
>     13/07/08 15:52:50 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6980
>     13/07/08 15:52:50 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>     13/07/08 15:52:50 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>     13/07/08 15:52:50 INFO mapred.JobClient:     Launched map tasks=1
>     13/07/08 15:52:50 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
>     13/07/08 15:52:50 INFO mapred.JobClient:     Failed map tasks=1
>     13/07/08 15:52:50 INFO mapred.JobClient: Job Failed: NA
>     With failures, global counters are inaccurate; consider running with -i
>     Copy failed: java.io.IOException: Job failed!
>             at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1246)
>             at org.apache.hadoop.tools.DistCp.copy(DistCp.java:667)
>             at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
>             at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>             at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>             at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
>
> And Riak is left like this:
>
>     s3cmd ls s3://testing
>                            DIR   s3://testing/_distcp_logs_7bhq3z/
>                            DIR   s3://testing/_distcp_logs_j9re3f/
>     2013-07-09 03:54         0   s3://testing/_distcp_logs_7bhq3z_$folder$
>     2013-07-09 00:03         0   s3://testing/_distcp_logs_j9re3f_$folder$
>     2013-07-08 19:52         0   s3://testing/test.txt
>
> *Confidentiality Notice*: This electronic message transmission, including
> any attachment(s), may contain confidential, proprietary, or privileged
> information from Chemical Abstracts Service (“CAS”), a division of the
> American Chemical Society (“ACS”). If you have received this transmission
> in error, be advised that any disclosure, copying, distribution, or use of
> the contents of this information is strictly prohibited. Please destroy all
> copies of the message and contact the sender immediately by either replying
> to this message or calling 614-447-3600.