Riak CS with Hadoop over the S3 protocol

Charles Shah find.chuck.at at gmail.com
Fri Aug 1 17:05:47 EDT 2014


Hi Kota/John/Andrew,

Thanks for your suggestions.

So this is what I've tried, so far without success.

- *jets3t.properties file*
s3service.s3-endpoint=<riak-host>
s3service.s3-endpoint-http-port=8080
s3service.disable-dns-buckets=true
s3service.s3-endpoint-virtual-path=/

httpclient.proxy-autodetect=false
httpclient.proxy-host=<riak-host>
httpclient.proxy-port=8080

I've tried the proxy settings and the s3service endpoint settings together,
and each separately. I've also tried putting the file in /opt/mapr/conf,
/opt/mapr/hadoop/hadoop-0.20.2/, and /opt/mapr/hadoop/hadoop-0.20.2/conf.

After adding the settings, when I run

  hadoop distcp s3n://u:p@bucket/file /mymapr/

it still connects to AWS S3: I get an access-denied message from AWS saying
it doesn't recognize the access key and secret.

I've also tried the same thing from Pig:

  T = LOAD 's3n://u:p@bucket/file' USING PigStorage() AS (line:chararray);


- */etc/hosts file*
I know that internally the request gets converted to a
https://<bucket>.s3.amazonaws.com/ request, so I added that hostname to my
hosts file and put my Riak CS behind HAProxy, forwarding 443 to Riak CS on
port 8080 (a rough sketch of both is below, after the stack trace). When I
run the hadoop distcp command as above, I get this error:

14/08/01 20:59:30 INFO httpclient.HttpMethodDirector: I/O exception
(java.net.ConnectException) caught when processing request: Connection
refused
14/08/01 20:59:30 INFO httpclient.HttpMethodDirector: Retrying request
14/08/01 20:59:30 INFO httpclient.HttpMethodDirector: I/O exception
(java.net.ConnectException) caught when processing request: Connection
refused
14/08/01 20:59:30 INFO httpclient.HttpMethodDirector: Retrying request
14/08/01 20:59:30 INFO httpclient.HttpMethodDirector: I/O exception
(java.net.ConnectException) caught when processing request: Connection
refused
14/08/01 20:59:30 INFO httpclient.HttpMethodDirector: Retrying request
14/08/01 20:59:30 INFO metrics.MetricsUtil: getSupportedProducts {}
java.lang.RuntimeException: RPC /supportedProducts error Connection refused
        at
amazon.emr.metrics.InstanceControllerRpcClient$RpcClient.call(Unknown
Source)
        at
amazon.emr.metrics.InstanceControllerRpcClient.getSupportedProducts(Unknown
Source)
        at amazon.emr.metrics.MetricsUtil.emrClusterMapR(Unknown Source)
        at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
        at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
        at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
        at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
        at
org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:166)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at org.apache.hadoop.fs.s3native.$Proxy0.retrieveMetadata(Unknown
Source)
        at
org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:748)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:826)
        at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:648)
        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:668)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:913)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:947)
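
For reference, the hosts entry and the haproxy forwarding mentioned above
look roughly like this (the IP, bucket name, and cert path are placeholders,
and this assumes haproxy 1.5+ terminating SSL on 443):

# /etc/hosts on the MapR node - bucket hostname points at the haproxy box
10.0.0.5    mybucket.s3.amazonaws.com  s3.amazonaws.com

# haproxy.cfg - forward 443 to Riak CS on 8080
listen riak_cs_s3
    bind *:443 ssl crt /etc/haproxy/certs/s3.amazonaws.com.pem
    mode http
    server riak-cs 127.0.0.1:8080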



- *hadoop conf*
When I add these settings to Hadoop's core-site.xml (after reverting the
hosts file change):

  <property>
    <name>fs.s3n.ssl.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>fs.s3n.endpoint</name>
    <value>riak-cluster</value>
  </property>

I get the same error as with the hosts file change, so it looks like the
setting does make Hadoop point to the Riak cluster; however, I still hit
the same RPC connection issue.

- *s3cmd*

s3cmd and Python boto both work fine, with .s3cfg and the boto config
respectively pointing to Riak, so I know the connection works from the MapR
host to Riak, just not through Hadoop.
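
Roughly, the relevant .s3cfg lines look like this (the host is a placeholder
for my Riak CS load balancer, and the keys are the CS credentials):

# point s3cmd at Riak CS via the proxy-style configuration
access_key = <cs-access-key>
secret_key = <cs-secret-key>
proxy_host = riak-host
proxy_port = 8080
use_https = False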

Any help is appreciated.

Thanks


On Thu, Jul 31, 2014 at 5:10 PM, Kota Uenishi <kota at basho.com> wrote:

> I played with Hadoop MapReduce on Riak CS, and it actually worked with
> the latest 1.5 beta package. Hadoop relies on jets3t for its S3
> connectivity, so if MapR uses vanilla jets3t it will work. I believe so
> because MapR works on EMR (which usually extracts data from S3).
>
> Technically, you can add several options to jets3t.properties for
> connecting to other S3-compatible cloud storage, mainly
> "s3service.s3-endpoint" and "s3service.s3-endpoint-http(s)-port". I put
> the properties file into the hadoop conf directory and it worked. Maybe
> there is a similar config-loading path in MapR, too. [1] In this case,
> you should also configure your CS to use your own domain via
> cs_root_host in app.config. [2]
>
> If your Riak CS is not configured with your own domain, you can also
> configure MapReduce to use proxy setting like this:
>
> httpclient.proxy-host=localhost
> httpclient.proxy-port=8080
>
> I usually use this configuration when I play locally. Put them into
> jets3t.properties.
>
> Note that 1.4.x CS won't work properly if the output file is on CS
> again - it doesn't have the copy API used in the final file copy after
> reduce. We have a 1.5 pre-release package internally and are testing it.
> Sooner or later it will be released.
>
> [1] https://jets3t.s3.amazonaws.com/toolkit/configuration.html
> [2]
> http://docs.basho.com/riakcs/latest/cookbooks/configuration/Configuring-Riak-CS/
>
> On Fri, Aug 1, 2014 at 4:08 AM, John Daily <jdaily at basho.com> wrote:
> > This blog post on configuring S3 clients to work with CS may be useful:
> > http://basho.com/riak-cs-proxy-vs-direct-configuration/
> >
> > Sent from my iPhone
> >
> > On Jul 31, 2014, at 2:53 PM, Andrew Stone <astone at basho.com> wrote:
> >
> > Hi Charles,
> >
> > AFAIK we haven't ever tested Riak CS with the MapR connector. However,
> > if MapR works with S3, you should just have to change the IP to point
> > to a load balancer in front of your local Riak CS cluster. I'm unaware
> > of how to change that setting in MapR though. It seems like a question
> > for them and not Basho.
> >
> > -Andrew
> >
> >
> > On Wed, Jul 30, 2014 at 5:16 PM, Charles Shah <find.chuck.at at gmail.com>
> > wrote:
> >>
> >> Hi,
> >>
> >> I would like to use MapR with Riak CS for Hadoop MapReduce jobs. My
> >> code is currently referring to objects using s3n:// URLs.
> >> I'd like to be able to have the Hadoop code on MapR point to the Riak
> >> CS cluster using the S3 URL.
> >> Is there a proxy or hostname setting in Hadoop to be able to route the
> >> S3 URL to the Riak CS cluster?
> >>
> >> Thanks
> >>
> >>
> >>
> >
> >
>
>
>
> --
> Kota UENISHI / @kuenishi
> Basho Japan KK
>
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>