Riak CS with Hadoop over S3 protocol

Kota Uenishi kota at basho.com
Sun Aug 3 09:49:03 EDT 2014


According to the stack trace you pasted, your Hadoop distribution appears to
have been modified to make EMR RPC calls, which are not included in OSS
Hadoop [1]. I can't say anything precise, but it looks like a
metrics-reporting call to Amazon's EMR service, and your job seems to be
accessing that RPC with an unauthenticated method. It sounds like Hadoop as
modified by MapR.
With an open version of Hadoop I could advise more precisely, but
unfortunately I'm not familiar with MapR (or its EMR edition?).

[1] https://github.com/apache/hadoop-common/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore.java#L166

On Sat, Aug 2, 2014 at 6:05 AM, Charles Shah <find.chuck.at at gmail.com> wrote:
> Hi Kota/John/Andrew,
>
> Thanks for your suggestions.
>
> Here is what I've tried, so far without success.
>
> - jets3t.properties file
> s3service.s3-endpoint=<riak-host>
> s3service.s3-endpoint-http-port=8080
> s3service.disable-dns-buckets=true
> s3service.s3-endpoint-virtual-path=/
>
> httpclient.proxy-autodetect=false
> httpclient.proxy-host=<riak-host>
> httpclient.proxy-port=8080
>
> I've tried the proxy and s3service settings together and each separately.
> I've also tried putting the file in /opt/mapr/conf ,
> /opt/mapr/hadoop/hadoop-0.20.2/ and /opt/mapr/hadoop/hadoop-0.20.2/conf
>
> After adding the settings, when I run hadoop distcp s3n://u:p@bucket/file
> /mymapr/ it still connects to S3, since I get an access-denied message from
> AWS saying it doesn't recognize the key and passphrase.
> I've also tried using Pig: T = LOAD 's3n://u:p@bucket/file' USING
> PigStorage() AS (line:chararray);
>
>
> - /etc/hosts file
> I know that internally the AWS client converts the request to
> https://<bucket>.s3.amazonaws.com/.
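For reference, the difference between the two URL styles that the thread's `s3service.disable-dns-buckets` setting controls can be sketched like this (the host names are placeholders, not the actual endpoints in this setup):

```python
# Sketch: the two URL styles an S3 client can emit for the same object.
# disable-dns-buckets=true forces the path-style form, which is what a
# Riak CS endpoint without wildcard bucket DNS expects.

def virtual_hosted_url(bucket, key, endpoint="s3.amazonaws.com", scheme="https"):
    # Bucket becomes part of the hostname, so DNS must resolve
    # <bucket>.<endpoint> -- this is what the hosts-file trick works around.
    return f"{scheme}://{bucket}.{endpoint}/{key}"

def path_style_url(bucket, key, endpoint="s3.amazonaws.com", scheme="https"):
    # Bucket is part of the path, so only the endpoint itself must resolve.
    return f"{scheme}://{endpoint}/{bucket}/{key}"

if __name__ == "__main__":
    print(virtual_hosted_url("mybucket", "file"))
    print(path_style_url("mybucket", "file", endpoint="riak-host:8080", scheme="http"))
```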
> So I added that hostname to my hosts file and put my Riak CS behind an
> haproxy forwarding 443 to port 8080 of Riak CS. When I run the hadoop distcp command
> as above, I get this error:
>
> 14/08/01 20:59:30 INFO httpclient.HttpMethodDirector: I/O exception
> (java.net.ConnectException) caught when processing request: Connection
> refused
> 14/08/01 20:59:30 INFO httpclient.HttpMethodDirector: Retrying request
> 14/08/01 20:59:30 INFO httpclient.HttpMethodDirector: I/O exception
> (java.net.ConnectException) caught when processing request: Connection
> refused
> 14/08/01 20:59:30 INFO httpclient.HttpMethodDirector: Retrying request
> 14/08/01 20:59:30 INFO httpclient.HttpMethodDirector: I/O exception
> (java.net.ConnectException) caught when processing request: Connection
> refused
> 14/08/01 20:59:30 INFO httpclient.HttpMethodDirector: Retrying request
> 14/08/01 20:59:30 INFO metrics.MetricsUtil: getSupportedProducts {}
> java.lang.RuntimeException: RPC /supportedProducts error Connection refused
>         at
> amazon.emr.metrics.InstanceControllerRpcClient$RpcClient.call(Unknown
> Source)
>         at
> amazon.emr.metrics.InstanceControllerRpcClient.getSupportedProducts(Unknown
> Source)
>         at amazon.emr.metrics.MetricsUtil.emrClusterMapR(Unknown Source)
>         at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
>         at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
>         at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
>         at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
>         at
> org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:166)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at org.apache.hadoop.fs.s3native.$Proxy0.retrieveMetadata(Unknown
> Source)
>         at
> org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:748)
>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:826)
>         at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:648)
>         at org.apache.hadoop.tools.DistCp.copy(DistCp.java:668)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:913)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:947)
>
>
>
> - hadoop conf
> When I add these settings to Hadoop's core-site.xml (after reverting the
> hosts-file change):
> <property>
>   <name>fs.s3n.ssl.enabled</name>
>   <value>false</value>
> </property>
> <property>
>   <name>fs.s3n.endpoint</name>
>   <value>riak-cluster</value>
> </property>
>
> I get the same error as with the hosts-file approach, so it looks like the
> setting does make Hadoop point to the Riak cluster; however, I still hit the
> RPC connection issue.
>
> - s3cmd
>
> s3cmd and Python boto work fine with .s3cfg and .botoconfig respectively
> pointing to Riak CS, so I know the connection works from the MapR host to
> Riak CS, just not with Hadoop.
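For comparison with the Hadoop attempts above, a working s3cmd setup against Riak CS is usually a proxy-style .s3cfg along these lines (the keys and host are placeholders, and this is a sketch of the common proxy approach, not necessarily the exact file used here):

```ini
# ~/.s3cfg sketch -- placeholders throughout
access_key = RIAK-CS-KEY
secret_key = RIAK-CS-SECRET
proxy_host = riak-host
proxy_port = 8080
use_https = False
```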
>
> Any help is appreciated.
>
> Thanks
>
> On Thu, Jul 31, 2014 at 5:10 PM, Kota Uenishi <kota at basho.com> wrote:
>>
>> I played with Hadoop MapReduce on Riak CS, and it actually worked with
>> the latest 1.5 beta package. Hadoop relies on jets3t for its S3
>> connectivity, so if MapR uses vanilla jets3t it should work. I believe it
>> does, because MapR runs on EMR (which usually extracts data from S3).
>>
>> Technically, you can set several S3-endpoint options in
>> jets3t.properties to connect to other S3-compatible cloud storage,
>> mainly "s3service.s3-endpoint" and
>> "s3service.s3-endpoint-http(s)-port". I put the properties file into the
>> Hadoop conf directory and it worked; maybe there is a similar
>> config-loading path in MapR, too. [1] In this case, you should also
>> configure your CS to use your own domain via cs_root_host in app.config. [2]
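For reference, cs_root_host lives in the riak_cs section of app.config; a minimal sketch, where "s3.example.com" is a placeholder domain:

```erlang
%% app.config (riak_cs section) -- "s3.example.com" is a placeholder
{riak_cs, [
    %% Clients then address buckets as <bucket>.s3.example.com
    {cs_root_host, "s3.example.com"}
]}
```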
>>
>> If your Riak CS is not configured with your own domain, you can also
>> configure MapReduce to use proxy setting like this:
>>
>> httpclient.proxy-host=localhost
>> httpclient.proxy-port=8080
>>
>> I usually use this configuration when I play locally. Put them into
>> jets3t.properties.
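Putting the two approaches together, a jets3t.properties for a local Riak CS on port 8080 might look like this sketch (the host names and the s3service.https-only line are assumptions, not taken from this thread; use either the endpoint block or the proxy block, depending on whether a custom domain is configured):

```properties
# Direct-endpoint style: requires cs_root_host / wildcard DNS for buckets
s3service.s3-endpoint=s3.example.com
s3service.s3-endpoint-http-port=8080
s3service.https-only=false
s3service.disable-dns-buckets=true

# Proxy style: no custom domain needed; all requests go through the proxy
httpclient.proxy-autodetect=false
httpclient.proxy-host=localhost
httpclient.proxy-port=8080
```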
>>
>> Note that 1.4.x CS won't work properly if the output file is on CS
>> again - it lacks the copy API used for the final file copy after the
>> reduce phase. We have a 1.5 pre-release package that we are testing
>> internally; sooner or later it will be released.
>>
>> [1] https://jets3t.s3.amazonaws.com/toolkit/configuration.html
>> [2]
>> http://docs.basho.com/riakcs/latest/cookbooks/configuration/Configuring-Riak-CS/
>>
>> On Fri, Aug 1, 2014 at 4:08 AM, John Daily <jdaily at basho.com> wrote:
>> > This blog post on configuring S3 clients to work with CS may be useful:
>> > http://basho.com/riak-cs-proxy-vs-direct-configuration/
>> >
>> > Sent from my iPhone
>> >
>> > On Jul 31, 2014, at 2:53 PM, Andrew Stone <astone at basho.com> wrote:
>> >
>> > Hi Charles,
>> >
>> > AFAIK we haven't ever tested Riak CS with the MapR connector. However,
>> > if MapR works with S3, you should just have to change the IP to point
>> > to a load balancer in front of your local Riak CS cluster. I'm unaware
>> > of how to change that setting in MapR, though. It seems like a question
>> > for them and not Basho.
>> >
>> > -Andrew
>> >
>> >
>> > On Wed, Jul 30, 2014 at 5:16 PM, Charles Shah <find.chuck.at at gmail.com>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> I would like to use MapR with Riak CS for Hadoop MapReduce jobs. My
>> >> code currently refers to objects using s3n:// URLs.
>> >> I'd like the Hadoop code on MapR to point to the Riak CS cluster
>> >> using the S3 URL.
>> >> Is there a proxy or hostname setting in Hadoop that can route the s3
>> >> URL to the Riak CS cluster?
>> >>
>> >> Thanks
>> >>
>> >>
>> >> _______________________________________________
>> >> riak-users mailing list
>> >> riak-users at lists.basho.com
>> >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> >>
>> >
>> >
>> >
>> >
>>
>>
>>
>> --
>> Kota UENISHI / @kuenishi
>> Basho Japan KK
>>
>
>



-- 
Kota UENISHI / @kuenishi
Basho Japan KK
