Riak CS with Hadoop over the S3 protocol

Kota Uenishi kota at basho.com
Thu Jul 31 20:10:58 EDT 2014


I played with Hadoop MapReduce on Riak CS, and it actually worked with
the latest 1.5 beta package. Hadoop relies on jets3t for its S3
connectivity, so if MapR uses vanilla jets3t it should work as well. I
believe so because MapR works on EMR (which usually reads its data
from S3).

Technically, you can add several S3 endpoint options to
jets3t.properties to point jets3t at other S3-compatible cloud
storage services, mainly "s3service.s3-endpoint" and
"s3service.s3-endpoint-http(s)-port". I put the properties file into
the Hadoop conf directory and it worked. Maybe there is a similar
config-loading mechanism in MapR, too. [1] In this case, you should
configure your CS to use your own domain via cs_root_host in
app.config. [2]
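
For example (the hostname and ports below are just placeholders,
replace them with your own CS endpoint), jets3t.properties would
contain something like:

# placeholders - use your own CS endpoint here
s3service.s3-endpoint=cs.example.com
s3service.s3-endpoint-http-port=8080
s3service.s3-endpoint-https-port=8443

Here cs.example.com should match cs_root_host in app.config, 8080 is
the default CS listener port, and the https port only matters if you
have SSL in front of CS.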

If your Riak CS is not configured with your own domain, you can also
configure MapReduce to use a proxy setting like this:

httpclient.proxy-host=localhost
httpclient.proxy-port=8080

I usually use this configuration when I play locally. Put these
lines into jets3t.properties as well.
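
If your CS endpoint or proxy speaks plain HTTP only, you may also
need to turn off jets3t's HTTPS-only default in the same file:

# only needed when the endpoint/proxy is plain HTTP
s3service.https-only=false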

Note that 1.4.x CS won't work properly if the output file is written
to CS again - it doesn't have the copy API that is used for the final
file copy after the reduce phase. We have a 1.5 pre-release package
internally and are testing it. Sooner or later it will be released.

[1] https://jets3t.s3.amazonaws.com/toolkit/configuration.html
[2] http://docs.basho.com/riakcs/latest/cookbooks/configuration/Configuring-Riak-CS/

On Fri, Aug 1, 2014 at 4:08 AM, John Daily <jdaily at basho.com> wrote:
> This blog post on configuring S3 clients to work with CS may be useful:
> http://basho.com/riak-cs-proxy-vs-direct-configuration/
>
> Sent from my iPhone
>
> On Jul 31, 2014, at 2:53 PM, Andrew Stone <astone at basho.com> wrote:
>
> Hi Charles,
>
> AFAIK we haven't ever tested Riak CS with the MapR connector. However, if
> MapR works with S3, you should just have to change the IP to point to a load
> balancer in front of your local Riak CS cluster. I'm unaware of how to
> change that setting in MapR though. It seems like a question for them and
> not Basho.
>
> -Andrew
>
>
> On Wed, Jul 30, 2014 at 5:16 PM, Charles Shah <find.chuck.at at gmail.com>
> wrote:
>>
>> Hi,
>>
>> I would like to use MapR with Riak CS for Hadoop map reduce jobs. My code
>> is currently referring to objects using s3n:// urls.
>> I'd like to be able to have the Hadoop code on MapR point to the Riak CS
>> cluster using the s3 url.
>> Is there a proxy or hostname setting in Hadoop to route the s3
>> url to the Riak CS cluster?
>>
>> Thanks
>>



-- 
Kota UENISHI / @kuenishi
Basho Japan KK



