combining Riak (CS) and Spark/shark by speaking over s3 protocol

gbrits gbrits at gmail.com
Wed Jul 31 05:07:24 EDT 2013


Thanks for the links, Mark. It certainly looks possible to me. A Riak +
Spark/Shark setup almost looks like a match made in heaven, so I'm doing my
due diligence before getting too excited: there isn't much existing work
on combining the two, which suggests I might be overlooking something.
I'm going to try the setup and see what comes out.
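
For anyone who wants to try the same thing, here is a rough sketch of the
client-side settings I expect to need, written as a small Python helper that
only renders the property sets (it does not talk to Hadoop or Riak CS). The
`fs.s3n.*` keys and the JetS3t `s3service.*` keys are the standard names as
far as I understand them; the host, port, bucket, key, and credentials are
placeholders I made up, and none of this has been verified against a live
Riak CS cluster.

```python
"""Render Hadoop s3n and JetS3t settings for an S3-compatible endpoint.

Assumption: the s3n:// filesystem reads credentials from the fs.s3n.* keys
and delegates HTTP access to JetS3t, which reads the s3service.* keys.
All concrete values used below are placeholders.
"""


def hadoop_s3n_properties(access_key, secret_key):
    """Credentials that the s3n:// filesystem looks up in the Hadoop config."""
    return {
        "fs.s3n.awsAccessKeyId": access_key,
        "fs.s3n.awsSecretAccessKey": secret_key,
    }


def jets3t_properties(host, port):
    """JetS3t settings pointing the S3 client at a non-Amazon, plain-HTTP endpoint."""
    return {
        "s3service.s3-endpoint": host,
        "s3service.s3-endpoint-http-port": str(port),
        "s3service.https-only": "false",
        # Use path-style requests instead of bucket-name-as-subdomain requests.
        "s3service.disable-dns-buckets": "true",
    }


def render_properties(props):
    """Format a dict as sorted key=value lines, as in a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in sorted(props.items())) + "\n"


def s3n_uri(bucket, key):
    """Build the URI a Spark or Hadoop job would use to address an object."""
    return f"s3n://{bucket}/{key.lstrip('/')}"


if __name__ == "__main__":
    # Placeholder endpoint and object key, for illustration only.
    print(render_properties(jets3t_properties("riak-cs.example.internal", 8080)))
    print(s3n_uri("events", "2013/07/31/part-00000"))
```

The rendered JetS3t lines would go into a jets3t.properties file on the
classpath of the Spark workers, and the `fs.s3n.*` pair into the Hadoop
configuration that Spark picks up.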


2013/7/31 Mark Hamstra [via Riak Users] <
ml-node+s197444n4028629h2 at n3.nabble.com>

> Others have certainly found benefits in combining Spark/Shark with a
> Dynamo-type KV store.  With robust Hadoop Input/OutputFormats it's not too
> difficult (e.g. see this <http://www.slideshare.net/EvanChan2/cassandra2013-spark-talk-final> and
> this <http://tuplejump.github.io/calliope/>), and it may be possible to
> do as you suggest with the S3 API of Riak CS.  It may also be worth
> exploring whether Riak and Spark/Shark can rendezvous via Tachyon <https://github.com/amplab/tachyon/wiki>.
> That would be more of a research project right now, but it could end up
> someplace interesting.
>
>
> On Tue, Jul 30, 2013 at 1:24 PM, Dan Kerrigan <[hidden email]> wrote:
>
>> Geert-Jan -
>>
>> We're currently working on a somewhat similar project to integrate Flume
>> to ingest data into Riak CS for later processing using Hadoop.  The
>> limitations of HDFS/S3, when using the s3:// or s3n:// URIs, seem to
>> revolve around renaming objects (copy/delete) in Riak CS.  If you can avoid
>> that, this link should work fine.
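
To make the rename point concrete: an S3-style store has flat keys and no
rename operation, so a client has to copy each object to its new key and then
delete the old one. Below is a minimal Python sketch using an in-memory
stand-in. `FlatObjectStore` and `rename_prefix` are hypothetical names for
illustration, not part of Hadoop or Riak CS, and the key layout is made up.

```python
class FlatObjectStore:
    """In-memory stand-in for an S3-style bucket: flat keys, no directories."""

    def __init__(self):
        self.objects = {}
        self.bytes_copied = 0  # tracks how much data "rename" has to move

    def put(self, key, data):
        self.objects[key] = data

    def copy(self, src, dst):
        data = self.objects[src]
        self.objects[dst] = data
        self.bytes_copied += len(data)

    def delete(self, key):
        del self.objects[key]

    def list(self, prefix):
        # Returns a new sorted list, so callers may modify the store while iterating.
        return sorted(k for k in self.objects if k.startswith(prefix))


def rename_prefix(store, src_prefix, dst_prefix):
    """Emulate a directory rename as one copy plus one delete per object."""
    for key in store.list(src_prefix):
        store.copy(key, dst_prefix + key[len(src_prefix):])
        store.delete(key)


if __name__ == "__main__":
    store = FlatObjectStore()
    store.put("job/_temporary/part-00000", b"x" * 1000)
    store.put("job/_temporary/part-00001", b"y" * 500)
    rename_prefix(store, "job/_temporary/", "job/")
    print(store.list("job/"), store.bytes_copied)
```

The cost grows with the amount of data under the prefix, and the operation is
not atomic: a failure partway through leaves some objects under the old prefix
and some under the new one.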
>>
>> Regarding how data is stored in Riak CS, the data block storage is
>> Bitcask with manifest storage being held in LevelDB.  Riak CS is optimized
>> for larger object sizes and I believe smaller object sizes would not be
>> nearly as efficient as working with plain Riak if only because of the
>> overhead incurred by Riak CS. The benefits of Riak generally carry over to
>> Riak CS so there shouldn't be any need to worry about losing raw power.
>>
>> Respectfully -
>> Dan Kerrigan
>>
>>
>> On Tue, Jul 30, 2013 at 2:21 PM, gbrits <[hidden email]> wrote:
>>
>>> This may be totally missing the mark, but I've been reading up on ways to
>>> do fast iterative processing in Storm or Spark/Shark, with the ultimate
>>> goal of results ending up in Riak for fast multi-key retrieval.
>>>
>>> I want this setup to be as lean as possible, for obvious reasons, so I've
>>> started to look more closely at a possible Riak CS / Spark combo.
>>>
>>> Apparently (please correct me if I'm wrong) Riak CS sits on top of Riak
>>> and is S3 API compliant. The underlying db for the objects is LevelDB
>>> (which would have been my choice anyway, because of the low in-memory
>>> key overhead). Apparently Bitcask is also used, although it's not clear
>>> to me what for exactly.
>>>
>>> At the same time, Spark (with Shark on top, which is to Spark what Hive
>>> is to Hadoop, if that in any way makes things clearer) can use HDFS or
>>> S3 as its so-called 'deep store'.
>>>
>>> Combining these, it seems Riak CS and Spark/Shark could be a nice, pretty
>>> tight combo, providing interactive and ad hoc querying through Shark plus
>>> all the excellent stuff of Riak, through the S3 protocol which they both
>>> speak.
>>>
>>> Is this correct?
>>> Would I lose any of the raw power of Riak when going with Riak CS?
>>> Has anyone ever tried this combo?
>>>
>>> Thanks,
>>> Geert-Jan
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://riak-users.197444.n3.nabble.com/combining-Riak-CS-and-Spark-shark-by-speaking-over-s3-protocol-tp4028621.html
>>> Sent from the Riak Users mailing list archive at Nabble.com.
>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> [hidden email] <http://user/SendEmail.jtp?type=node&node=4028629&i=2>
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>



