combining Riak (CS) and Spark/shark by speaking over s3 protocol
gbrits at gmail.com
Wed Jul 31 05:07:24 EDT 2013
Thanks for the links, Mark. Certainly looks possible to me. A Riak +
Spark/Shark setup almost looks like a match made in heaven, so I'm doing my
due diligence before getting too excited: there's not much existing work
on combining the two, which suggests I might be overlooking something.
Going to try the setup and see what comes out.
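To make sure I understand the rename caveat Dan raises in the quoted message below, here is a minimal sketch in Python. It assumes an S3-style store is a flat key space offering only put/copy/delete, so "renaming a directory" costs one copy and one delete per object. `FlatObjectStore` and `rename_prefix` are hypothetical names made up for illustration; they are not part of Riak CS, Hadoop, or Spark, and the key names are invented.

```python
class FlatObjectStore:
    """Toy in-memory stand-in for an S3-style bucket (hypothetical, illustration only)."""

    def __init__(self):
        self.objects = {}   # key -> bytes
        self.copies = 0     # number of copy requests issued
        self.deletes = 0    # number of delete requests issued

    def put(self, key, data):
        self.objects[key] = data

    def copy(self, src, dst):
        # There is no rename primitive: the data is duplicated under a new key.
        self.objects[dst] = self.objects[src]
        self.copies += 1

    def delete(self, key):
        del self.objects[key]
        self.deletes += 1


def rename_prefix(store, old_prefix, new_prefix):
    """Emulate a directory rename: copy each object under old_prefix, then delete it."""
    # Materialize the key list first so the dict is not mutated while iterating.
    for key in sorted(k for k in store.objects if k.startswith(old_prefix)):
        store.copy(key, new_prefix + key[len(old_prefix):])
        store.delete(key)


store = FlatObjectStore()
for i in range(3):
    store.put("job/_temporary/part-%05d" % i, b"rows")

rename_prefix(store, "job/_temporary/", "job/output/")
print(sorted(store.objects))
print(store.copies, store.deletes)
```

With three objects under the temporary prefix, the "rename" issues three copies and three deletes, so the cost grows with the number and size of the output files. If that reading is right, a job that writes directly to its final keys would sidestep the problem.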
2013/7/31 Mark Hamstra [via Riak Users] <
ml-node+s197444n4028629h2 at n3.nabble.com>
> Others have certainly found benefits in combining Spark/Shark with a
> Dynamo-type KV store. With robust Hadoop Input/OutputFormats it's not too
> difficult (e.g. see this <http://www.slideshare.net/EvanChan2/cassandra2013-spark-talk-final>
> and this <http://tuplejump.github.io/calliope/>), and it may be possible to
> do as you suggest with the S3 API of Riak CS. What may also be worth
> exploring is whether Riak and Spark/Shark can rendezvous via Tachyon
> <https://github.com/amplab/tachyon/wiki>.
> That would be more of a research project right now, but it could end up
> someplace interesting.
> On Tue, Jul 30, 2013 at 1:24 PM, Dan Kerrigan <[hidden email]> wrote:
>> Geert-Jan -
>> We're currently working on a somewhat similar project to integrate Flume
>> to ingest data into Riak CS for later processing using Hadoop. The
>> limitations of HDFS/S3, when using the s3:// or s3n:// URIs, seem to
>> revolve around renaming objects (copy/delete) in Riak CS. If you can avoid
>> that, this link should work fine.
>> Regarding how data is stored in Riak CS, the data block storage is
>> Bitcask with manifest storage being held in LevelDB. Riak CS is optimized
>> for larger object sizes and I believe smaller object sizes would not be
>> nearly as efficient as working with plain Riak if only because of the
>> overhead incurred by Riak CS. The benefits of Riak generally carry over to
>> Riak CS so there shouldn't be any need to worry about losing raw power.
>> Respectfully -
>> Dan Kerrigan
>> On Tue, Jul 30, 2013 at 2:21 PM, gbrits <[hidden email]> wrote:
>>> This may be totally missing the mark, but I've been reading up on ways to
>>> do fast iterative processing in Storm or Spark/Shark, with the ultimate
>>> goal of results ending up in Riak for fast multi-key retrieval.
>>> I want this setup to be as lean as possible for obvious reasons so I've
>>> started to look more closely at the possible Riak CS / Spark combo.
>>> Apparently (please correct me if I'm wrong), Riak CS sits on top of Riak
>>> and is S3-API compliant. The underlying db for the objects is LevelDB
>>> (which would have been my choice anyway, because of the low in-memory key
>>> overhead). Apparently Bitcask is also used, although it's not clear to me
>>> what for exactly.
>>> At the same time, Spark (with Shark on top, which is to Spark what Hive is
>>> to Hadoop, if that in any way makes things clearer) can use HDFS or S3 as
>>> its so-called 'deep store'.
>>> Combining this, it seems Riak CS and Spark/Shark could be a nice, pretty
>>> tight combo, providing iterative and ad hoc querying through Shark plus all
>>> the excellent stuff of Riak, through the S3 protocol which they both speak.
>>> Is this correct?
>>> Would I lose any of the raw power of Riak when going with Riak CS?
>>> Has anyone ever tried this combo?
View this message in context: http://riak-users.197444.n3.nabble.com/combining-Riak-CS-and-Spark-shark-by-speaking-over-s3-protocol-tp4028621p4028640.html
Sent from the Riak Users mailing list archive at Nabble.com.