combining Riak (CS) and Spark/shark by speaking over s3 protocol

gbrits gbrits at
Wed Jul 31 05:07:24 EDT 2013

Thanks for the links, Mark. It certainly looks possible to me. A Riak +
Spark/Shark setup almost looks like a match made in heaven, so I'm doing my
due diligence before getting too excited: there isn't much existing work on
combining the two, which suggests I might be overlooking something.
I'm going to try the setup and see what comes out.
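
One thing to watch for when trying this is the rename limitation Dan describes in the quoted message below: an S3-style store has no native rename, so a rename becomes a copy followed by a delete of every affected object. A minimal sketch in Python, assuming a toy in-memory store (ToyObjectStore and its methods are hypothetical illustrations, not a Riak CS or Hadoop API):

```python
# Toy in-memory stand-in for an S3-style object store.
# Hypothetical illustration only; this is not the Riak CS or Hadoop API.
class ToyObjectStore:
    def __init__(self):
        self.objects = {}      # key -> bytes
        self.bytes_copied = 0  # counts the data moved by copy operations

    def put(self, key, data):
        self.objects[key] = data

    def copy(self, src, dst):
        data = self.objects[src]
        self.objects[dst] = data
        self.bytes_copied += len(data)

    def delete(self, key):
        del self.objects[key]

    def rename_prefix(self, old, new):
        # No native rename: each object under the prefix is copied, then deleted.
        for key in [k for k in self.objects if k.startswith(old)]:
            self.copy(key, new + key[len(old):])
            self.delete(key)


store = ToyObjectStore()
store.put("out/_temporary/part-00000", b"a" * 1000)
store.put("out/_temporary/part-00001", b"b" * 2000)

# Moving job output into place touches every byte that was written.
store.rename_prefix("out/_temporary/", "out/")
print(sorted(store.objects), store.bytes_copied)
```

The cost of the "rename" grows with the total size of the objects under the prefix, which is why jobs that write to a temporary location and then rename are the case to avoid.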

2013/7/31 Mark Hamstra [via Riak Users] <
ml-node+s197444n4028629h2 at>

> Others have certainly found benefits in combining Spark/Shark with a
> Dynamo-type KV store.  With robust Hadoop Input/OutputFormats it's not too
> difficult (e.g. see this and this), and it may be possible to
> do as you suggest with the S3 API of Riak CS.  What may also be worth
> exploring is whether Riak and Spark/Shark can rendezvous via Tachyon.
> That would be more of a research project right now, but it could end up
> someplace interesting.
> On Tue, Jul 30, 2013 at 1:24 PM, Dan Kerrigan <[hidden email]> wrote:
>> Geert-Jan -
>> We're currently working on a somewhat similar project that uses Flume
>> to ingest data into Riak CS for later processing with Hadoop.  The
>> limitations of HDFS/S3, when using the s3:// or s3n:// URIs, seem to
>> revolve around renaming objects, which is done as a copy followed by a
>> delete in Riak CS.  If you can avoid that, this link should work fine.
>> Regarding how data is stored in Riak CS, the data block storage is
>> Bitcask with manifest storage being held in LevelDB.  Riak CS is optimized
>> for larger object sizes and I believe smaller object sizes would not be
>> nearly as efficient as working with plain Riak if only because of the
>> overhead incurred by Riak CS. The benefits of Riak generally carry over to
>> Riak CS so there shouldn't be any need to worry about losing raw power.
>> Respectfully -
>> Dan Kerrigan
>> On Tue, Jul 30, 2013 at 2:21 PM, gbrits <[hidden email]> wrote:
>>> This may be totally missing the mark, but I've been reading up on ways to
>>> do fast iterative processing in Storm or Spark/Shark, with the ultimate
>>> goal of results ending up in Riak for fast multi-key retrieval.
>>> I want this setup to be as lean as possible for obvious reasons, so I've
>>> started to look more closely at the possible Riak CS / Spark combo.
>>> Apparently (please correct me if I'm wrong) Riak CS sits on top of Riak and
>>> is S3-API compliant. The underlying db for the objects is LevelDB (which
>>> would have been my choice anyway, because of the low in-memory key
>>> overhead). Apparently Bitcask is also used, although it's not clear to me
>>> what for exactly.
>>> At the same time Spark (with Shark on top, which is to Spark what Hive is
>>> to Hadoop, if that in any way makes things clearer) can use HDFS or S3 as
>>> its so-called 'deep store'.
>>> Combining this, it seems Riak CS and Spark/Shark could be a nice, pretty
>>> tight combo, providing interactive and ad hoc querying through Shark plus
>>> all the excellent stuff of Riak, through the S3 protocol which they both
>>> speak.
>>> Is this correct?
>>> Would I lose any of the raw power of Riak when going with Riak CS? Has
>>> anyone ever tried this combo?
>>> Thanks,
>>> Geert-Jan
