combining Riak (CS) and Spark/shark by speaking over s3 protocol

Dan Kerrigan dan.kerrigan at
Tue Jul 30 16:24:57 EDT 2013

Geert-Jan -

We're currently working on a somewhat similar project: using Flume to
ingest data into Riak CS for later processing with Hadoop.  The
limitations of Hadoop's S3 support, when using the s3:// or s3n:// URIs,
seem to revolve around renaming objects (a copy followed by a delete) in
Riak CS.  If you can avoid that, this setup should work fine.
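
To illustrate the rename issue: S3-style stores have no native rename, so a
client has to emulate it as a copy followed by a delete for every object
under the old name.  A minimal sketch, assuming a hypothetical in-memory
ObjectStore class (a stand-in for illustration only, not the Riak CS or
Hadoop API):

```python
class ObjectStore:
    """Hypothetical in-memory stand-in for an S3-style bucket (not a real API)."""

    def __init__(self):
        self.objects = {}  # key -> bytes
        self.ops = []      # log of operations issued against the store

    def put(self, key, data):
        self.objects[key] = data
        self.ops.append(("PUT", key))

    def copy(self, src, dst):
        self.objects[dst] = self.objects[src]
        self.ops.append(("COPY", src, dst))

    def delete(self, key):
        del self.objects[key]
        self.ops.append(("DELETE", key))


def rename_prefix(store, old_prefix, new_prefix):
    """Emulate a directory rename: one COPY plus one DELETE per object."""
    # sorted() materializes the key list, so mutating the dict below is safe.
    for key in sorted(k for k in store.objects if k.startswith(old_prefix)):
        new_key = new_prefix + key[len(old_prefix):]
        store.copy(key, new_key)
        store.delete(key)


store = ObjectStore()
for i in range(3):
    store.put(f"tmp/part-{i}", b"data")

store.ops.clear()  # count only the operations caused by the rename
rename_prefix(store, "tmp/", "final/")
print(len(store.ops))  # 6 operations: 3 copies and 3 deletes
```

The cost grows with the number of objects and with their size, since each
copy moves the full object.  Writing output directly under its final key,
where the job allows it, sidesteps this.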

Regarding how data is stored in Riak CS: data blocks are stored in Bitcask
and manifests are held in LevelDB.  Riak CS is optimized for larger
objects, and I believe small objects would not be handled nearly as
efficiently as in plain Riak, if only because of the overhead Riak CS
adds.  The benefits of Riak generally carry over to Riak CS, so there
shouldn't be any need to worry about losing raw power.

Respectfully -
Dan Kerrigan

On Tue, Jul 30, 2013 at 2:21 PM, gbrits <gbrits at> wrote:

> This may be totally missing the mark, but I've been reading up on ways to
> do fast iterative processing in Storm or Spark/Shark, with the ultimate
> goal of results ending up in Riak for fast multi-key retrieval.
> I want this setup to be as lean as possible for obvious reasons so I've
> started to look more closely at the possible Riak CS / Spark combo.
> Apparently (please correct me if I'm wrong) Riak CS sits on top of Riak
> and is S3-API compliant. The underlying db for the objects is LevelDB
> (which would have been my choice anyway, because of the low in-memory key
> overhead). Apparently Bitcask is also used, although it's not clear to me
> what for exactly.
> At the same time Spark (with Shark on top, which is to Spark what Hive is
> to Hadoop, if that in any way makes things clearer) can use HDFS or S3 as
> its so-called 'deep store'.
> Combining this, it seems Riak CS and Spark/Shark could be a nice, pretty
> tight combo, providing iterative and ad hoc querying through Shark plus
> all the excellent stuff of Riak, through the S3 protocol which they both
> speak.
> Is this correct?
> Would I lose any of the raw power of Riak when going with Riak CS? Has
> anyone ever tried this combo?
> Thanks,
> Geert-Jan