combining Riak (CS) and Spark/shark by speaking over s3 protocol

gbrits gbrits at
Tue Jul 30 14:21:57 EDT 2013

This may be totally missing the mark, but I've been reading up on ways to do
fast iterative processing in Storm or Spark/Shark, with the ultimate goal of
results ending up in Riak for fast multi-key retrieval.

I want this setup to be as lean as possible, for obvious reasons, so I've
started to look more closely at a possible Riak CS / Spark combo.

Apparently (please correct me if I'm wrong), Riak CS sits on top of Riak and
is S3 API compliant. The underlying store for the objects is LevelDB (which
would have been my choice anyway, because of its low in-memory key overhead).
Apparently Bitcask is also used, although it's not clear to me what exactly
for.

At the same time, Spark (with Shark on top, which is to Spark what Hive is to
Hadoop, if that makes things any clearer) can use HDFS or S3 as its
so-called 'deep store'.

Combining these, it seems Riak CS and Spark/Shark could be a nice, pretty
tight combo, providing iterative and ad hoc querying through Shark plus all
the excellent stuff of Riak, through the S3 protocol which they both speak.
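
To make the idea concrete, here is a rough sketch of what "both speak S3"
might look like on the Spark side. It assumes a current Spark/Hadoop build
with the s3a connector, whose fs.s3a.* properties let you point s3a:// paths
at a non-AWS endpoint. The host name, port and credentials are placeholders,
the two helper functions are hypothetical, and whether Riak CS accepts the
requests that connector sends is not something verified here.

```python
def riak_cs_spark_conf(endpoint, access_key, secret_key, use_ssl=False):
    """Build Spark properties that route s3a:// paths to a custom
    S3-compatible endpoint (assumed here to be a Riak CS node)."""
    return {
        # Hadoop options are passed through Spark with the "spark.hadoop." prefix.
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Path-style URLs (host/bucket/key) avoid needing per-bucket DNS names.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": "true" if use_ssl else "false",
    }


def to_submit_args(conf):
    """Render the properties as --conf flags for spark-submit."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


# Placeholder endpoint and credentials, for illustration only.
conf = riak_cs_spark_conf("riak-cs.example.internal:8080", "ACCESS_KEY", "SECRET_KEY")
print(" ".join(to_submit_args(conf)))
```

With something like this in place, a job would read and write paths of the
form s3a://some-bucket/some/key, and Riak proper would only be reached
through the S3 interface.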

Is this correct?
Would I lose any of the raw power of Riak when going with Riak CS? Has anyone
ever tried this combo?


Sent from the Riak Users mailing list archive at
