Re: Riak + Disco (MapReduce alternative)

Jens Rantil jens.rantil at
Wed Apr 17 10:19:14 EDT 2013


I've been following the Disco Project for a couple of years. The tricky part of using Disco with Riak would be making sure that no map phase is executed multiple times over the same data*, since Riak stores several replicas of each object. Also, since each map phase would (preferably) run on the same host as its data (for data locality), you would also have to make sure to iterate only over data that is associated with the vnodes on that physical host.

If you can easily extract host-specific keys for a specific vnode, then this is doable. However, either the Disco master or the Disco job submitter will need to have all this data when a job is submitted.
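To make the per-host key split concrete, here is a rough sketch of what the job submitter might compute before submitting a job. It uses a simplified model of a consistent-hash ring, not Riak's real ring lookup: the helper names (`build_ring`, `primary_host`, `keys_by_host`), the round-robin partition ownership and the `bucket/key` hashing scheme are all assumptions made for illustration.

```python
import hashlib
from bisect import bisect_right
from collections import defaultdict

RING_SIZE = 2 ** 160  # size of the SHA-1 output space


def build_ring(hosts, num_partitions=64):
    """Return a sorted list of (partition_start, host) pairs.

    Assumption: partitions are handed out round-robin; a real
    cluster's ownership table would be read from the cluster itself.
    """
    step = RING_SIZE // num_partitions
    return [(i * step, hosts[i % len(hosts)]) for i in range(num_partitions)]


def primary_host(ring, bucket, key):
    """Pick the single host whose partition the hashed bucket/key falls into."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    point = int.from_bytes(digest, "big")
    starts = [start for start, _ in ring]
    index = bisect_right(starts, point) - 1
    return ring[index][1]


def keys_by_host(ring, bucket, keys):
    """Group keys so that each key is listed under exactly one host."""
    plan = defaultdict(list)
    for key in keys:
        plan[primary_host(ring, bucket, key)].append(key)
    return dict(plan)
```

Each host would then be handed only its own list as map input, so replicas held elsewhere are never mapped a second time. The whole plan has to exist at submission time, which is the cost mentioned above.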

Also, I'm not sure that it will help very much that both are written in Erlang.

Some ideas,

* Obviously, you could also chain your MapReduce jobs in Disco to remove duplicate maps, but this introduces overhead.
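The chaining idea from the footnote could look roughly like this. It is plain Python rather than actual Disco job code, and the phase names are made up for illustration; it also assumes that all replicas of a key carry the same value, so keeping any one of them is safe.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(replica_records):
    """Emit (key, value) for every record seen, duplicates included."""
    for key, value in replica_records:
        yield key, value


def dedup_phase(pairs):
    """Extra chained stage: keep one pair per key.

    Assumption: replicas of the same key hold identical values.
    The sort is where the overhead comes from.
    """
    ordered = sorted(pairs, key=itemgetter(0))
    for _key, group in groupby(ordered, key=itemgetter(0)):
        yield next(group)


def sum_phase(pairs):
    """The actual reduce, here a simple sum over the values."""
    return sum(value for _key, value in pairs)
```

The extra stage has to see every duplicated pair before the real reduce can run, which is the overhead I mean.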

From: riak-users [mailto:riak-users-bounces at] On Behalf Of Antonio Rohman Fernandez
Sent: 17 April 2013 13:15
To: riak-users at
Subject: Riak + Disco (MapReduce alternative)

Hello everybody,

Has anyone tried to use Riak with Disco? I was looking for Hadoop alternatives (as the RIAK-HADOOP connector project does not seem to be going anywhere) and I think Disco is quite interesting; moreover, it is written in Erlang, the same as Riak. Looks like it would be a good match!

As seen on the mailing list, it seems that Riak's built-in MapReduce is not suitable for many of the queries I would be interested in doing... My idea would be to offload the MapReduce work to a Hadoop (or Disco, or another) cluster that will do the GETs on the Riak cluster through an index (as suggested on this list: do multi-gets instead of MR) and reduce the data independently. Does anybody have suggestions about this?
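A minimal sketch of that idea, assuming two hypothetical callables: `lookup_index`, which returns the keys matching an index value, and `fetch`, which does a single GET. Neither is a real Riak client call; they stand in for whatever client library ends up being used.

```python
from concurrent.futures import ThreadPoolExecutor


def multi_get(fetch, keys, max_workers=8):
    """Issue the GETs concurrently; results come back in key order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, keys))


def index_then_reduce(lookup_index, fetch, reduce_fn, index_value):
    """Resolve keys through an index, fetch them, then reduce outside Riak."""
    keys = lookup_index(index_value)   # hypothetical index query
    values = multi_get(fetch, keys)    # hypothetical per-key GET
    return reduce_fn(values)
```

With an in-memory dict standing in for the cluster, `index_then_reduce(index.get, store.get, sum, "2013-04")` would sum the values of every key listed under that index entry.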



Antonio Rohman Fernandez
CEO, Founder & Lead Engineer
rohman at

Wedding Album
