Comparing Riak MapReduce and Hadoop MapReduce
toby.corkindale at strategicdata.com.au
Sun Jul 21 20:49:14 EDT 2013
I experimented with using Riak for some Hadoop-style map-reduce work
and didn't get great results. People on the mailing list have
advised that Riak isn't really intended to be used that way.
Hopefully others more knowledgeable than I am can explain the reasons.
On 20/07/13 10:07, Xiaoming Gao wrote:
> Hi everyone,
> I am trying to learn about Riak MapReduce and compare it with Hadoop
> MapReduce, and there are some details I am interested in that are not
> covered in the online documentation. Hopefully we can get some help here
> with the following questions. Thanks in advance!
> 1. For a given MapReduce request (that is, a job), how does Riak decide how
> many mappers to use for the job? For example, if I have 8 nodes and my data
> is distributed across all nodes with an "N" value of 2, will I have 4
> mappers running on 4 nodes concurrently? Is it possible to have multiple
> mappers (e.g., 4 or even 6) for the same MR job running on each node (for
> better processing speed)?
> 2. If I run a MapReduce job over the results of a Riak Search query, how
> does Riak schedule the mappers based on the search results?
> 3. How does Riak handle intermediate data generated by mappers?
> (1) In Hadoop MapReduce, the output of the mappers is <key, value> pairs,
> and the output from all mappers is first grouped by key and then handed
> over to the reducer. Does Riak do similar grouping of intermediate data?
> (2) How are mapper outputs transmitted to the reducer? Does Riak use local
> disks on the mapper nodes or reducer nodes to store the intermediate data?
> 4. According to the documentation at
> http://docs.basho.com/riak/latest/dev/advanced/mapreduce/#How-Phases-Work ,
> each MR job only schedules one reducer, which runs on the coordinating node.
> Is there any way to configure an MR job to use multiple reducers?
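
For reference, the Hadoop-style flow described in question 3(1) can be
sketched in a few lines of Python. This is only an illustration of the
map, group-by-key, reduce sequence from the question. The function names
(map_words, shuffle, reduce_counts, run_job) and the word-count task are
made up for the example, and it says nothing about how Riak handles
intermediate data internally.

```python
from collections import defaultdict


def map_words(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key (the step question 3(1) asks about)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_counts(key, values):
    """Reducer: collapse all values seen for one key into a single total."""
    return key, sum(values)


def run_job(lines):
    """Run the map, shuffle and reduce steps in order, in a single process."""
    intermediate = [pair for line in lines for pair in map_words(line)]
    grouped = shuffle(intermediate)
    return dict(reduce_counts(key, values) for key, values in sorted(grouped.items()))


# Hypothetical input: two "documents" standing in for stored values.
counts = run_job(["riak hadoop riak", "hadoop mapreduce"])
print(counts)
```

Everything here runs in one process, so it shows only the data flow
between the phases, not how mappers or reducers would be scheduled
across nodes.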