MapReduce scalability

Christian Dahlqvist christian at basho.com
Tue Feb 26 01:57:35 EST 2013


Hi Boris,

MapReduce is a very flexible and powerful way of querying Riak. It performs processing locally, where the data resides, which makes it efficient for larger data sets. A consequence of this is that every MapReduce job requires a covering set of vnodes (all vnodes that hold the data required for processing) to participate, meaning that it puts considerably more load on the system than straight K/V access and therefore does not scale quite as well. It is primarily designed for batch-type processing over reasonably large amounts of data, and scales well with increased data volumes as new nodes are added. We do not, however, usually recommend using it as an interface for realtime queries where low and predictable latencies are required and the concurrency level, and therefore the load level on the cluster, cannot be controlled.

I am not sure I understand what you mean by the performance degrading with the number of nodes, unless you are strictly measuring latency rather than throughput. As the number of nodes increases, it becomes more and more likely that multiple physical nodes will be involved in the job, which adds to the amount of communication and coordination required between the nodes, thereby increasing latency. Could you please explain in more detail what you are trying to achieve?
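
To give a rough feel for the point about multiple physical nodes, here is a minimal back-of-the-envelope sketch in Python. It is only a toy model, not a description of how Riak actually places data: it assumes each key lives on exactly one node chosen uniformly at random, and it ignores replication (n_val), vnodes and the ring layout. The function names are made up for illustration.

```python
def expected_nodes_touched(num_keys: int, num_nodes: int) -> float:
    """Expected number of distinct nodes that hold at least one input key.

    Toy assumption: every key is placed on one node, uniformly at random,
    independently of the other keys (no replication, no vnodes).
    """
    if num_keys < 0 or num_nodes < 1:
        raise ValueError("need num_keys >= 0 and num_nodes >= 1")
    # Probability that a given node holds none of the keys.
    p_node_missed = (1.0 - 1.0 / num_nodes) ** num_keys
    return num_nodes * (1.0 - p_node_missed)


def prob_single_node(num_keys: int, num_nodes: int) -> float:
    """Probability that all input keys happen to sit on the same node
    (same toy assumption as above)."""
    if num_keys < 1 or num_nodes < 1:
        raise ValueError("need num_keys >= 1 and num_nodes >= 1")
    # The first key picks a node; each remaining key must match it.
    return (1.0 / num_nodes) ** (num_keys - 1)


if __name__ == "__main__":
    # 20-key jobs, as in the question, on clusters of different sizes.
    for nodes in (1, 2, 3, 5):
        print(nodes,
              expected_nodes_touched(20, nodes),
              prob_single_node(20, nodes))
```

Under these assumptions, a 20-key job on anything larger than a single node almost always involves more than one node, so some cross-node coordination cost per job is to be expected.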

Best regards,

Christian 


On 25 Feb 2013, at 16:41, Boris Okner <boris.okner at gmail.com> wrote:

> Hello,
> 
> I'm experimenting with 2 Riak 1.3.0 nodes (both are "bare metal"), and it looks like MapReduce performs better when one of the nodes is down. The MapReduce requests are running on 20-key blocks. So am I doing something wrong, or is it expected behaviour, i.e. MapReduce performance degrades as the number of nodes increases? If the former, could you give me some pointers on how to set it up to take advantage of multiple nodes?
> 
> Thanks in advance for your help,
> Boris
