MapReduce performance problem

Kevin Burton rkevinburton at charter.net
Tue Feb 26 13:52:45 EST 2013


I have a simple CorrugatedIron client that makes the following request:

 

    IRiakClient riakClient = cluster.CreateClient();
    RiakBinIndexRangeInput bucketKeyInput =
        new RiakBinIndexRangeInput(productBucketName, "$key", "00000000", "99999999");
    RiakMapReduceQuery query = new RiakMapReduceQuery()
        .Inputs(bucketKeyInput)
        .MapJs(m => m.Name("Riak.mapValuesJson").Keep(true));
    RiakResult<RiakMapReduceResult> result = riakClient.MapReduce(query);

 

As you can see, this is a very basic range m/r query, but the result comes
back as:

 

Riak returned an error. Code '0'. Message: timeout

CommunicationError
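
To check whether the timeout comes from the client or from the cluster, the same job could be submitted straight to Riak's HTTP MapReduce endpoint (`/mapred`) with an explicit `timeout` in milliseconds. The sketch below only builds the JSON body for such a request. The bucket name `products` and the 120000 ms value are assumptions for illustration; the post shows only the variable `productBucketName`, and the field names should be checked against the MapReduce documentation for the Riak version in use.

```javascript
// Hypothetical bucket name; the post only shows the variable productBucketName.
var productBucketName = "products";

// JSON body for Riak's HTTP /mapred endpoint, mirroring the C# query above:
// a $key range input followed by the built-in Riak.mapValuesJson map phase.
var job = {
  inputs: {
    bucket: productBucketName,
    index: "$key",
    start: "00000000",
    end: "99999999"
  },
  query: [
    { map: { language: "javascript", name: "Riak.mapValuesJson", keep: true } }
  ],
  // Assumed value: job timeout in milliseconds, raised for testing.
  timeout: 120000
};

var body = JSON.stringify(job, null, 2);
console.log(body);
```

If the job completes with a longer timeout when posted this way, the query is slow rather than broken, and the cluster side is the place to look.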

 

Another type of m/r query I have is:

 

    IRiakClient riakClient = cluster.CreateClient();
    var query = new RiakMapReduceQuery()
        .Inputs(productBucketName)
        .MapJs(m => m.Source(@"function(v,d,a) {" +
            "var p = JSON.parse(v.values[0].data);" +
            "var r = [];" +
            "d = escape(p.Department);" +
            "if(d != '') {" +
            "var o = {};" +
            "o[d] = 1;" +
            "r.push(o);" +
            "}" +
            "return r;" +
            "}"))
        .ReduceJs(m => m.Source(@"function(v,d,a) {" +
            "var r = {};" +
            "for(var i in v) {" +
            "  for(var w in v[i]) {" +
            "    if(w in r) r[w] += v[i][w];" +
            "    else r[w] = v[i][w];" +
            "  }" +
            "}" +
            "return [r];" +
            "}")
            .Keep(true));

 

This returns, but it takes far too long. I have about 60,000 items in my
bucket and the query takes about 50-60 seconds to execute. The results seem
valid. For these types of m/r jobs, what can I do on the server (or client)
to help diagnose the problem? I have basic tools like iostat and top to
give me data, but some pointers on interpreting the output of these tools
would help.
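
One way to separate the cost of the JavaScript functions from the cost of the cluster is to run the same map and reduce functions locally over synthetic data of the same size. The sketch below is only that: the harness `runLocally`, the function names `mapDepartment` and `reduceCounts`, and the department names are made up for illustration. The function bodies are copied from the query above, and the input shape (`values[0].data`) is the one the map function itself reads. Local timing will not match Riak's own JavaScript VMs, but it gives a rough idea of whether the functions themselves are expensive.

```javascript
// Map function copied from the query above.
function mapDepartment(v, d, a) {
  var p = JSON.parse(v.values[0].data);
  var r = [];
  d = escape(p.Department);
  if (d != '') {
    var o = {};
    o[d] = 1;
    r.push(o);
  }
  return r;
}

// Reduce function copied from the query above.
function reduceCounts(v, d, a) {
  var r = {};
  for (var i in v) {
    for (var w in v[i]) {
      if (w in r) r[w] += v[i][w];
      else r[w] = v[i][w];
    }
  }
  return [r];
}

// Hypothetical harness: builds itemCount fake objects, then times one map
// pass over all of them followed by a single reduce pass.
function runLocally(itemCount, departments) {
  var inputs = [];
  for (var i = 0; i < itemCount; i++) {
    var doc = { Department: departments[i % departments.length] };
    inputs.push({ values: [{ data: JSON.stringify(doc) }] });
  }

  var started = Date.now();
  var mapped = [];
  for (var j = 0; j < inputs.length; j++) {
    var out = mapDepartment(inputs[j], null, null);
    for (var k = 0; k < out.length; k++) mapped.push(out[k]);
  }
  var reduced = reduceCounts(mapped, null, null);
  return { counts: reduced[0], elapsedMs: Date.now() - started };
}

// 60,000 items, matching the bucket size mentioned in the post.
var result = runLocally(60000, ["Toys", "Home & Garden", "Books"]);
console.log(result.counts, result.elapsedMs + " ms");
```

If this finishes in well under a second locally, the 50-60 seconds are more likely spent listing and fetching the objects in the bucket than in the JavaScript itself, which is where iostat and top output on the Riak nodes would be worth watching while the job runs.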
