Map reduce weirdness on Riak 1.3

Christian Dahlqvist christian at basho.com
Sat Apr 6 15:19:43 EDT 2013


Hi Kartik,

What you are seeing is the result of not accounting for re-reduce in your reduce phase function.

In Riak, reduce phases generally run recursively, and the input for each run may contain both values from the preceding map phase and output from previous iterations of the reduce phase. For the reduce phase to behave correctly, your reduce function needs to distinguish between these two types of input records. In your function, the output of an earlier reduce iteration is an object keyed by document key, so it has no 'data' field and is silently dropped when it is fed back in as input.
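As a rough illustration of one way to handle this, here is a minimal sketch of a reduce function that tolerates re-reduce. It assumes the map phase emits records of the form {key, data} as in your query, and it wraps its own output in a "matches" field so that earlier reduce output can be recognised when it comes back as input. The function name reduceAgeRange and the "matches" wrapper are made up for this example, and the output shape differs slightly from your original.

```javascript
// Sketch only: reduceAgeRange and the "matches" wrapper are illustrative names.
// Each input item is either a map output ({key: ..., data: ...}) or the
// output of an earlier reduce run ({matches: {...}}).
function reduceAgeRange(values) {
    var acc = {};
    for (var i = 0; i < values.length; i++) {
        var item = values[i];
        if (item.matches !== undefined) {
            // Output of a previous reduce iteration: merge it in unchanged.
            for (var k in item.matches) {
                if (Object.prototype.hasOwnProperty.call(item.matches, k)) {
                    acc[k] = item.matches[k];
                }
            }
        } else if (item.data !== undefined) {
            // Fresh map output: apply the age filter.
            var age = item.data.age_int;
            if (age !== undefined && age > 10 && age < 25) {
                acc[item.key] = item.data;
            }
        }
    }
    // Always return the same wrapped shape so the function can re-reduce it.
    return [{ matches: acc }];
}
```

Because the function accepts its own output as input, the final result should not depend on how Riak happens to batch the values across reduce iterations.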

Best regards,

Christian




On 6 Apr 2013, at 19:09, Kartik Thakore <kthakore at aimed.cc> wrote:

> Hello,
> 
> I recently set up a test cluster to build a tech demo web application on.
> 
> I have been having some weirdness with the map reduce functionality.
> 
> My database is here:
> 
> http://aimed.cc:8098/riak/rekon/go#/buckets/test_rand_docs
> 
> The cluster has 5 nodes
> ulimit 4096
> 
> This is Riak 1.3.0 release on Debian with 663 of free memory.
> 
> I am running this map reduce:
> 
> curl -X POST -H "content-type: application/json" \
>     http://aimed.cc:8098/mapred --data @-<<\EOF
> {"inputs": "test_rand_docs",
> "query":[{"map":{"language":"javascript","source":"
>     function (v) {
>         var r = {};
>         var data = JSON.parse(v.values[0].data);
>         r.data = data;
>         r.key = v.key;
>         return [ r ];
>     }
> "}},{"reduce":{"language":"javascript","source":"
>     function (v) {
>         var r = {};
>         for( var i in v )
>         {
>             var doc = v[i];
>             if( doc['data'] !== undefined) {
>                 var age = doc['data']['age_int'];
>                 if ( age !== undefined && age > 10 && age <25 ){
>                     r[doc['key']] = doc['data'];
>                 }
>             }
>         }
>         return  [ r ];
> 
>     }
> "}}]}
> EOF
> 
> 
> my result is randomly:
> 
> [{"9DYMGV0B6Jdn5DivoTExiqyDYUC":{"age_int":24},"JQYUs2onC822EOzMaToz71j77e":{"age_int":18},"AcrUwotAdYaV5zitaMylnUgYsWY":{"age_int":24}}]
> 
> 
> or
> 
> [{"LYJpg97ZA5qjZTTv2cfavmRgxLb":{"age_int":11}}]
> 
> but it is clear with this:
> 
> http://aimed.cc:8098/solr/test_rand_docs/select?q=age_int:[10%20TO%2025]
> 
> 
> that there are 134 records ....
> 
> so what is going on?
> 
> 
> Is it low memory? Or that it is on a Xen machine (Linode)? Is there a scalable memory server vendor (AWS or w/e) I should consider?
> 
> Thanks,
> Kartik Thakore
> 
> 
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
