MapReduce and long-running queries

David Montgomery davidmontgomery at gmail.com
Sun Oct 14 07:57:23 EDT 2012


Hi,

Below is my code for running a MapReduce job in Python. I have a six-node
cluster with 2 cores and 4 GB of RAM per node. There is no other load on it,
there are about 3 million keys, and I am using LevelDB with Riak 1.2.
Running the code below takes a terribly long time: it never finishes, and I
don't even know how to check whether it is running, other than that the
Python script has not timed out. When I look at the number of executed
mappers in the stats in Graphite, the line is flat. On test queries, the
same code works.
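One way to see, from outside the Python script, whether anything is happening is to poll a node's HTTP stats resource while the job runs. The sketch below assumes the node serves `/stats` on the default HTTP port 8098 and simply keeps every counter whose name mentions "mapper" or "pipeline"; the exact counter names differ between Riak versions, so the substring match is an assumption, not a list of real stat names.

```python
import json
import time
import urllib.request


def fetch_stats(host, port=8098, timeout=10):
    """GET the node's /stats resource and decode the JSON body."""
    url = "http://%s:%d/stats" % (host, port)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def mapreduce_counters(stats, needles=("mapper", "pipeline")):
    """Keep only stats whose names contain one of the given substrings (assumed naming)."""
    return {name: value for name, value in sorted(stats.items())
            if any(needle in name for needle in needles)}


def poll_mapreduce_stats(host, port=8098, rounds=5, interval=5.0):
    """Print the filtered counters a few times so changes over time are visible."""
    for _ in range(rounds):
        print(mapreduce_counters(fetch_stats(host, port)))
        time.sleep(interval)
```

Calling `poll_mapreduce_stats` with one node's address while the query runs shows whether those counters move at all; if they stay constant on every node, the job is probably stuck before the map phase starts.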

So, how do I debug what is going on?


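import riak
from riak import key_filter  # key-filter builder from the 1.x Python client

# riak_host, bucket and filter_map are defined elsewhere in the script.

def main():
    # Connect over Protocol Buffers (port 8087).
    client = riak.RiakClient(host=riak_host, port=8087,
                             transport_class=riak.transports.pbc.RiakPbcTransport)
    query = client.add(bucket)

    # Split each key on ":", take the date token (filter_map['date'] is its
    # position) and keep only keys from October 2012.
    filters = (key_filter.tokenize(":", filter_map['date'])
               + key_filter.starts_with('201210'))
    # Further conditions, currently disabled:
    #   & key_filter.tokenize(":", filter_map['country']).eq("US")
    #   & key_filter.tokenize(":", filter_map['campaign_id']).eq("t1")
    query.add_key_filters(filters)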

    # Map phase (JavaScript): emit {hw: 1} for every object whose adx is "gdn".
    query.map('''
    function(value, keyData, arg) {
        var data = Riak.mapValuesJson(value)[0];

        if (data['adx'] == 'gdn') {
            var alt_key = data['hw'];
            var obj = {};
            obj[alt_key] = 1;
            return [ obj ];
        } else {
            return [];
        }
    }''')


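    # Reduce phase (JavaScript): merge the {hw: count} objects by summing counts.
    # The {} seed keeps Array.reduce from throwing when `values` is empty.
    query.reduce('''
    function(values, arg) {
        return [ values.reduce(function(acc, item) {
            for (var state in item) {
                if (acc[state])
                    acc[state] += item[state];
                else
                    acc[state] = item[state];
            }
            return acc;
        }, {}) ];
    }
    ''')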

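    # The MapReduce timeout is given in milliseconds (300000 ms = 5 minutes).
    for result in query.run(timeout=300000):
        print(result)

if __name__ == '__main__':
    main()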
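The key filter can be checked against a handful of sample keys without touching the cluster. Note that Riak evaluates key filters against the key list of the whole bucket, so with about 3 million keys the filter step alone visits every key before any mapper runs. The sketch below is a plain-Python mirror of `tokenize(":", n) + starts_with("201210")`, assuming a hypothetical key layout `<date>:<country>:<campaign_id>`; `FILTER_MAP` is a hypothetical stand-in for the script's `filter_map`, whose real values are not shown.

```python
# Hypothetical stand-in for filter_map: 1-based token positions in a
# "<date>:<country>:<campaign_id>" key (assumed layout).
FILTER_MAP = {"date": 1, "country": 2, "campaign_id": 3}


def tokenize(key, sep, position):
    """Return the 1-based token at `position`, or None if the key is too short."""
    tokens = key.split(sep)
    if 1 <= position <= len(tokens):
        return tokens[position - 1]
    return None


def matches_date_prefix(key, prefix="201210"):
    """Mirror of tokenize(":", date_position) + starts_with(prefix)."""
    token = tokenize(key, ":", FILTER_MAP["date"])
    return token is not None and token.startswith(prefix)


if __name__ == "__main__":
    sample_keys = ["20121014:US:t1", "20120930:US:t1", "20121001:GB:t2", "malformed"]
    print([k for k in sample_keys if matches_date_prefix(k)])
```

Running this on a few real keys copied from the bucket is a quick way to confirm that `filter_map['date']` points at the token that is meant.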


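Similarly, the map and reduce logic can be exercised locally. The functions below are a Python mirror of the JavaScript phases above, assuming each stored value is a JSON document with `adx` and `hw` fields. The checks also confirm that reducing in two steps gives the same result as reducing once, since Riak may feed a reduce phase's earlier output back into it together with new inputs.

```python
import json
from functools import reduce


def map_gdn(raw_value):
    """Mirror of the map phase: emit [{hw: 1}] when adx == 'gdn', else []."""
    data = json.loads(raw_value)
    if data.get("adx") == "gdn":
        return [{data.get("hw"): 1}]
    return []


def merge_counts(acc, item):
    """Add every count in `item` into the accumulator dict."""
    for state, count in item.items():
        acc[state] = acc.get(state, 0) + count
    return acc


def reduce_counts(values):
    """Mirror of the reduce phase; seeding with {} makes empty input safe."""
    return [reduce(merge_counts, values, {})]


if __name__ == "__main__":
    records = [
        '{"adx": "gdn", "hw": "desktop"}',
        '{"adx": "gdn", "hw": "mobile"}',
        '{"adx": "other", "hw": "desktop"}',
        '{"adx": "gdn", "hw": "desktop"}',
    ]
    mapped = [obj for r in records for obj in map_gdn(r)]
    print(reduce_counts(mapped))                              # reduce once
    print(reduce_counts(reduce_counts(mapped[:2]) + mapped[2:]))  # re-reduce
    print(reduce_counts([]))                                  # empty input
```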

