Mapreduce crosstalk

Aphyr aphyr at aphyr.com
Tue May 17 15:35:11 EDT 2011


I was writing a new mapreduce query to look at users over time, and ran 
it over a single user in production. After that, other mapreduce jobs 
over users intermittently returned results from my new map phase instead 
of their own. After five minutes of this, I had to restart every node in 
the cluster to get it to stop.

Every node has {map_cache_size, 0} in riak_kv.

The map phase that screwed things up was:

function(v) {
   o = JSON.parse(v.values[0].data);

   // Age of account in days
   age = Math.round(
     (Date.now() - Date.iso8601(o.created_at)) /
     (1000 * 60 * 60 * 24)
   );

   return [['t_user_scores', v.key, age]];
}
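
For anyone who wants to poke at this phase outside the cluster, here is a 
minimal sketch that runs the same logic against a made-up object. Some 
assumptions: Date.iso8601 is not standard JavaScript, so parseIso8601 
below is a hypothetical stand-in built on Date.parse; fakeUser and its 
fields are invented sample data; and "now" is passed in so the result is 
repeatable. The locals are declared with var so nothing ends up in the 
global scope of whatever VM runs it.

```javascript
// Hypothetical stand-in for Date.iso8601, which is not part of standard
// JavaScript. Assumed to return milliseconds since the epoch.
function parseIso8601(s) {
  return Date.parse(s);
}

// Same logic as the map phase above, with locals declared via `var`
// and the current time injected instead of read from Date.now().
function mapUserAge(v, now) {
  var o = JSON.parse(v.values[0].data);

  // Age of account in days
  var age = Math.round(
    (now - parseIso8601(o.created_at)) /
    (1000 * 60 * 60 * 24)
  );

  return [['t_user_scores', v.key, age]];
}

// Made-up object with the fields the map phase reads: key and values[0].data.
var fakeUser = {
  key: 'user-1',
  values: [{ data: JSON.stringify({ created_at: '2011-05-07T00:00:00Z' }) }]
};

var fixedNow = Date.parse('2011-05-17T00:00:00Z');
var ageResult = mapUserAge(fakeUser, fixedNow);
console.log(JSON.stringify(ageResult)); // [["t_user_scores","user-1",10]]
```

This only exercises the function body; it says nothing about how the 
nodes pick which phase to run.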

It looks like one node started running that phase instead of the 
requested phase for subsequent jobs. It *should* have run this one, but 
didn't:

function(v) {
	o = JSON.parse(v.values[0].data);
	return [{
		key: v.key,
		name: o.name,
		thumbnail: o.thumbnail
	}];
}
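
Since the two phases return differently shaped results (an array tagged 
't_user_scores' versus an object with key, name and thumbnail), the bad 
responses are easy to pick out on the client side. A minimal sketch of 
such a check follows; classifyResult, findCrosstalk and the sample 
results are made up for illustration and are not part of any Riak API.

```javascript
// Classify one element of a map phase's output by its shape.
function classifyResult(r) {
  if (Array.isArray(r) && r[0] === 't_user_scores') {
    return 'age-phase';
  }
  if (r !== null && typeof r === 'object' && !Array.isArray(r) &&
      'key' in r && 'name' in r && 'thumbnail' in r) {
    return 'profile-phase';
  }
  return 'unknown';
}

// Return every result that does not look like it came from the expected phase.
function findCrosstalk(results, expected) {
  return results.filter(function (r) {
    return classifyResult(r) !== expected;
  });
}

// Invented sample: a profile job whose second result came from the age phase.
var sampleResults = [
  { key: 'user-1', name: 'Alice', thumbnail: 'a.png' },
  ['t_user_scores', 'user-2', 10]
];

var strays = findCrosstalk(sampleResults, 'profile-phase');
console.log(strays.length + ' result(s) from the wrong phase');
```

Logging which node answered each affected job alongside a check like this 
might help narrow down whether a single node is responsible.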

Now I'm scared to run MR jobs. Could it be an issue with returning 
keydata? Anybody else seen this before?

--Kyle



