mr_queue gone wild

Bryan Fink bryan at
Tue Jul 5 15:37:17 EDT 2011

On Thu, Jun 30, 2011 at 4:52 PM, Sylvain Niles <sylvain.niles at> wrote:
> Is there a way to list the m/r jobs in the queue in case there's
> something else going on? Is there a reason they never get removed?

Hi, Sylvain.  As Aphyr noted, the mr_queue is a bitcask.  Because
bitcask is append-only storage, its size alone does not give a good
indication of the active dataset.  More specifically, unless you have
tweaked your bitcask parameters, you can expect the mr_queue directory
to grow to at least 2GB before purging unused data.  This is normal
and expected.
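
To make the append-only point concrete, here is a minimal sketch in Python. It is a toy model, not Bitcask's actual on-disk format or API: the class and method names (AppendOnlyStore, put, delete, merge) are made up for illustration. It only shows why the amount of data on disk can keep growing while the live dataset stays empty, until a merge runs.

```python
class AppendOnlyStore:
    """Toy model of an append-only log; not Bitcask's real format or API."""

    def __init__(self):
        self.log = []      # every record ever written, in order
        self.index = {}    # key -> position of the latest live record

    def put(self, key, value):
        self.index[key] = len(self.log)
        self.log.append((key, value))

    def delete(self, key):
        # A delete is also an append (a tombstone); nothing is erased in place.
        self.log.append((key, None))
        self.index.pop(key, None)

    def merge(self):
        # Compaction: rewrite only the live records and drop the rest.
        live = [(key, self.log[pos][1]) for key, pos in self.index.items()]
        self.log = []
        self.index = {}
        for key, value in live:
            self.put(key, value)


store = AppendOnlyStore()
for i in range(1000):
    store.put(f"req-{i}", "pending")   # enqueue a request
    store.delete(f"req-{i}")           # dequeue it once it has been handled

entries_before_merge = len(store.log)   # records sitting "on disk"
live_before_merge = len(store.index)    # records actually still queued
store.merge()
entries_after_merge = len(store.log)
```

In this model the log holds 2000 records before the merge even though nothing is queued, which is the same reason the mr_queue directory size says little about how many requests are really waiting.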

A few things worth noting:

 - The mr_queue is only used for JavaScript MapReduce phases.  Erlang
   phases never touch it.  This is because there are a limited number
   of JavaScript VMs available on a node, and all vnodes compete for
   them.  The mr_queue provides a place to offload the backlog of
   pending requests for JavaScript interpreters, rather than keeping
   them in memory.

 - To check the depth of the mr_queue, connect to any Riak node's
   console (bin/riak attach), and call
   riak_kv_map_master:queue_depth/0.  It should show you the active
   depth of the mr_queue on each node in the cluster:

   $ bin/riak attach
   (dev1 at> riak_kv_map_master:queue_depth().
   [{'dev1 at',0},
    {'dev2 at',0},
    {'dev3 at',0},
    {'dev4 at',0}]

   A result like the above means there are no pending map requests
   waiting.  A result like the following means there are 242 map
   requests waiting on the dev1 node, 128 on dev2, etc.:

   [{'dev1 at',242},
    {'dev2 at',128},
    {'dev3 at',212},
    {'dev4 at',230}]

   These numbers may be slightly confusing because a single logical
   map phase gets split into a separate map request for each vnode.
   So, you may have only one MapReduce request outstanding, but see
   numbers greater than 1 in this output.
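
The fan-out described above can be sketched roughly as follows. This is a toy model in Python, not Riak code: the 16-vnode ring, the round-robin ownership, and the helper names (map_requests_for, queue_depth) are all assumptions made for illustration. It only shows how one MapReduce request can show up as a depth greater than 1 on every node.

```python
from collections import Counter

# Hypothetical toy cluster: 16 vnodes spread round-robin over 4 nodes.
NODES = ["dev1", "dev2", "dev3", "dev4"]
VNODE_OWNER = {vnode: NODES[vnode % len(NODES)] for vnode in range(16)}


def map_requests_for(job_id):
    """One logical map phase becomes one map request per vnode (toy model)."""
    return [(job_id, vnode, owner) for vnode, owner in VNODE_OWNER.items()]


def queue_depth(requests):
    """Count waiting map requests per node, shaped like the depth list above."""
    counts = Counter(owner for _job, _vnode, owner in requests)
    return [(node, counts[node]) for node in NODES]


pending = map_requests_for("job-1")   # a single MapReduce request
depths = queue_depth(pending)
print(depths)
```

With these made-up numbers, the single job produces 16 map requests, 4 per node, so every node reports a depth of 4 even though only one MapReduce request is outstanding.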

Hope this helps,

Bryan Fink
Senior Software Developer,
Basho Technologies

