Issues with MapReduce in Riak 1.1.0

Jon Meredith jmeredith at basho.com
Fri Feb 24 19:56:46 EST 2012


Users and customers have reported the following issues with the
MapReduce subsystem in Riak 1.1.0:

  1) MapReduce fails in clusters that contain both 1.1.0 and older nodes
  2) JavaScript MapReduce jobs leak JavaScript VMs under some failure
     conditions
  3) Keylisting continues after bucket MapReduce is cancelled or times out

We are actively investigating these issues and intend to resolve
them in an upcoming point release. We will update the individual
issue trackers as we make progress.

= Issue: MapReduce fails in clusters that contain both 1.1.0 and older
nodes =

Description:

  The 1.1.0 release has enhancements to the way requests are
  routed between nodes. It includes a legacy mode for use while
  clusters are being upgraded, which falls back to the original
  routing used in the 1.0 series and earlier. The code required to
  support legacy routing for MapReduce requests was omitted.

What version of Riak is affected?

  Open Source and Enterprise versions of Riak 1.1.0 are affected.

Are all users affected?

  This issue only arises during a rolling upgrade. Users will be
  unable to issue MapReduce jobs while any 1.0.x or 0.14.x nodes
  remain in the cluster.

  Users who are not using MapReduce, or whose clusters contain
  only 1.1.0 nodes, are unaffected.
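
  As a rough illustration of the condition described above, the
  following sketch checks whether a set of node versions mixes 1.1.0
  nodes with older ones. It does not query Riak: the node names and
  the node_versions mapping are hypothetical, and the versions would
  have to be collected by whatever means you already use.

```python
def parse_version(version):
    """Turn a version string such as '1.0.3' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_mixed_cluster(node_versions, new_routing=(1, 1, 0)):
    """Return True if the mapping holds both 1.1.0+ nodes and older nodes.

    node_versions is a hypothetical {node_name: version_string} mapping
    supplied by the caller; nothing here talks to a real cluster.
    """
    parsed = [parse_version(v) for v in node_versions.values()]
    has_new = any(v >= new_routing for v in parsed)
    has_old = any(v < new_routing for v in parsed)
    return has_new and has_old


# Mid-upgrade cluster: MapReduce would be expected to fail here.
mid_upgrade = {"riak@node1": "1.1.0", "riak@node2": "1.0.3"}
# Fully upgraded cluster: unaffected by this issue.
upgraded = {"riak@node1": "1.1.0", "riak@node2": "1.1.0"}

print(is_mixed_cluster(mid_upgrade))
print(is_mixed_cluster(upgraded))
```

  A cluster that has not started upgrading (all nodes older than
  1.1.0) is also reported as not mixed, matching the description.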

Can I safely upgrade to 1.1.0 if I am not using MapReduce?

  Yes.

Issue Tracker:

  https://github.com/basho/riak_core/issues/144

= Issue: JavaScript MapReduce jobs leak JavaScript VMs under some failure
conditions =

Description:

  JavaScript MapReduce uses a pool of JavaScript virtual machines
  inside Riak. In some edge cases where MapReduce jobs are
  cancelled (due to timeouts or dropped connections), the VMs are
  not returned to the pool. Eventually the pool can be exhausted,
  so that no further JavaScript MapReduce jobs can run until the
  node is restarted.
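
  The failure mode can be pictured with a toy model. The sketch below
  is not Riak's implementation (Riak is written in Erlang); VMPool,
  run_job_leaky and run_job_safe are hypothetical names used only to
  show how a fixed-size pool drains when a cancelled job skips the
  step that returns its VM.

```python
import queue


class VMPool:
    """Toy stand-in for a fixed-size pool of JavaScript VMs."""

    def __init__(self, size):
        self._free = queue.Queue()
        for vm_id in range(size):
            self._free.put(vm_id)

    def checkout(self):
        # Raises queue.Empty once every VM is checked out.
        return self._free.get_nowait()

    def checkin(self, vm_id):
        self._free.put(vm_id)

    def available(self):
        return self._free.qsize()


class JobCancelled(Exception):
    """Models a timeout or dropped client connection."""


def run_job_leaky(pool, cancelled):
    vm = pool.checkout()
    if cancelled:
        raise JobCancelled()  # the VM is never checked back in
    pool.checkin(vm)


def run_job_safe(pool, cancelled):
    vm = pool.checkout()
    try:
        if cancelled:
            raise JobCancelled()
    finally:
        pool.checkin(vm)  # returned on every exit path


pool = VMPool(size=2)
for _ in range(2):
    try:
        run_job_leaky(pool, cancelled=True)
    except JobCancelled:
        pass
print(pool.available())  # 0: the next checkout raises queue.Empty
```

  The try/finally variant is a general resource-handling pattern,
  shown for contrast; it is not a statement about how the issue will
  be fixed in Riak.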

What version of Riak is affected?

  Open Source and Enterprise versions of Riak 1.0.x and 1.1.0 are
  affected.

Are all users affected?

  Only users of JavaScript MapReduce are affected.

Can I safely upgrade to 1.1.0 if I am not using JavaScript MapReduce?

  Yes.

Issue Tracker:

  https://github.com/basho/riak_kv/issues/287

= Issue: Keylisting continues after bucket MapReduce is cancelled or times out =

Description:

  If a MapReduce job against a bucket is cancelled (timeout or
  dropped client connection), the listkeys operation feeding the
  objects from the bucket continues to run and generates a large
  number of error messages:

  "Pipe worker startup failed:fitting was gone before startup"

  Additionally, these running listkeys jobs continue to tie up
  resources that would otherwise be available to run other
  MapReduce queries. If enough listkeys jobs end up in this state,
  subsequent MapReduce queries will be unable to complete and
  will time out. In the worst case, all MapReduce queries
  submitted will time out until the running listkeys jobs
  eventually terminate and the cluster recovers.
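
  One way to gauge whether a node is hitting this issue is to count
  how often the error text quoted above appears in its logs. The
  sketch below assumes only that the message contains the quoted
  text; the function name and the sample lines are made up for
  illustration, and no particular log file location or line format
  is implied.

```python
ERROR_TEXT = "fitting was gone before startup"


def count_orphaned_worker_errors(log_lines):
    """Count the lines that contain the error text quoted in this post."""
    return sum(1 for line in log_lines if ERROR_TEXT in line)


# Made-up sample lines; real log lines will carry other fields.
sample = [
    "Pipe worker startup failed:fitting was gone before startup",
    "some unrelated log line",
    "Pipe worker startup failed:fitting was gone before startup",
]
print(count_orphaned_worker_errors(sample))
```

  Because the function accepts any iterable of strings, an open file
  object can be passed in directly.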

What version of Riak is affected?

  Riak 1.0.x and 1.1.0 are both affected. The listkeys
  back-pressure mechanism added in 1.1.0 has increased the time
  it takes to recover from this issue.

Are all users affected?

  This affects users who are seeing frequent timeouts with
  bucket MapReduce.

Can I safely upgrade to 1.1.0 if I am not using MapReduce?

  Yes.

Issue Tracker:

  https://github.com/basho/riak_kv/issues/293

-- 
Jon Meredith
Platform Engineering Manager
Basho Technologies, Inc.
jmeredith at basho.com