MR Timeout

Bryan Fink bryan at basho.com
Fri Aug 3 16:48:31 EDT 2012


On Fri, Aug 3, 2012 at 10:50 AM, Yousuf Fauzan <yousuffauzan at gmail.com> wrote:
> The query fails with the following error
> <<"{\"phase\":1,\"error\":\"{{{badmatch,[]},[{riak_kv_js_manager,needs_reload,2},{riak_kv_js_manager,handle_call,3},...

Hi, Yousuf. I think there may be a race between a Javascript VM
marking itself idle and the same VM getting the message that its
manager has died. Could you please check your Riak logs for an
error similar to:

16:32:10.693 [error] Supervisor riak_kv_sup had child riak_kv_js_map
started with riak_kv_js_manager:start_link(riak_kv_js_map, 8) at
<0.284.0> exit with reason killed in context child_terminated

The important part to look for is the first bit about "Supervisor
riak_kv_sup had child riak_kv_js_map started with
riak_kv_js_manager:start_link". If that happened, then the error you're
seeing is an old VM trying to mark itself idle with a new manager that
doesn't know about it. I'll work up a patch to solve this issue.
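
If it helps with the log check, here is a rough sketch of a filter for
that supervisor report. It assumes each report sits on a single line of
the log file (the example above is only wrapped by email), and the
function name is made up for illustration.

```javascript
// Matches the supervisor report for any riak_kv_js_* child (for example
// riak_kv_js_map) that was started by riak_kv_js_manager:start_link.
var MANAGER_EXIT =
  /Supervisor riak_kv_sup had child riak_kv_js_\w+ started with riak_kv_js_manager:start_link/;

// Hypothetical helper: returns the log lines that contain such a report.
function findManagerExits(logText) {
  return logText.split("\n").filter(function (line) {
    return MANAGER_EXIT.test(line);
  });
}

// Usage under Node, with logPath pointing at your own Riak log file:
//   var hits = findManagerExits(require("fs").readFileSync(logPath, "utf8"));
```

An empty result only means the pattern did not match; it is still worth
reading the log around the time of the failed query.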

*Why* that manager exited, if that is indeed what you find in your
logs, is another question, and may have to do with many JS VMs
crashing suddenly. Such situations are often linked to high memory
pressure, in my experience.

> Also, MR is quite slow. Is there a way to speed things up?

The biggest speedup in MR performance often comes from rewriting
Javascript phases in Erlang. It's annoying, yes, but it avoids the
costs of JSON (de-)serialization and inter-VM communication. Reduce
phases, in particular, also benefit from ensuring that your function
is actually *reducing* (not accumulating an ever-growing result on
each invocation), and from tuning reduce_phase_batch_size or even
using reduce_phase_only_1.
(http://wiki.basho.com/MapReduce-Implementation.html#Configuration-Tuning-for-Reduce-Phases)
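
To illustrate the "actually reducing" point, here is a minimal sketch of
a Javascript reduce function whose output stays at one element however
many times it is re-run. The toyReduce helper is only a toy model of
batching written for this example, not how Riak schedules reduce calls,
and the bucket name is made up. The "arg" shape for
reduce_phase_batch_size is how I read the page linked above, so please
check it against that page before relying on it.

```javascript
// Reduce phase: sums numeric inputs. A reduce function may be called
// again with its own earlier output mixed in with new inputs, so it
// must accept what it returns and keep the result small.
var reduceSum = function (values, arg) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return [total]; // always a one-element list
};

// Toy model of batching (an assumption for illustration only): feed the
// previous result back in together with the next batch of inputs.
function toyReduce(fn, inputs, batchSize) {
  var acc = [];
  for (var i = 0; i < inputs.length; i += batchSize) {
    acc = fn(acc.concat(inputs.slice(i, i + batchSize)), null);
  }
  return acc;
}

// Hypothetical query: "my_bucket" is a placeholder and is assumed to
// hold JSON numbers. The batch size is passed as the reduce phase's arg.
var query = {
  inputs: "my_bucket",
  query: [
    { map: { language: "javascript", name: "Riak.mapValuesJson" } },
    {
      reduce: {
        language: "javascript",
        source: reduceSum.toString(),
        arg: { reduce_phase_batch_size: 1000 }
      }
    }
  ]
};
```

Because reduceSum gives the same answer whatever the batch boundaries
are, changing the batch size should affect only how often the function
is called, not the result.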

Hope that helps,
Bryan



