Intermittent MapReduce crashes (reserve_vm)

Brian Conway bconway at rcesoftware.com
Thu Apr 26 01:23:14 EDT 2012


I have a test cluster of 3 nodes running locally (virtualized), with
the default configuration plus eleveldb. The nodes have plenty of RAM
and never hit swap. I've already bumped up the JS VM count (8 -> 24)
after getting preflist_exhausted errors, and I now get the following
intermittently when posting to /mapred:

$ curl -X POST http://10.236.174.131:8098/mapred -H "Content-Type: application/json" -d @volume.js
{"phase":3,"error":"{noproc,{gen_server,call,[riak_kv_js_map,{reserve_vm,<11534.1650.0>},infinity]}}","input":"{ok,{r_object,<<\"vol\">>,<<\"6724_2012-01-21_18\">>,[{r_content,{dict,4,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[[<<\"content-type\">>,97,112,112,108,105,99,97,116,105,111,110,47,106,115,111,110],[<<\"X-Riak-VTag\">>,52,68,111,85,117,98,80,99,106,114,79,106,71,115,107,118,85,67,88,117,68,107]],[[<<\"index\">>]],[],[[<<\"X-Riak-Last-Modified\">>|{1335,385687,399828}]],[],[]}}},<<\"{\"dlid\":
\"1\", \"rate\": \"0.08\", \"cnid\":
\"...\">>}],...},...}","type":"exit","stack":"[{gen_server,call,3},{riak_kv_js_manager,blocking_dispatch,4},{riak_kv_mrc_map,map_js,3},{riak_kv_mrc_map,process,3},{riak_pipe_vnode_worker,process_input,3},{riak_pipe_vnode_worker,wait_for_input,2},{gen_fsm,handle_msg,7},{proc_lib,init_p_do_apply,3}]"}

This seems to happen only every two or three attempts; the rest
complete successfully. Doing the same with Python and protocol buffers
also gives inconsistent results. Those attempts sometimes work and
sometimes throw errors that are either the same as above or like
these (may be unrelated):

...
  File "/home/bconway/scratch/riakenv/lib/python2.6/site-packages/riak/transports/pbc.py", line 535, in recv_pkt
    % len(nmsglen))
riak.RiakError: 'Socket returned short packet length 3 - expected 4'

...
  File "/home/bconway/scratch/riakenv/lib/python2.6/site-packages/riak/transports/pbc.py", line 535, in recv_pkt
    % len(nmsglen))
riak.RiakError: 'Socket returned short packet length 1 - expected 4'
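
For context on the two errors above: the message suggests the client
got fewer bytes than the 4-byte length header it was waiting for, which
can happen when a single recv() returns early or the server closes the
connection mid-response. Below is a minimal sketch of reading a
length-prefixed frame with a loop, assuming a 4-byte big-endian length
prefix (as the "expected 4" hints). recv_exact and read_frame are
hypothetical helpers for illustration, not part of the riak Python
client:

```python
import socket
import struct


def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            # Peer closed the connection before the full amount arrived.
            raise EOFError("connection closed with %d bytes outstanding"
                           % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_frame(sock):
    """Read one frame: assumed 4-byte big-endian length, then the payload."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

If a loop like this still raises EOFError, the short read would point at
the server side dropping the connection rather than at packet
fragmentation.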

The MapReduce itself is wide but fairly simple: 10 user bucket-key
pairs as inputs, three link phases, and a final map phase that dumps
the data:

$ cat volume.js
{"inputs":[["user","1672_2012-01"],["user","2672_2012-01"],["user","3672_2012-01"],["user","4672_2012-01"],["user","5672_2012-01"],["user","6672_2012-01"],["user","672_2012-01"],["user","6723_2012-01"],["user","6724_2012-01"],["user","6725_2012-01"]],
 "query":[{"link":{"tag":"day"}},
	  {"link":{"tag":"usage"}},
	  {"link":{"tag":"contact"}},
	  {"map":{
	      "language":"javascript",
	      "name":"Riak.mapValuesJson"
	  }}
	 ]
}
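
For anyone wanting to reproduce this without curl, here is a rough
sketch that builds the same job shape in Python and posts it over HTTP.
build_job and post_job are hypothetical helper names, and treating any
HTTP error status as a failed attempt is an assumption about how the
/mapred endpoint reports errors:

```python
import json
import urllib.error
import urllib.request  # Python 3; urllib2 plays this role on Python 2


def build_job(keys, link_tags, bucket="user"):
    """Build a job: bucket/key inputs, one link phase per tag, then a JS map."""
    query = [{"link": {"tag": tag}} for tag in link_tags]
    query.append({"map": {"language": "javascript",
                          "name": "Riak.mapValuesJson"}})
    return {"inputs": [[bucket, key] for key in keys], "query": query}


def post_job(url, job, timeout=60):
    """POST the job as JSON and return (ok, body).

    An HTTP error status is reported as ok=False (assumption: failed
    jobs come back with an error status); the body is returned either way.
    """
    data = json.dumps(job).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return True, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return False, err.read().decode("utf-8")
```

Calling build_job(["1672_2012-01", "2672_2012-01"], ["day", "usage",
"contact"]) and passing the result to post_job with the /mapred URL
above, in a loop, should make it easy to count how many attempts fail.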

The logs are fairly chatty; let me know what else I should add:

** Reason for termination ==
** {{{badmatch,[]},[{riak_kv_js_manager,needs_reload,2},{riak_kv_js_manager,handle_call,3},{gen_server,handle_msg,5},{proc_lib,init_p_do_apply,3}]},{gen_server,call,[riak_kv_js_map,{mark_idle,<0.1756.0>},infinity]}}
2012-04-26 00:18:18 =CRASH REPORT====
  crasher:
    initial call: riak_kv_js_vm:init/1
    pid: <0.1756.0>
    registered_name: []
    exception exit:
{{{badmatch,[]},[{riak_kv_js_manager,needs_reload,2},{riak_kv_js_manager,handle_call,3},{gen_server,handle_msg,5},{proc_lib,init_p_do_apply,3}]},{gen_server,call,[riak_kv_js_map,{mark_idle,<0.1756.0>},infinity]}}
      in function  gen_server:terminate/6
      in call from proc_lib:init_p_do_apply/3
    ancestors: [riak_kv_js_sup,riak_kv_sup,<0.256.0>]
    messages: [{'DOWN',#Ref<0.0.0.149247>,process,<0.1753.0>,{timeout,{gen_server,call,[<0.1764.0>,{checkout_to,<0.2736.0>},1000]}}}]
    links: [<0.275.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 1597
    stack_size: 24
    reductions: 627539
  neighbours:

Thanks for any help.

Brian Conway



