preflist_exhausted with map-reduce and riak-js

Jon Meredith jmeredith at basho.com
Mon Apr 2 15:27:45 EDT 2012


Very strange.

Are you seeing the

16:40:50.037 [notice] JS call failed: All VMs are busy.

message in your console.log files on any of the nodes in your cluster?

Jon

On Mon, Apr 2, 2012 at 11:48 AM, Matthew Tovbin <matthew at tovbin.com> wrote:

> Jon,
>
> Thanks for the reply. I tried increasing 'map_js_vm_count' and
> 'reduce_js_vm_count' on the cluster to the values you suggested (24/18) and
> even larger ones (32/24 and 128/128), but with no success - I still get the
> same error in the node's console log.
>
> What are we missing?!
>
> -Matthew
>
>
>
> On Fri, Mar 30, 2012 at 16:19, Jon Meredith <jmeredith at basho.com> wrote:
>
>> Hi list,
>>
>> I spent some time looking at this issue today.  I suspect it is caused by
>> all the JavaScript VMs inside Riak being busy with many parallel
>> MapReduce jobs.
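>>
>> As a rough illustration of what "many parallel MapReduce jobs" can look
>> like from the client side, here is a minimal Node.js sketch that keeps
>> only a few /mapred requests in flight at a time instead of firing them
>> all at once. The client setup, bucket names and the concurrency cap are
>> assumptions for illustration, not taken from this thread:
>>
>>     // Hypothetical sketch: cap concurrent MapReduce jobs so the fixed
>>     // pool of JS VMs inside Riak is less likely to be exhausted.
>>     var db = require('riak-js').getClient({ host: 'localhost', port: 8098 });
>>
>>     var buckets = ['bucket_a', 'bucket_b', 'bucket_c'];  // placeholder inputs
>>     var maxInFlight = 4;  // keep this well below map_js_vm_count
>>     var inFlight = 0;
>>
>>     function next() {
>>       if (inFlight >= maxInFlight || buckets.length === 0) return;
>>       var bucket = buckets.shift();
>>       inFlight++;
>>       db.getAll(bucket, function (err, items) {  // each call runs a /mapred job
>>         inFlight--;
>>         if (err) console.error(bucket, err.message);
>>         else console.log(bucket, items.length, 'items');
>>         next();
>>       });
>>       next();  // fill the remaining slots
>>     }
>>     next();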
>>
>> If you're seeing this issue, please check for this message in the
>> console.log file:
>>
>> 16:40:50.037 [notice] JS call failed: All VMs are busy.
>>
>> I've filed https://github.com/basho/riak_pipe/issues/44 to track
>> improvements to error reporting.
>>
>> If you are seeing the log message, the number of VMs allocated for map
>> and reduce tasks can be configured in app.config. The defaults are 8 for
>> map and 6 for reduce; increasing them will use more memory but should
>> reduce the incidence of the error.
>>
>> {riak_kv, [
>>             ...
>>             {map_js_vm_count, 24},
>>             {reduce_js_vm_count, 18},
>>             ...
>>           ]}
>>
>> If you were affected by it and changing this does not resolve your issue,
>> I'll keep digging.
>>
>> Cheers, Jon.
>>
>> On Thu, Mar 29, 2012 at 10:29 AM, Matthew Tovbin <matthew at tovbin.com> wrote:
>>
>>> Guys,
>>>
>>> Any updates on the issue?!
>>>
>>> -Matthew
>>>
>>>
>>>
>>> On Tue, Mar 13, 2012 at 18:29, Matthew Tovbin <matthew at tovbin.com> wrote:
>>>
>>>> Here is a log from one of the servers:
>>>>
>>>> ==> /mnt/dataraid/log/crash.log <==
>>>> 2012-03-13 18:24:44 =CRASH REPORT====
>>>>   crasher:
>>>>     initial call: riak_pipe_builder:init/1
>>>>     pid: <0.20949.24>
>>>>     registered_name: []
>>>>     exception exit: {sink_died,shutdown}
>>>>       in function  gen_fsm:terminate/7
>>>>       in call from proc_lib:init_p_do_apply/3
>>>>     ancestors: [riak_pipe_builder_sup,riak_pipe_sup,<0.172.0>]
>>>>     messages: [{'DOWN',#Ref<0.0.5.38339>,process,<0.20950.24>,shutdown}]
>>>>     links: [<0.175.0>]
>>>>     dictionary:
>>>> [{eunit,[{module,riak_pipe_builder},{ref,#Ref<0.0.5.38318>},{spec,[{fitting_spec,listkeys,riak_kv_pipe_listkeys,undefined,{chash,key_of},1,64}]},{options,[{sink,{fitting,<0.20947.24>,#Ref<0.0.5.38318>,{riak_kv_pipe_get,bkey_chash},{riak_kv_pipe_get,bkey_nval}}},{trace,{set,1,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[error],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}},{log,{sink,{fitting,<0.20060.24>,#Ref<0.0.5.38318>,sink,undefined}}}]},{fittings,[{{fitting,<0.20950.24>,#Ref<0.0.5.38318>,{chash,key_of},1},#Ref<0.0.5.38339>}]}]}]
>>>>     trap_exit: false
>>>>     status: running
>>>>     heap_size: 987
>>>>     stack_size: 24
>>>>     reductions: 251
>>>>   neighbours:
>>>> 2012-03-13 18:24:44 =SUPERVISOR REPORT====
>>>>      Supervisor: {local,riak_pipe_builder_sup}
>>>>      Context:    child_terminated
>>>>      Reason:     {sink_died,shutdown}
>>>>      Offender:
>>>> [{pid,<0.20949.24>},{name,undefined},{mfargs,{riak_pipe_builder,start_link,undefined}},{restart_type,temporary},{shutdown,brutal_kill},{child_type,worker}]
>>>>
>>>>
>>>> ==> /mnt/dataraid/log/console.log <==
>>>> 2012-03-13 18:24:44.645 [error] <0.20949.24> CRASH REPORT Process
>>>> <0.20949.24> with 0 neighbours crashed with reason: {sink_died,shutdown}
>>>> 2012-03-13 18:24:44.652 [error] <0.175.0> Supervisor
>>>> riak_pipe_builder_sup had child undefined started with
>>>> {riak_pipe_builder,start_link,undefined} at <0.20949.24> exit with reason
>>>> {sink_died,shutdown} in context child_terminated
>>>> 2012-03-13 18:24:56.420 [info]
>>>> <0.71.0>@riak_core_sysmon_handler:handle_event:85 monitor long_gc
>>>> <0.32603.24>
>>>> [{initial_call,{mochiweb_acceptor,init,3}},{almost_current_function,{gen,do_call,4}},{message_queue_len,0}]
>>>> [{timeout,120},{old_heap_block_size,0},{heap_block_size,4181},{mbuf_size,0},{stack_size,40},{old_heap_size,0},{heap_size,1955}]
>>>>
>>>>
>>>> ==> /mnt/dataraid/log/erlang.log.1 <==
>>>> 18:24:44.634 [error] gen_fsm <0.20949.24> in state
>>>> wait_pipeline_shutdown terminated with reason: {sink_died,shutdown}
>>>> (riak at 1)1> 18:24:44.641 [error] Pipe worker startup failed:fitting was
>>>> gone before startup
>>>> (riak at 1)1>
>>>> 18:24:44.645 [error] CRASH REPORT Process <0.20949.24> with 0
>>>> neighbours crashed with reason: {sink_died,shutdown}
>>>> (riak at 1)1> 18:24:44.652 [error] Supervisor riak_pipe_builder_sup had
>>>> child undefined started with {riak_pipe_builder,start_link,undefined} at
>>>> <0.20949.24> exit with reason {sink_died,shutdown} in context
>>>> child_terminated
>>>> (riak at 1)1> 18:24:56.420 [info] monitor long_gc <0.32603.24>
>>>> [{initial_call,{mochiweb_acceptor,init,3}},{almost_current_function,{gen,do_call,4}},{message_queue_len,0}]
>>>> [{timeout,120},{old_heap_block_size,0},{heap_block_size,4181},{mbuf_size,0},{stack_size,40},{old_heap_size,0},{heap_size,1955}]
>>>> (riak at 1)1>
>>>>
>>>>
>>>>
>>>> ==> /mnt/dataraid/log/error.log <==
>>>> 2012-03-13 18:24:44.645 [error] <0.20949.24> CRASH REPORT Process
>>>> <0.20949.24> with 0 neighbours crashed with reason: {sink_died,shutdown}
>>>> 2012-03-13 18:24:44.652 [error] <0.175.0> Supervisor
>>>> riak_pipe_builder_sup had child undefined started with
>>>> {riak_pipe_builder,start_link,undefined} at <0.20949.24> exit with reason
>>>> {sink_died,shutdown} in context child_terminated
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> -Matthew
>>>>
>>>>
>>>>
>>>> On Tue, Mar 13, 2012 at 18:17, Matthew Tovbin <matthew at tovbin.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I got the same problem today on 1.1.0.
>>>>>
>>>>> As suggested, I updated all the nodes to 1.1.1. The error remains the
>>>>> same:
>>>>>
>>>>> { stack: [Getter/Setter],
>>>>>    arguments: undefined,
>>>>>   type: undefined,
>>>>>   message: 'HTTP error 500:
>>>>> {"phase":0,"error":"[preflist_exhausted]","input":"{ok,{r_object ....."}}}
>>>>>
>>>>>
>>>>> -Matthew
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 13, 2012 at 06:48, Jon Meredith <jmeredith at basho.com> wrote:
>>>>>
>>>>>> Hi Arne,
>>>>>>
>>>>>> Could you try with 1.1.1, please?  We've fixed a number of bugs in how
>>>>>> MapReduce handles errors that could explain some of what you are seeing.
>>>>>>
>>>>>> Jon
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 13, 2012 at 12:55 AM, Arne Claassen <arne+riak at claassen.net> wrote:
>>>>>>
>>>>>>> I'm running 1.1.0 built from source on CentOS 6. I set up two nodes
>>>>>>> on localhost following the instructions on the Basho site for testing,
>>>>>>> and everything was going fine until I needed to fetch all the items in a bucket.
>>>>>>>
>>>>>>> For this I was using Node.js with riak-js and its db.getAll call,
>>>>>>> which from the debug logs looks like it uses /mapred.
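>>>>>>>
>>>>>>> Roughly, the call in question is shaped like the following minimal
>>>>>>> sketch (the client options are placeholders, the (err, items) callback
>>>>>>> convention is assumed, and the bucket name is taken from the error
>>>>>>> output further below):
>>>>>>>
>>>>>>>     // Minimal riak-js sketch of the getAll call described above.
>>>>>>>     // Host and port are placeholders.
>>>>>>>     var db = require('riak-js').getClient({ host: 'localhost', port: 8098 });
>>>>>>>
>>>>>>>     db.getAll('tweets_15368536', function (err, items) {
>>>>>>>       if (err) {
>>>>>>>         // The HTTP 500 / preflist_exhausted error surfaces here.
>>>>>>>         return console.error(err);
>>>>>>>       }
>>>>>>>       console.log('fetched', items.length, 'items');
>>>>>>>     });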
>>>>>>>
>>>>>>> The bucket only has about 1200 items in it, each a JSON payload of
>>>>>>> about 3KB. Initially I kept getting inconsistent results: sometimes the
>>>>>>> data would come back, sometimes the data would be undefined. So I kept
>>>>>>> running it, trying to see if there was a pattern, adding more debugging
>>>>>>> code, etc., but still only at about one request every couple of seconds.
>>>>>>> Then it just stopped working altogether and now I always get:
>>>>>>>
>>>>>>> { [Error: HTTP error 500:
>>>>>>> {"phase":0,"error":"[preflist_exhausted]","input":"{ok,{r_object,<<\"tweets_15368536\">>,<<\"178528155714330620\">>,[{r_content,{dict,6,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[[<<\"Links\">>]],[],[],[],[],[],[],[],[[<<\"content-type\">>,97,112,112,108,105,99,97,116,105,111,110,47,106,115,111,110],[<<\"X-Riak-VTag\">>,49,76,115,54,70,75,74,71,119,65,111,70,88,73,117,84,98,88,79,69,52,116]],[[<<\"index\">>]],[],[[<<\"X-Riak-Last-Modified\">>|{1331,487523,218968}]],[],[[<<\"X-Riak-Meta\">>]]}}},<<\"{\"coordinates\":null,\"create...\">>}],...},...}","type":"forward_preflist","stack":"[]"}]
>>>>>>> statusCode: 500 }
>>>>>>>
>>>>>>> in the console. I haven't tried restarting Riak, just because I'd
>>>>>>> rather find out what's happening than have it go away and come back
>>>>>>> later when I don't expect it.
>>>>>>>
>>>>>>> Any suggestions on how to troubleshoot this? I can still access
>>>>>>> items individually, and I can even list the keys via ?keys=true (see
>>>>>>> the sketch below); it's just the MapReduce that seems to be busted.
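>>>>>>>
>>>>>>> For reference, the key listing that still works goes through the plain
>>>>>>> HTTP API rather than /mapred. A minimal Node.js sketch, where host,
>>>>>>> port and bucket name are assumptions for illustration:
>>>>>>>
>>>>>>>     // Sketch: list keys via Riak's HTTP interface, bypassing MapReduce.
>>>>>>>     var http = require('http');
>>>>>>>
>>>>>>>     http.get({
>>>>>>>       host: 'localhost',
>>>>>>>       port: 8098,
>>>>>>>       path: '/riak/tweets_15368536?keys=true'
>>>>>>>     }, function (res) {
>>>>>>>       var body = '';
>>>>>>>       res.on('data', function (chunk) { body += chunk; });
>>>>>>>       res.on('end', function () {
>>>>>>>         var keys = JSON.parse(body).keys;
>>>>>>>         console.log('found', keys.length, 'keys');
>>>>>>>       });
>>>>>>>     }).on('error', console.error);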
>>>>>>>
>>>>>>> cheers,
>>>>>>> arne
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jon Meredith
>>>>>> Platform Engineering Manager
>>>>>> Basho Technologies, Inc.
>>>>>> jmeredith at basho.com
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> Jon Meredith
>> Platform Engineering Manager
>> Basho Technologies, Inc.
>> jmeredith at basho.com
>>
>>
>


-- 
Jon Meredith
Platform Engineering Manager
Basho Technologies, Inc.
jmeredith at basho.com