preflist_exhausted with map-reduce and riak-js

Matthew Tovbin matthew at tovbin.com
Mon Apr 2 13:48:54 EDT 2012


Jon,

Thanks for the reply. I tried increasing the 'map_js_vm_count' and
'reduce_js_vm_count'
on the cluster to the values you suggested 24/18 and even larger ones 32/24
and 128/128, but with no success - I still get the same error in the
console log of the node.

What are we missing?!

-Matthew



On Fri, Mar 30, 2012 at 16:19, Jon Meredith <jmeredith at basho.com> wrote:

> Hi list,
>
> I spend some time looking at this issue today.  I suspect it is due to all
> the Javascript VMs being busy inside Riak due to many parallel MapReduce
> jobs.
>
> If you're seeing this issue please check for this message in the
> console.log file
>
> 16:40:50.037 [notice] JS call failed: All VMs are busy.
>
> I've filed https://github.com/basho/riak_pipe/issues/44 to track
> improvements to error reporting.
>
> If you are seeing the log message, the number of VMs allocated for map and
> reduce tasks can be configured in the app.config - defaults are 8 for map
> and 6 for reduce, increasing them will use more memory but should reduce
> the incidence of the error message.
>
> {riak_kv, [
>              ...
>             {map_js_vm_count, 24 },
>             {reduce_js_vm_count, 18 },
>             ...]
>
> If you were affected by it and changing this does not resolve your issue,
> I'll keep digging.
>
> Cheers, Jon.
>
> On Thu, Mar 29, 2012 at 10:29 AM, Matthew Tovbin <matthew at tovbin.com>wrote:
>
>> Guys,
>>
>> Any updates on the issue?!
>>
>> -Matthew
>>
>>
>>
>> On Tue, Mar 13, 2012 at 18:29, Matthew Tovbin <matthew at tovbin.com> wrote:
>>
>>> Here is a log from one of the servers:
>>>
>>> ==> /mnt/dataraid/log/crash.log <==
>>> 2012-03-13 18:24:44 =CRASH REPORT====
>>>   crasher:
>>>     initial call: riak_pipe_builder:init/1
>>>     pid: <0.20949.24>
>>>     registered_name: []
>>>     exception exit: {sink_died,shutdown}
>>>       in function  gen_fsm:terminate/7
>>>       in call from proc_lib:init_p_do_apply/3
>>>     ancestors: [riak_pipe_builder_sup,riak_pipe_sup,<0.172.0>]
>>>     messages: [{'DOWN',#Ref<0.0.5.38339>,process,<0.20950.24>,shutdown}]
>>>     links: [<0.175.0>]
>>>     dictionary:
>>> [{eunit,[{module,riak_pipe_builder},{ref,#Ref<0.0.5.38318>},{spec,[{fitting_spec,listkeys,riak_kv_pipe_listkeys,undefined,{chash,key_of},1,64}]},{options,[{sink,{fitting,<0.20947.24>,#Ref<0.0.5.38318>,{riak_kv_pipe_get,bkey_chash},{riak_kv_pipe_get,bkey_nval}}},{trace,{set,1,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[error],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}},{log,{sink,{fitting,<0.20060.24>,#Ref<0.0.5.38318>,sink,undefined}}}]},{fittings,[{{fitting,<0.20950.24>,#Ref<0.0.5.38318>,{chash,key_of},1},#Ref<0.0.5.38339>}]}]}]
>>>     trap_exit: false
>>>     status: running
>>>     heap_size: 987
>>>     stack_size: 24
>>>     reductions: 251
>>>   neighbours:
>>> 2012-03-13 18:24:44 =SUPERVISOR REPORT====
>>>      Supervisor: {local,riak_pipe_builder_sup}
>>>      Context:    child_terminated
>>>      Reason:     {sink_died,shutdown}
>>>      Offender:
>>> [{pid,<0.20949.24>},{name,undefined},{mfargs,{riak_pipe_builder,start_link,undefined}},{restart_type,temporary},{shutdown,brutal_kill},{child_type,worker}]
>>>
>>>
>>> ==> /mnt/dataraid/log/console.log <==
>>> 2012-03-13 18:24:44.645 [error] <0.20949.24> CRASH REPORT Process
>>> <0.20949.24> with 0 neighbours crashed with reason: {sink_died,shutdown}
>>> 2012-03-13 18:24:44.652 [error] <0.175.0> Supervisor
>>> riak_pipe_builder_sup had child undefined started with
>>> {riak_pipe_builder,start_link,undefined} at <0.20949.24> exit with reason
>>> {sink_died,shutdown} in context child_terminated
>>> 2012-03-13 18:24:56.420 [info]
>>> <0.71.0>@riak_core_sysmon_handler:handle_event:85 monitor long_gc
>>> <0.32603.24>
>>> [{initial_call,{mochiweb_acceptor,init,3}},{almost_current_function,{gen,do_call,4}},{message_queue_len,0}]
>>> [{timeout,120},{old_heap_block_size,0},{heap_block_size,4181},{mbuf_size,0},{stack_size,40},{old_heap_size,0},{heap_size,1955}]
>>>
>>>
>>> ==> /mnt/dataraid/log/erlang.log.1 <==
>>> 18:24:44.634 [error] gen_fsm <0.20949.24> in state
>>> wait_pipeline_shutdown terminated with reason: {sink_died,shutdown}
>>> (riak at 1)1> 18:24:44.641 [error] Pipe worker startup failed:fitting was
>>> gone before startup
>>> (riak at 1)1>
>>> 18:24:44.645 [error] CRASH REPORT Process <0.20949.24> with 0 neighbours
>>> crashed with reason: {sink_died,shutdown}
>>> (riak at 1)1> 18:24:44.652 [error] Supervisor riak_pipe_builder_sup had
>>> child undefined started with {riak_pipe_builder,start_link,undefined} at
>>> <0.20949.24> exit with reason {sink_died,shutdown} in context
>>> child_terminated
>>> (riak at 1)1> 18:24:56.420 [info] monitor long_gc <0.32603.24>
>>> [{initial_call,{mochiweb_acceptor,init,3}},{almost_current_function,{gen,do_call,4}},{message_queue_len,0}]
>>> [{timeout,120},{old_heap_block_size,0},{heap_block_size,4181},{mbuf_size,0},{stack_size,40},{old_heap_size,0},{heap_size,1955}]
>>> (riak at 1)1>
>>>
>>>
>>>
>>> ==> /mnt/dataraid/log/error.log <==
>>> 2012-03-13 18:24:44.645 [error] <0.20949.24> CRASH REPORT Process
>>> <0.20949.24> with 0 neighbours crashed with reason: {sink_died,shutdown}
>>> 2012-03-13 18:24:44.652 [error] <0.175.0> Supervisor
>>> riak_pipe_builder_sup had child undefined started with
>>> {riak_pipe_builder,start_link,undefined} at <0.20949.24> exit with reason
>>> {sink_died,shutdown} in context child_terminated
>>>
>>>
>>>
>>>
>>>
>>>
>>> -Matthew
>>>
>>>
>>>
>>> On Tue, Mar 13, 2012 at 18:17, Matthew Tovbin <matthew at tovbin.com>wrote:
>>>
>>>> Hi,
>>>>
>>>> I got the same problem today on 1.1.0.
>>>>
>>>> As suggested, I updated all the nodes to 1.1.1. The error remains the
>>>> same:
>>>>
>>>> { stack: [Getter/Setter],
>>>>    arguments: undefined,
>>>>   type: undefined,
>>>>   message: 'HTTP error 500:
>>>> {"phase":0,"error":"[preflist_exhausted]","input":"{ok,{r_object ....."}}}
>>>>
>>>>
>>>> -Matthew
>>>>
>>>>
>>>>
>>>> On Tue, Mar 13, 2012 at 06:48, Jon Meredith <jmeredith at basho.com>wrote:
>>>>
>>>>> Hi Arne,
>>>>>
>>>>> Could you try with 1.1.1 please.  We've fixed a number of bugs with
>>>>> how MapReduce handles errors that could explain some of what you are seeing.
>>>>>
>>>>> Jon
>>>>>
>>>>>
>>>>> On Tue, Mar 13, 2012 at 12:55 AM, Arne Claassen <
>>>>> arne+riak at claassen.net> wrote:
>>>>>
>>>>>> I'm running 1.1.0 built from source on Centos 6. I set up two nodes
>>>>>> on localhost following the instructions on the basho site for testing and
>>>>>> everything was going fine until i needed to fetch all items in a bucket.
>>>>>>
>>>>>> For this i was using node.js with riak-js and it's db.getAll
>>>>>> call,which from the debug logs looks like it uses /mapred
>>>>>>
>>>>>> The bucket only has about 1200 items in it, each a json payload of
>>>>>> about 3KB. Initially, it'd keep getting inconsistent results, sometimes the
>>>>>> data would come back, sometimes data would be undefined. So i kept running
>>>>>> it, trying to see if there was a pattern, adding more debugging code, etc.
>>>>>> But still it would be about a request every couple of seconds. Then it just
>>>>>> stopped working altogether and now i always get:
>>>>>>
>>>>>> { [Error: HTTP error 500:
>>>>>> {"phase":0,"error":"[preflist_exhausted]","input":"{ok,{r_object,<<\"tweets_15368536\">>,<<\"178528155714330620\">>,[{r_content,{dict,6,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[[<<\"Links\">>]],[],[],[],[],[],[],[],[[<<\"content-type\">>,97,112,112,108,105,99,97,116,105,111,110,47,106,115,111,110],[<<\"X-Riak-VTag\">>,49,76,115,54,70,75,74,71,119,65,111,70,88,73,117,84,98,88,79,69,52,116]],[[<<\"index\">>]],[],[[<<\"X-Riak-Last-Modified\">>|{1331,487523,218968}]],[],[[<<\"X-Riak-Meta\">>]]}}},<<\"{\"coordinates\":null,\"create...\">>}],...},...}","type":"forward_preflist","stack":"[]"}]
>>>>>> statusCode: 500 }
>>>>>>
>>>>>> in the console. I haven't tried restarting riak, just because i'd
>>>>>> rather find out what's happening than it going away and coming back later
>>>>>> when i don't expect it.
>>>>>>
>>>>>> Any suggestions on how to troubleshoot this? I can still access items
>>>>>> individually and i can even list the keys via ?keys=true, it's just he map
>>>>>> reduce that seems to be busted.
>>>>>>
>>>>>> cheers,
>>>>>> arne
>>>>>> _______________________________________________
>>>>>> riak-users mailing list
>>>>>> riak-users at lists.basho.com
>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jon Meredith
>>>>> Platform Engineering Manager
>>>>> Basho Technologies, Inc.
>>>>> jmeredith at basho.com
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> riak-users mailing list
>>>>> riak-users at lists.basho.com
>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>
>>>>>
>>>>
>>>
>>
>
>
> --
> Jon Meredith
> Platform Engineering Manager
> Basho Technologies, Inc.
> jmeredith at basho.com
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20120402/debde47f/attachment.html>


More information about the riak-users mailing list