occasional timeout when deleting key on multi-node Riak 1.4

Igor Senderovich isenderovich at esperdyne.com
Thu Oct 2 12:59:32 EDT 2014


There are no other errors in any of the logs at exactly the same time but
there are periodic errors in error.log and console.log of the following
form (and these occurred seconds before and after the crash):

2014-10-02 12:07:57.623 [error] <0.30125.18> CRASH REPORT Process
<0.30125.18> with 10 neighbours exited with reason: no case clause matching
"immediate" in riak_kv_vnode:do_delete/3 line 1321 in gen_f
sm:terminate/7 line 611
2014-10-02 12:07:57.623 [error] <0.30127.18> Supervisor
{<0.30127.18>,poolboy_sup} had child riak_core_vnode_worker started with
riak_core_vnode_worker:start_link([{worker_module,riak_core_vnode_worker
},{worker_args,[228359630832953580969325755111919221821239459840,...]},...])
at undefined exit with reason no case clause matching "immediate" in
riak_kv_vnode:do_delete/3 line 1321 in context shutdown
_error
2014-10-02 12:07:57.623 [error] <0.30127.18> gen_server <0.30127.18>
terminated with reason: no case clause matching "immediate" in
riak_kv_vnode:do_delete/3 line 1321



crash.log contains:

2014-10-02 12:07:57 =ERROR REPORT====
** State machine <0.30125.18> terminating
** Last message in was
{'EXIT',<0.30124.18>,{{case_clause,"immediate"},[{riak_kv_vnode,do_delete,3,[{file,"src/riak_kv_vnode.erl"},{line,1321}]},{riak_core_vnode,vnode_command,3,[{file,"src/riak_core_v
node.erl"},{line,299}]},{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,494}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}}
** When State == ready
**      Data  ==
{state,<0.30127.18>,{[<0.30129.18>,<0.30128.18>,<0.30138.18>,<0.30137.18>],[<0.30130.18>,<0.30131.18>,<0.30132.18>,<0.30134.18>,<0.30135.18>,<0.30136.18>]},{[],[]},295960692,10,0,0}
** Reason for termination =
**
{{case_clause,"immediate"},[{riak_kv_vnode,do_delete,3,[{file,"src/riak_kv_vnode.erl"},{line,1321}]},{riak_core_vnode,vnode_command,3,[{file,"src/riak_core_vnode.erl"},{line,299}]},{gen_fsm,handle_m
sg,7,[{file,"gen_fsm.erl"},{line,494}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
2014-10-02 12:07:57 =CRASH REPORT====
  crasher:
    initial call: poolboy:init/1
    pid: <0.30125.18>
    registered_name: []
    exception exit:
{{{case_clause,"immediate"},[{riak_kv_vnode,do_delete,3,[{file,"src/riak_kv_vnode.erl"},{line,1321}]},{riak_core_vnode,vnode_command,3,[{file,"src/riak_core_vnode.erl"},{line,299}]}
,{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,494}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]},[{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,611}]},{proc_lib,init_p_do_
apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
    ancestors:
[<0.30124.18>,<0.30117.18>,riak_core_vnode_sup,riak_core_sup,<0.157.0>]
    messages: []
    links:
[<0.30134.18>,<0.30136.18>,<0.30137.18>,<0.30138.18>,<0.30135.18>,<0.30129.18>,<0.30131.18>,<0.30132.18>,<0.30130.18>,<0.30127.18>,<0.30128.18>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 987
    stack_size: 24
    reductions: 403
  neighbours:
    neighbour:
[{pid,<0.30128.18>},{registered_name,[]},{initial_call,{riak_core_vnode_worker,init,['Argument__1']}},{current_function,{gen_server,loop,6}},{ancestors,[<0.30127.18>,<0.30125.18>,<0.3012
4.18>,<0.30117.18>,riak_core_vnode_sup,riak_core_sup,<0.157.0>]},{messages,[]},{links,[<0.30127.18>,<0.30125.18>]},{dictionary,[]},{trap_exit,false},{status,waiting},{heap_size,610},{stack_size,9},{red
uctions,526}]
    neighbour:
[{pid,<0.30130.18>},{registered_name,[]},{initial_call,{riak_core_vnode_worker,init,['Argument__1']}},{current_function,{gen_server,loop,6}},{ancestors,[<0.30127.18>,<0.30125.18>,<0.3012
4.18>,<0.30117.18>,riak_core_vnode_sup,riak_core_sup,<0.157.0>]},{messages,[]},{links,[<0.30127.18>,<0.30125.18>]},{dictionary,[]},{trap_exit,false},{status,waiting},{heap_size,233},{stack_size,9},{red
uctions,27}]
...
2014-10-02 12:07:57 =SUPERVISOR REPORT====
     Supervisor: {<0.30127.18>,poolboy_sup}
     Context:    shutdown_error
     Reason:
{{case_clause,"immediate"},[{riak_kv_vnode,do_delete,3,[{file,"src/riak_kv_vnode.erl"},{line,1321}]},{riak_core_vnode,vnode_command,3,[{file,"src/riak_core_vnode.erl"},{line,299}]},{ge
n_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,494}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
     Offender:
[{nb_children,10},{name,riak_core_vnode_worker},{mfargs,{riak_core_vnode_worker,start_link,[[{worker_module,riak_core_vnode_worker},{worker_args,[22835963083295358096932575511191922182
1239459840,[],worker_props,<0.30124.18>]},{worker_callback_mod,riak_kv_worker},{size,10},{max_overflow,0}]]}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]

2014-10-02 12:07:57 =ERROR REPORT====
** Generic server <0.30127.18> terminating
** Last message in was
{'EXIT',<0.30125.18>,{{case_clause,"immediate"},[{riak_kv_vnode,do_delete,3,[{file,"src/riak_kv_vnode.erl"},{line,1321}]},{riak_core_vnode,vnode_command,3,[{file,"src/riak_core_v
node.erl"},{line,299}]},{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,494}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}}
** When Server state ==
{state,{<0.30127.18>,poolboy_sup},simple_one_for_one,[{child,undefined,riak_core_vnode_worker,{riak_core_vnode_worker,start_link,[[{worker_module,riak_core_vnode_worker},{worker
_args,[228359630832953580969325755111919221821239459840,[],worker_props,<0.30124.18>]},{worker_callback_mod,riak_kv_worker},{size,10},{max_overflow,0}]]},temporary,5000,worker,[riak_core_vnode_worker]}
],{set,10,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[<0.30131.18>],[<0.30132.18>],[],[<0.30134.18>],[<0.30135.18>],[<0.30136.18>],[<0.30137.18>],[<0.30138.18>],[],[],[],[],[],[<
0.30128.18>],[<0.30129.18>],[<0.30130.18>]}}},0,1,[],poolboy_sup,{riak_core_vnode_worker,[{worker_module,riak_core_vnode_worker},{worker_args,[228359630832953580969325755111919221821239459840,[],worker
_props,<0.30124.18>]},{worker_callback_mod,riak_kv_worker},{size,10},{max_overflow,0}]}}
** Reason for termination ==
**
{{case_clause,"immediate"},[{riak_kv_vnode,do_delete,3,[{file,"src/riak_kv_vnode.erl"},{line,1321}]},{riak_core_vnode,vnode_command,3,[{file,"src/riak_core_vnode.erl"},{line,299}]},{gen_fsm,handle_m
sg,7,[{file,"gen_fsm.erl"},{line,494}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
2014-10-02 12:07:57 =CRASH REPORT====
  crasher:
    initial call: supervisor:poolboy_sup/1
    pid: <0.30127.18>
    registered_name: []
    exception exit:
{{{case_clause,"immediate"},[{riak_kv_vnode,do_delete,3,[{file,"src/riak_kv_vnode.erl"},{line,1321}]},{riak_core_vnode,vnode_command,3,[{file,"src/riak_core_vnode.erl"},{line,299}]}
,{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,494}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]},[{gen_server,terminate,6,[{file,"gen_server.erl"},{line,747}]},{proc_lib,init
_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
    ancestors:
[<0.30125.18>,<0.30124.18>,<0.30117.18>,riak_core_vnode_sup,riak_core_sup,<0.157.0>]
    messages: []
    links: []
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 4181
    stack_size: 24
    reductions: 1530
  neighbours:


On Thu, Oct 2, 2014 at 12:20 PM, Dmitri Zagidulin <dzagidulin at basho.com>
wrote:

> Thanks. Are there entries in any of the other logs? (like the crash dump).
>
> Can you also provide more info on the nodes themselves. What size AWS
> instances are you running? Is the delete timeout happening while load
> testing?
>
> On Thu, Oct 2, 2014 at 12:11 PM, Igor Senderovich <
> isenderovich at esperdyne.com> wrote:
>
>> Thanks for your help, Dmitri,
>>
>> I get the following in error.log:
>> 2014-10-02 12:05:45.037 [error] <0.6359.19> Webmachine error at path
>> "/buckets/imc/keys/5134a18660494ea5553d2c90ef9eea2f" : "Service Unavailable"
>>
>> And no, there is no load balancer on our cluster.
>> Thank you
>>
>>
>> On Thu, Oct 2, 2014 at 11:52 AM, Dmitri Zagidulin <dzagidulin at basho.com>
>> wrote:
>>
>>> One other question - are you using a load balancer for your cluster
>>> (like HAProxy or the like). In which case, take a look at its logs, also.
>>>
>>> On Thu, Oct 2, 2014 at 11:51 AM, Dmitri Zagidulin <dzagidulin at basho.com>
>>> wrote:
>>>
>>>> Igor,
>>>> Can you look in the riak log directory, in the error.log (and console
>>>> log and crash dump file) to see if there's any entries, around the time of
>>>> the delete operation? And post them here?
>>>>
>>>>
>>>>
>>>> On Thu, Oct 2, 2014 at 11:45 AM, Igor Senderovich <
>>>> isenderovich at esperdyne.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I get a timeout when deleting a key, reproducible in about 1 in 10
>>>>> times:
>>>>> $ curl -i -vvv
>>>>> http://myhost:8098/buckets/imc/keys/5134a18660494ea5553d2c90ef9eea2f
>>>>>
>>>>> * About to connect() to dp1.prod6.ec2.cmg.net port 8098
>>>>> *   Trying 10.12.239.90... connected
>>>>> * Connected to dp1.prod6.ec2.cmg.net (10.12.239.90) port 8098
>>>>> > DELETE /buckets/imc/keys/5134a18660494ea5553d2c90ef9eea2f HTTP/1.1
>>>>> > User-Agent: curl/7.15.5 (x86_64-redhat-linux-gnu) libcurl/7.15.5
>>>>> OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5
>>>>> > Host: dp1.prod6.ec2.cmg.net:8098
>>>>> > Accept: */*
>>>>> >
>>>>> < HTTP/1.1 503 Service Unavailable
>>>>> HTTP/1.1 503 Service Unavailable
>>>>> < Server: MochiWeb/1.1 WebMachine/1.10.0 (never breaks eye contact)
>>>>> Server: MochiWeb/1.1 WebMachine/1.10.0 (never breaks eye contact)
>>>>> < Date: Wed, 01 Oct 2014 16:11:41 GMT
>>>>> Date: Wed, 01 Oct 2014 16:11:41 GMT
>>>>> < Content-Type: text/plain
>>>>> Content-Type: text/plain
>>>>> < Content-Length: 18
>>>>> Content-Length: 18
>>>>>
>>>>> request timed out
>>>>> * Connection #0 to host dp1.prod6.ec2.cmg.net left intact
>>>>> * Closing connection #0
>>>>>
>>>>>
>>>>> This is on Riak 1.4 on a 5 node cluster with an n-value of 3.
>>>>> Thank you for your help
>>>>>
>>>>> _______________________________________________
>>>>> riak-users mailing list
>>>>> riak-users at lists.basho.com
>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20141002/b13c12bf/attachment.html>


More information about the riak-users mailing list