occasional timeout when deleting key on multi-node Riak 1.4

Igor Senderovich isenderovich at esperdyne.com
Thu Oct 2 13:19:07 EDT 2014


Indeed, in app.config, I have:

{riak_kv, [ ...  {delete_mode, "immediate"} ]}

I am curious: why does this lead to these kinds of random errors?
Thank you for your help
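
For reference, here is a minimal sketch of the corrected setting, assuming the rest of the riak_kv section stays as it is. In Erlang terms, "immediate" in double quotes is a string (a list of characters), while immediate without quotes is an atom. A case_clause error means that no clause of a case expression matched the value, so the {case_clause,"immediate"} in the crash report suggests that do_delete in riak_kv_vnode has no clause for a string and the vnode crashes whenever it reaches that branch.

```erlang
%% app.config, riak_kv section (other settings omitted here)
{riak_kv, [
    %% atom, not a string: no double quotes around immediate
    {delete_mode, immediate}
]}
```

As far as I know, the documented values for delete_mode are the atoms keep and immediate, or an integer number of milliseconds. app.config is read at startup, so each node would need a restart to pick up the change.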


On Thu, Oct 2, 2014 at 1:09 PM, Russell Brown <russell.brown at me.com> wrote:

>
> On 2 Oct 2014, at 17:59, Igor Senderovich <isenderovich at esperdyne.com>
> wrote:
>
> There are no other errors in any of the logs at exactly the same time but
> there are periodic errors in error.log and console.log of the following
> form (and these occurred seconds before and after the crash):
>
>
> ** Reason for termination =
> ** {{case_clause,"immediate"},[{riak_kv_vnode,do_delete,3,[{file,"src/riak_kv_vnode.erl"},{line,1321}]},{riak_core_vnode,vnode_command,3,[{file,"src/riak_core_vnode.erl"},{line,299}]},{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,494}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
> 2014-10-02 12:07:57 =CRASH REPORT====
>   crasher:
>     initial call: poolboy:init/1
>     pid: <0.30125.18>
>     registered_name: []
>     exception exit:
> {{{case_clause,"immediate"},[{riak_kv_vnode,do_delete,3,[{file,"src/riak_kv_vnode.erl"},{line,1321}]},{riak_core_vnode,vnode_command,3,
>
>
> Can I see your config? Looks like you have delete_mode configured with the
> string "immediate" rather than the atom 'immediate'.
>
>
> Cheers
>
> Russell
>
>
>
> On Thu, Oct 2, 2014 at 12:20 PM, Dmitri Zagidulin <dzagidulin at basho.com>
> wrote:
>
>> Thanks. Are there entries in any of the other logs? (like the crash dump).
>>
>> Can you also provide more info on the nodes themselves? What size AWS
>> instances are you running? Is the delete timeout happening while load
>> testing?
>>
>> On Thu, Oct 2, 2014 at 12:11 PM, Igor Senderovich <
>> isenderovich at esperdyne.com> wrote:
>>
>>> Thanks for your help, Dmitri,
>>>
>>> I get the following in error.log:
>>> 2014-10-02 12:05:45.037 [error] <0.6359.19> Webmachine error at path
>>> "/buckets/imc/keys/5134a18660494ea5553d2c90ef9eea2f" : "Service Unavailable"
>>>
>>> And no, there is no load balancer on our cluster.
>>> Thank you
>>>
>>>
>>> On Thu, Oct 2, 2014 at 11:52 AM, Dmitri Zagidulin <dzagidulin at basho.com>
>>> wrote:
>>>
>>>> One other question: are you using a load balancer for your cluster
>>>> (HAProxy or the like)? If so, take a look at its logs as well.
>>>>
>>>> On Thu, Oct 2, 2014 at 11:51 AM, Dmitri Zagidulin <dzagidulin at basho.com
>>>> > wrote:
>>>>
>>>>> Igor,
>>>>> Can you look in the riak log directory, in the error.log (and the
>>>>> console log and crash dump file), to see if there are any entries around
>>>>> the time of the delete operation, and post them here?
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Oct 2, 2014 at 11:45 AM, Igor Senderovich <
>>>>> isenderovich at esperdyne.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I get a timeout when deleting a key, reproducible about 1 time in
>>>>>> 10:
>>>>>> $ curl -i -vvv -X DELETE
>>>>>> http://myhost:8098/buckets/imc/keys/5134a18660494ea5553d2c90ef9eea2f
>>>>>>
>>>>>> * About to connect() to dp1.prod6.ec2.cmg.net port 8098
>>>>>> *   Trying 10.12.239.90... connected
>>>>>> * Connected to dp1.prod6.ec2.cmg.net (10.12.239.90) port 8098
>>>>>> > DELETE /buckets/imc/keys/5134a18660494ea5553d2c90ef9eea2f HTTP/1.1
>>>>>> > User-Agent: curl/7.15.5 (x86_64-redhat-linux-gnu) libcurl/7.15.5
>>>>>> OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5
>>>>>> > Host: dp1.prod6.ec2.cmg.net:8098
>>>>>> > Accept: */*
>>>>>> >
>>>>>> < HTTP/1.1 503 Service Unavailable
>>>>>> HTTP/1.1 503 Service Unavailable
>>>>>> < Server: MochiWeb/1.1 WebMachine/1.10.0 (never breaks eye contact)
>>>>>> Server: MochiWeb/1.1 WebMachine/1.10.0 (never breaks eye contact)
>>>>>> < Date: Wed, 01 Oct 2014 16:11:41 GMT
>>>>>> Date: Wed, 01 Oct 2014 16:11:41 GMT
>>>>>> < Content-Type: text/plain
>>>>>> Content-Type: text/plain
>>>>>> < Content-Length: 18
>>>>>> Content-Length: 18
>>>>>>
>>>>>> request timed out
>>>>>> * Connection #0 to host dp1.prod6.ec2.cmg.net left intact
>>>>>> * Closing connection #0
>>>>>>
>>>>>>
>>>>>> This is on Riak 1.4 on a 5 node cluster with an n-value of 3.
>>>>>> Thank you for your help
>>>>>>
>>>>>> _______________________________________________
>>>>>> riak-users mailing list
>>>>>> riak-users at lists.basho.com
>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>