Random Block Not Found Issues with Riak/Riak CS

Charles Bijon bijon.charles at gmail.com
Tue Jul 22 07:01:54 EDT 2014


Hi,

We are seeing the same issue here, but with 45 Riak/Riak CS nodes in 
production. Do you have any idea how to correct it?

Regards,

Charles


On 17/07/2014 at 23:21, Dave Finster wrote:
> Hi Kelly
>
> 1.4.5 - Riak CS
> 1.4.8 - Riak
> Anti-Entropy is on (all nodes)
>
> Disabling n_val_1_get_requests still lets me trigger the issue 
> (though less frequently), but a different error has now cropped up:
>
> 2014-07-17 21:15:38 =ERROR REPORT====
> webmachine error: path="/buckets/<bucket 
> name>/objects/bf15f98c-eaa1-4ff9-83ff-24c1e7e1380f%2F847c340cfe2f44028d6fd5606f696796%2FAttachment-1.png"
> {exit,{{{{case_clause,{error,timeout}},[{riak_cs_manifest_fsm,handle_get_manifests,1,[{file,"src/riak_cs_manifest_fsm.erl"},{line,265}]},{riak_cs_manifest_fsm,waiting_command,3,[{file,"src/riak_cs_manifest_fsm.erl"},{line,201}]},{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,494}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]},{gen_fsm,sync_send_event,[<0.1383.0>,get_manifests,infinity]}},{gen_fsm,sync_send_event,[<0.1382.0>,get_manifest,infinity]}},[{gen_fsm,sync_send_event,3,[{file,"gen_fsm.erl"},{line,214}]},{riak_cs_wm_utils,ensure_doc,2,[{file,"src/riak_cs_wm_utils.erl"},{line,236}]},{riak_cs_wm_object,authorize,2,[{file,"src/riak_cs_wm_object.erl"},{line,64}]},{riak_cs_wm_common,authorize,2,[{file,"src/riak_cs_wm_common.erl"},{line,396}]},{riak_cs_wm_common,forbidden,2,[{file,"src/riak_cs_wm_common.erl"},{line,182}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}]}
> [{gen_fsm,sync_send_event,3,[{file,"gen_fsm.erl"},{line,214}]},{riak_cs_wm_utils,ensure_doc,2,[{file,"src/riak_cs_wm_utils.erl"},{line,236}]},{riak_cs_wm_object,authorize,2,[{file,"src/riak_cs_wm_object.erl"},{line,64}]},{riak_cs_wm_common,authorize,2,[{file,"src/riak_cs_wm_common.erl"},{line,396}]},{riak_cs_wm_common,forbidden,2,[{file,"src/riak_cs_wm_common.erl"},{line,182}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}]
>
> Thanks,
> Dave
>
>> On 18 Jul 2014, at 2:19 am, Kelly McLaughlin <kelly at basho.com 
>> <mailto:kelly at basho.com>> wrote:
>>
>> Dave,
>>
>> Can you tell me what versions of Riak and Riak CS you have installed? 
>> Do you have AAE enabled or disabled? It's tough to come up with an 
>> explanation without more information, but I would try setting 
>> n_val_1_get_requests to false and see if you continue to experience 
>> the problem. My guess is that will resolve the issue, but let me know 
>> what happens.
>>
>> Kelly
>>
>> On July 17, 2014 at 1:00:19 AM, Dave Finster (davefinster at icloud.com 
>> <mailto:davefinster at icloud.com>) wrote:
>>
>>> Hi Everyone
>>>
>>> Spent a bit of time trying to debug this one and not sure where to 
>>> go from here. The use case that appears to cause this breakage is a 
>>> web page that links to 8 x 10 MB images, which it attempts to fetch 
>>> simultaneously.
>>>
>>> Occasionally, one or two of the images will simply fail to load, 
>>> while other times they all work fine. I've tracked it down to the 
>>> crash below. It isn't always the same image. To make the problem 
>>> more repeatable, I forced our load balancer to use only a single 
>>> Riak CS node, so it gets hit with all the requests. We are using 
>>> HAProxy in front and are running SmartOS 64-bit images across the 
>>> board.
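>>>
>>> For reference, pinning the load balancer to one node is just a 
>>> one-server HAProxy backend; a rough sketch (the server name and 
>>> address here are made up for illustration):
>>>
>>>   backend riak_cs
>>>       mode http
>>>       # only one Riak CS node listed, so every request lands on it
>>>       server cs1 10.0.0.1:8080 check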
>>>
>>> arekinath helped me look into it and one thought was that I was hit 
>>> by the AAE bug present prior to 1.4.8, but even clearing the AAE 
>>> data made no difference. The n_val on the buckets is 3 and it's a 
>>> 4-node cluster. All 4 nodes run both a Riak and a Riak CS node. I 
>>> also have pb_backlog turned up to 256, n_val_1_get_requests set to 
>>> true and fold_objects_for_list_keys set to true. 'ring-status' 
>>> shows that the whole ring is reachable.
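>>>
>>> For reference, those two flags live in the riak_cs section of Riak 
>>> CS's app.config (pb_backlog is typically set on the Riak side, I 
>>> believe under riak_api in 1.4); a minimal sketch of the relevant 
>>> lines:
>>>
>>>   {riak_cs, [
>>>       {n_val_1_get_requests, true},
>>>       {fold_objects_for_list_keys, true}
>>>   ]}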
>>>
>>> Any idea on how to diagnose this one further?
>>>
>>> 2014-07-17 06:38:54 =CRASH REPORT====
>>> crasher:
>>> initial call: mochiweb_acceptor:init/3
>>> pid: <0.26119.1>
>>> registered_name: []
>>> exception exit: 
>>> {{normal,{gen_fsm,sync_send_event,[<0.27617.1>,get_next_chunk,infinity]}},[{gen_fsm,sync_send_event,3,[{file,"gen_fsm.erl"},{line,214}]},{riak_cs_wm_utils,streaming_get,4,[{file,"src/riak_cs_wm_utils.erl"},{line,272}]},{webmachine_decision_core,'-make_encoder_stream/3-fun-0-',3,[{file,"src/webmachine_decision_core.erl"},{line,667}]},{webmachine_request,send_stream_body_no_chunk,2,[{file,"src/webmachine_request.erl"},{line,334}]},{webmachine_request,send_response,3,[{file,"src/webmachine_request.erl"},{line,398}]},{webmachine_request,call,2,[{file,"src/webmachine_request.erl"},{line,251}]},{webmachine_decision_core,wrcall,1,[{file,"src/webmachine_decision_core.erl"},{line,42}]},{webmachine_decision_core,finish_response,3,[{file,"src/webmachine_decision_core.erl"},{line,92}]}]}
>>> ancestors: [object_web_mochiweb,riak_cs_sup,<0.143.0>]
>>> messages: []
>>> links: [<0.298.0>,#Port<0.12015>]
>>> dictionary: 
>>> [{reqstate,{wm_reqstate,#Port<0.12015>,[{'content-encoding',"identity"},{'content-type',"application/octet-stream"},{resource_module,riak_cs_wm_object}],undefined,"10.4.242.1",{wm_reqdata,'GET',http,{1,1},"10.4.242.1",undefined,[],"/buckets/<the 
>>> bucket 
>>> name>/objects/bf15f98c-eaa1-4ff9-83ff-24c1e7e1380f%2F847c340cfe2f44028d6fd5606f696796%2FAttachment-1.png","/buckets/<the 
>>> bucket 
>>> name>/objects/bf15f98c-eaa1-4ff9-83ff-24c1e7e1380f%2F847c340cfe2f44028d6fd5606f696796%2FAttachment-1.png?Signature=U0By3mIwaRIVBHNcYhSt6r5QgPk%3D&Expires=1405580057&AWSAccessKeyId=DGTXHHWIEDF4XUBSBYVI",[{bucket,"<the 
>>> bucket 
>>> name>"},{object,"bf15f98c-eaa1-4ff9-83ff-24c1e7e1380f%2F847c340cfe2f44028d6fd5606f696796%2FAttachment-1.png"}],[],"../../../..",{200,undefined},1073741824,67108864,[{"_ga","GA1.3.643660316.1404789703"}],[{"Signature","U0By3mIwaRIVBHNcYhSt6r5QgPk="},{"Expires","1405580057"},{"AWSAccessKeyId","DGTXHHWIEDF4XUBSBYVI"}],{9,{"cookie",{'Cookie',"_ga=GA1.3.643660316.1404789703"},{"accept-language",{'Accept-Language',"en-US,en;q=0.8"},{"accept-encoding",{'Accept-Encoding',"gzip,deflate,sdch"},{"accept",{'Accept',"image/webp,*/*;q=0.8"},nil,nil},nil},{"connection",{'Connection',"keep-alive"},nil,nil}},{"referer",{'Referer',"<the 
>>> referrer>"},{"host",{'Host',"<our riak-cs host 
>>> name>"},nil,nil},{"user-agent",{'User-Agent',"Mozilla/5.0 
>>> (Macintosh; Intel Mac OS X 10_10_0) AppleWebKit/537.36 (KHTML, like 
>>> Gecko) Chrome/35.0.1916.153 
>>> Safari/537.36"},nil,{"x-rcs-rewrite-path",{"x-rcs-rewrite-path","/<the 
>>> bucket 
>>> name>/bf15f98c-eaa1-4ff9-83ff-24c1e7e1380f/847c340cfe2f44028d6fd5606f696796/Attachment-1.png?AWSAccessKeyId=DGTXHHWIEDF4XUBSBYVI&Expires=1405580057&Signature=U0By3mIwaRIVBHNcYhSt6r5QgPk%3D"},nil,nil}}}}},not_fetched_yet,false,{3,{"content-type",{"Content-Type","application/octet-stream"},nil,{"etag",{"ETag","\"a3a32cf5d8f502d7e8d35fd8412a6878\""},nil,
>>> trap_exit: false
>>> status: running
>>> heap_size: 28657
>>> stack_size: 24
>>> reductions: 80773
>>> neighbours:
>>>
>>> Thanks,
>>> Dave Finster
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com <mailto:riak-users at lists.basho.com>
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
>
