X-Riak-Vclock, unexplained errors

David Smith dizzyd at basho.com
Sun Jan 10 08:41:22 EST 2010

On Sat, Jan 9, 2010 at 5:52 PM, Ken Sedgwick <ksedgwic at bonsai.com> wrote:
> I'm having a little trouble interpreting these errors.  Here is a sample:
> =ERROR REPORT==== 9-Jan-2010::16:34:42 ===
> webmachine error: path="/raw/pudd/c11b000000000000"
> [{webmachine_decision_core,'-decision/1-lc$^1/1-1-',
>     [{error,
>          {error,
>              {case_clause,{error,timeout}},
>              [{raw_http_resource,content_types_provided,2},
>               {webmachine_resource,resource_call,3},
>               {webmachine_resource,do,3},
>               {webmachine_decision_core,resource_call,1},
>               {webmachine_decision_core,decision,1},
>               {webmachine_decision_core,handle_request,2},
>               {webmachine_mochiweb,loop,1},
>               {mochiweb_http,headers,5}]}}]},
>  {webmachine_decision_core,decision,1},
>  {webmachine_decision_core,handle_request,2},
>  {webmachine_mochiweb,loop,1},
>  {mochiweb_http,headers,5},
>  {proc_lib,init_p_do_apply,3}]
> Is this a timeout?  Any ideas what I should be looking for?

Yes, this is a timeout -- the raw_http_resource probably should be returning
a 504 (Gateway Timeout) for this error. As best I can tell, this is the
local Riak client process timing out on a request -- it doesn't hear back
from enough (or any) servers in the specified amount of time. How much load
are you putting on the system, and what backend are you using? DETS can get
bogged down if you have a write-heavy load.
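
To make the failure mode concrete, here is a rough Python sketch of the idea.
It is not Riak's actual code (which is Erlang); wait_for_replies, status_for,
and the tuple shapes are hypothetical names chosen for illustration. It shows
a client waiting for a required number of replies until a deadline, and a
handler that matches the timeout explicitly so it becomes a 504 instead of
falling through as an unmatched case.

```python
import queue
import time


def wait_for_replies(replies, needed, timeout_s):
    """Collect `needed` replies from `replies` (a queue.Queue), or give up
    once `timeout_s` seconds have passed in total."""
    deadline = time.monotonic() + timeout_s
    got = []
    while len(got) < needed:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return ("error", "timeout")
        try:
            got.append(replies.get(timeout=remaining))
        except queue.Empty:
            # Not enough servers answered before the deadline.
            return ("error", "timeout")
    return ("ok", got)


def status_for(result):
    """Map a result tuple to an HTTP status, handling timeout explicitly."""
    if result[0] == "ok":
        return 200
    if result == ("error", "timeout"):
        return 504  # Gateway Timeout
    return 500  # anything unexpected
```

With one reply queued and two required, wait_for_replies returns
("error", "timeout") once the deadline passes, and status_for turns that
into 504 rather than an unhandled error.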

One of the things I'd like to do soon is put some timers on the vnode
operations to help us identify when a backend is taking a long time to
respond. This would help us distinguish between a slow backend and some
other problem -- although I'd hazard a guess that 99% of the time when you
see these timeouts, it's a slow backend.
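
As a sketch of what such timers might look like (again in Python rather than
Erlang, and with hypothetical names: timed_call, threshold_s, and the log
callback are assumptions, not anything in Riak), a wrapper could measure each
backend operation and report the ones that exceed a threshold:

```python
import time


def timed_call(op_name, fn, *args, threshold_s=0.5, log=print,
               clock=time.monotonic):
    """Run fn(*args), measure how long it took, and log it if it was slow.

    Returns (result, elapsed_seconds). `clock` is injectable so the
    wrapper can be exercised without real delays.
    """
    start = clock()
    result = fn(*args)
    elapsed = clock() - start
    if elapsed > threshold_s:
        log(f"slow backend op {op_name}: {elapsed:.3f}s")
    return result, elapsed
```

Wrapping the backend's get and put calls this way would show whether the
time is being spent in the backend itself or somewhere else along the
request path.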

