strange timeout errors

Greg Steffensen greg.steffensen at gmail.com
Fri Mar 11 09:37:50 EST 2011


I'm seeing Riak time out consistently after 60 seconds when doing gets and
sets on particular keys (I've only tested the REST interface).  The
timeout happens inside Riak, not inside our HTTP client.  It happens
regardless of whether the key already exists and, when writing, regardless
of what the value is; it depends only on the key.  N is 3, and it happens
regardless of what R and W are.  There also appear to be some patterns in
how likely the errors are to occur for random keys of various lengths.
Here's the relationship between key length and timeout likelihood for get
requests on random keys.  I've repeated the experiment several times and
the results, though not identical, have always been within a percentage
point of the values below, because most of the same keys time out on each
run.

1:   10%
2:   13%
3:   17%
4:   13%
5:   17%
6:   20%
7:   12%
8:   17%
9:   14%
10:  13%
15:  20%
20:  23%
25:  12%
26:  12%
27:  17%
28:  15%
29:  19%
30:  24%
31:   9%
32:   8%
33:   8%
34:   9%
35:   9%
36:   8%
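
For anyone who wants to try reproducing numbers like these, here is a
rough sketch of the kind of script involved.  It is not the exact script
I ran.  The base URL, bucket name ("test"), alphanumeric key alphabet,
fixed seed and the function names are all assumptions made for
illustration, as is treating an HTTP 503 response as Riak's own timeout.
The client-side timeout is set above 60 seconds so that a timeout inside
Riak is not confused with one in the HTTP client.

```python
import random
import string
import urllib.error
import urllib.parse
import urllib.request


def random_key(length, rng):
    """Return a random alphanumeric key of the given length."""
    alphabet = string.ascii_letters + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


def riak_get_times_out(key, base_url="http://127.0.0.1:8098/riak/test",
                       r=1, client_timeout=90):
    """GET one key over REST and report whether Riak answered with a timeout.

    Assumptions: base_url points at a bucket on a local node, and Riak
    signals its internal timeout with HTTP 503. A client-side timeout is
    deliberately not caught, so it surfaces as an exception instead of
    being counted.
    """
    url = "%s/%s?r=%d" % (base_url, urllib.parse.quote(key, safe=""), r)
    try:
        urllib.request.urlopen(url, timeout=client_timeout).close()
        return False
    except urllib.error.HTTPError as err:
        # A 404 only means the key does not exist; that is not a timeout.
        return err.code == 503


def timeout_rates(lengths, samples_per_length, times_out, seed=0):
    """Return {key length: percentage of random keys for which times_out is true}.

    The seed is fixed so that repeated runs request the same keys.
    """
    rng = random.Random(seed)
    rates = {}
    for length in lengths:
        keys = [random_key(length, rng) for _ in range(samples_per_length)]
        failures = sum(1 for key in keys if times_out(key))
        rates[length] = 100.0 * failures / samples_per_length
    return rates
```

Calling timeout_rates([1, 2, 3], 100, riak_get_times_out) against a live
node would give one percentage per key length.  Passing the check in as a
function also makes it easy to swap in a version that does a PUT, or that
uses a different R value.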

That was done on a ring using the default configuration with 13 physical
nodes that is experiencing lots of simultaneous write activity, and on which
one physical node has been down for days but is still nominally a member of
the ring (I'm not sure whether this behavior was occurring before that
node went down).

FWIW, I've tested write behavior like this with many clusters, and while
certainly most of them have behaved normally, this isn't the first time I've
seen this behavior.  Has anyone else seen anything like this before?


More information about the riak-users mailing list