forced timeout in riak_client:get/3

Tuncer Ayaz tuncer.ayaz at gmail.com
Tue Nov 3 18:39:11 EST 2009


On Sun, Nov 1, 2009 at 4:01 PM, Tuncer Ayaz <tuncer.ayaz at gmail.com> wrote:
> I've been testing a cluster of 2 or 3 riak nodes with the following setup:
> debug-fresh riak0_config riak0 at 127.0.0.1
> debug-join  riak1_config riak0 at 127.0.0.1
> debug-join  riak2_config riak0 at 127.0.0.1
>
> All configs use the gb_trees backend.
>
> They all have unique doorbell ports and each uses its own riak-0.6 tree
> to be sure that there are no data dir conflicts.
> I've chosen not to use hg tip as there seem to be no changes
> to riak_get_fsm.erl which would possibly be a fix to the
> issue I run into.
>
> The test:
> (1) knowing that all nodes are up I put/2 all test data with W=N
> (2) run tests that get/3 with R=1 where each get/3 responds within ~60ms
> (3) stop riak0
> (4) re-run tests. get/3 with R=1 still works correctly, but I run into the
>     default timeout of 15 seconds.
> (5) debug-restart riak0_config riak0 at 127.0.0.1
> (6) rerun tests with riak0 back online and it again responds within ~60ms
>
> The decision about which of the 2 or 3 nodes to connect to is made with
> a client-side availability check. From the resulting list of online nodes
> I do riak_client:connect/1 to a random online node.
>
> Any idea what's going wrong?

I retried on a different server: stopping any of the 3 nodes
causes the timeout when calling get/3 (R=1) on one of the 2
remaining nodes. So node1 is not special compared to the other 2
nodes, which joined node1. That's good, as all 3 nodes are meant
to be equal.
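
For reference, the client-side node selection and the timing measurement from the test can be sketched roughly as below. This is a minimal sketch, not the actual test code: the module name get_timing and its three functions are hypothetical, net_adm:ping/1 is only assumed to stand in for the availability check, and rand:uniform/1 is used for the random choice (on older OTP releases the random module plays this role). No Riak calls appear in it; the get/3 call would be passed in as a fun.

```erlang
-module(get_timing).
-export([online_nodes/1, pick_node/1, timed/1]).

%% Hypothetical availability check: keep only the candidate nodes
%% that answer a distributed-Erlang ping.
online_nodes(Candidates) ->
    [Node || Node <- Candidates, net_adm:ping(Node) =:= pong].

%% Pick one online node at random, or report that none is reachable.
pick_node([]) ->
    {error, no_nodes_online};
pick_node(Online) ->
    {ok, lists:nth(rand:uniform(length(Online)), Online)}.

%% Run Fun and return {ElapsedMilliseconds, Result}, so a ~60ms
%% response can be told apart from one that hits the 15 second timeout.
timed(Fun) when is_function(Fun, 0) ->
    {Micros, Result} = timer:tc(Fun),
    {Micros div 1000, Result}.
```

Wrapping the get/3 call with R=1 in a fun and passing it to timed/1 gives the elapsed time alongside the result, which makes it easy to compare the runs before stopping a node, while it is down, and after restarting it.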



