riak_core question when a node dies
jon at jbrisbin.com
Wed Mar 28 13:09:14 EDT 2012
I'm using get_primary_apl to get my preflist, but the problem is how to handle a failed dispatch to a node that is just now going down and hasn't had time to notify the caller yet. I don't want to lose the web request currently in progress. Maybe I need to get a list of indexes to possibly dispatch to and iterate over them, stopping at the first one that doesn't blow up.
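The "iterate and stop at the first one that doesn't blow up" idea could look roughly like the sketch below. It is only a sketch under assumptions: the module name dispatch_retry and the function first_ok/2 are made up for illustration, and the riak_core calls in the usage comment (with my_service, my_vnode_master, and Req as placeholders) reflect how I understand those functions to be called, not tested code.

```erlang
-module(dispatch_retry).
-export([first_ok/2]).

%% Apply Fun to each preflist entry in order and return the first result
%% that does not crash. A synchronous call to a vnode on a node that has
%% just died exits the caller; that exit (or any other exception) is
%% caught here and the next entry is tried instead.
first_ok(_Fun, []) ->
    {error, all_failed};
first_ok(Fun, [Entry | Rest]) ->
    try Fun(Entry) of
        Result -> {ok, Result}
    catch
        _Class:_Reason -> first_ok(Fun, Rest)
    end.

%% Hypothetical usage (my_service, my_vnode_master, DocIdx and Req are
%% placeholders; assumes get_apl/3 returns a list of {Index, Node} pairs):
%%
%%   Preflist = riak_core_apl:get_apl(DocIdx, 3, my_service),
%%   dispatch_retry:first_ok(
%%     fun(IdxNode) ->
%%         riak_core_vnode_master:sync_spawn_command(IdxNode, Req, my_vnode_master)
%%     end,
%%     Preflist).
```

Catching every exception class keeps the sketch short, but it also hides genuine bugs in the vnode code, so a real version would probably match only the exit reasons that indicate a dead node or a timeout.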
On Mar 28, 2012, at 12:00 PM, Sean Cribbs <sean at basho.com> wrote:
> Generally I would use the riak_core_apl module to calculate the preflist for your request. It takes into account node visibility and service availability. Use riak_core_node_watcher:service_up to announce that your app is available after registering with riak_core.
> When doing some "split brain" testing/simulation for gen_leader we would do something like the following on a node we wanted to partition:
> 1> erlang:set_cookie(node(), riak2).
> 2> erlang:disconnect_node('dev3@127.0.0.1'), erlang:disconnect_node('dev4@127.0.0.1').
> Basically, set the cookie so it can't connect to the other nodes, then manually disconnect. That might help you simulate a node outage.
> On Wed, Mar 28, 2012 at 12:49 PM, Jon Brisbin <jon at jbrisbin.com> wrote:
> I'm testing the example code that dispatches a web request from misultin into a riak_core ring of vnodes. It works fantastic when all nodes are up! :)
> Doing "ab -k -c 200 -n 10000 http://localhost:3000/" yields not-too-shabby performance (dispatching at random into all available vnodes on two separate riak_core processes):
> Concurrency Level: 200
> Time taken for tests: 1.446 seconds
> Complete requests: 10000
> Failed requests: 0
> Write errors: 0
> Keep-Alive requests: 10000
> Total transferred: 1600480 bytes
> HTML transferred: 120036 bytes
> Requests per second: 6914.04 [#/sec] (mean)
> Time per request: 28.927 [ms] (mean)
> Time per request: 0.145 [ms] (mean, across all concurrent requests)
> Transfer rate: 1080.64 [Kbytes/sec] received
> Connection Times (ms)
>               min  mean[+/-sd] median   max
> Connect:        0    0   1.0      0      12
> Processing:     4   28   9.8     27      78
> Waiting:        4   28   9.8     27      78
> Total:          4   28  10.1     27      83
> Percentage of the requests served within a certain time (ms)
> 50% 27
> 66% 31
> 75% 34
> 80% 36
> 90% 41
> 95% 47
> 98% 53
> 99% 58
> 100% 83 (longest request)
> If I were really zealous, I'd set up haproxy to load balance between these two misultin servers and get double failover.
> I'm trying to handle the case where I go into the console of one of my nodes and hit Ctrl-C to kill that process. I'm not sure what the best way to handle this is. Should I check that the node is up before I dispatch? Or keep some other kind of watch that, when it sees a node go down while I'm trying to dispatch to it, finds another one?
> Essentially, I'm trying to prevent misultin from completely bailing on the request because sync_spawn_command blows up trying to do a gen_server:call to a non-existent node. I'd like to retry the dispatch on a different node if one happens to have crashed while I'm serving requests, so that I don't lose a request.
> Jon Brisbin
> Sean Cribbs <sean at basho.com>
> Software Engineer
> Basho Technologies, Inc.