riak_core question when a node dies
sam at lenary.co.uk
Wed Mar 28 17:19:50 EDT 2012
How idempotent are the requests? You could do all three in parallel,
and then use the result of whichever one returns first (and hence
doesn't blow up).
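If the command really is idempotent, that fan-out/first-response-wins idea could be sketched like this in plain Erlang (`dispatch/2` and the preflist entries are hypothetical stand-ins for the real vnode call):

```erlang
%% Sketch: fan a request out to every candidate vnode and take the
%% first reply. dispatch/2 is a hypothetical placeholder for the real
%% riak_core_vnode_master call.
first_reply(Preflist, Request) ->
    Parent = self(),
    Ref = make_ref(),
    [spawn(fun() ->
               case catch dispatch(IdxNode, Request) of
                   {'EXIT', _} -> ok;              %% node down: stay silent
                   Reply       -> Parent ! {Ref, Reply}
               end
           end) || IdxNode <- Preflist],
    receive
        {Ref, Reply} -> Reply                      %% first one back wins
    after 5000 ->
        {error, timeout}
    end.
```

The down side is the extra load: every request runs N times, which is only acceptable if the command is cheap and side-effect-free.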
On Wed, Mar 28, 2012 at 7:09 PM, Jon Brisbin <jon at jbrisbin.com> wrote:
> I'm using get_primary_apl to get my preflist but the problem is how to
> handle a failure of trying to dispatch to a node that is just now going down
> and hasn't had time to notify the caller yet. I don't want to lose the web
> request currently in progress. Maybe I need to get a list of indexes to
> possibly dispatch to and iterate over them, stopping at the first one that
> doesn't blow up.
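That iterate-until-success idea could look roughly like this (`sync_spawn_command/3` is the real riak_core_vnode_master API; `my_vnode_master` is a hypothetical registered name):

```erlang
%% Sketch: walk the preflist in order and stop at the first vnode that
%% answers; a dead node just means we move on to the next entry.
try_dispatch([], _Request) ->
    {error, no_nodes_available};
try_dispatch([IdxNode | Rest], Request) ->
    try
        riak_core_vnode_master:sync_spawn_command(IdxNode, Request,
                                                  my_vnode_master)
    catch
        exit:_ -> try_dispatch(Rest, Request)   %% node died: try the next
    end.
```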
> Sent from my iPhone
> On Mar 28, 2012, at 12:00 PM, Sean Cribbs <sean at basho.com> wrote:
> Generally I would use the riak_core_apl module to calculate the preflist for
> your request. It takes into account node visibility and service
> availability. Use riak_core_node_watcher:service_up to announce that your
> app is available after registering with riak_core.
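A minimal sketch of that flow (`my_service` and the bucket/key are hypothetical; `riak_core_apl`, `riak_core_util`, and `riak_core_node_watcher` are real riak_core modules):

```erlang
%% At application start, after registering the vnode module with
%% riak_core, announce the service so it counts as "up" in preflist
%% calculations:
ok = riak_core_node_watcher:service_up(my_service, self()),

%% Per request: hash the key and ask for a preflist that already
%% excludes unreachable nodes and nodes not running the service.
DocIdx = riak_core_util:chash_key({<<"bucket">>, <<"key">>}),
Preflist = riak_core_apl:get_apl(DocIdx, 1, my_service).
```

Because get_apl filters on node visibility and service availability, a node that has gone down (and been noticed) simply stops appearing in the preflist.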
> When doing some "split brain" testing/simulation for gen_leader we would do
> something like the following on a node we wanted to partition:
> 1> erlang:set_cookie(node(), riak2).
> 2> erlang:disconnect_node('dev3 at 127.0.0.1'),
> erlang:disconnect_node('dev4 at 127.0.0.1').
> Basically, set the cookie so it can't connect to the other nodes, then
> manually disconnect. That might help you simulate node-outage.
> On Wed, Mar 28, 2012 at 12:49 PM, Jon Brisbin <jon at jbrisbin.com> wrote:
>> I'm testing the example code that dispatches a web request from misultin
>> into a riak_core ring of vnodes. It works fantastically when all nodes are up!
>> Doing "ab -k -c 200 -n 10000 http://localhost:3000/" yields a
>> none-too-shabby performance (dispatching at random into all available vnodes
>> on two separate riak_core processes):
>> Concurrency Level: 200
>> Time taken for tests: 1.446 seconds
>> Complete requests: 10000
>> Failed requests: 0
>> Write errors: 0
>> Keep-Alive requests: 10000
>> Total transferred: 1600480 bytes
>> HTML transferred: 120036 bytes
>> Requests per second: 6914.04 [#/sec] (mean)
>> Time per request: 28.927 [ms] (mean)
>> Time per request: 0.145 [ms] (mean, across all concurrent requests)
>> Transfer rate: 1080.64 [Kbytes/sec] received
>> Connection Times (ms)
>>               min  mean[+/-sd] median   max
>> Connect:        0    0   1.0      0      12
>> Processing:     4   28   9.8     27      78
>> Waiting:        4   28   9.8     27      78
>> Total:          4   28  10.1     27      83
>> Percentage of the requests served within a certain time (ms)
>> 50% 27
>> 66% 31
>> 75% 34
>> 80% 36
>> 90% 41
>> 95% 47
>> 98% 53
>> 99% 58
>> 100% 83 (longest request)
>> If I were really zealous, I'd set up haproxy to load balance between these
>> two misultin servers and get double failover.
>> I'm trying to catch the situation of going into the console of one of my
>> nodes and hitting "Ctrl-C" to kill that process. I'm not sure what the best
>> way is to handle this. Check before I dispatch to make sure the node is up?
>> Or keep a watch of some kind that notices when a node goes down and, if a
>> dispatch to that node is in flight, tries to find another one?
>> Essentially, I'm trying to prevent misultin from completely bailing on the
>> request because the sync_spawn_command blows up trying to do a
>> gen_server:call to a non-existent node. I'd like to retry to dispatch to a
>> different node if one happens to have crashed while I'm serving requests (I
>> don't want to lose a request, essentially).
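One way to get that early warning is to subscribe to nodeup/nodedown events so the dispatcher can skip nodes already known to be down before the gen_server:call ever blows up. A self-contained sketch (module and message names are hypothetical):

```erlang
-module(node_watch).
-export([start/0]).

%% Sketch: a process that tracks live nodes via net_kernel monitor
%% messages. A dispatcher can ask it {is_up, self(), Node} before
%% attempting a call, instead of discovering the outage via a crash.
start() ->
    spawn(fun() ->
              ok = net_kernel:monitor_nodes(true),
              loop(ordsets:from_list([node() | nodes()]))
          end).

loop(Up) ->
    receive
        {nodeup, N}      -> loop(ordsets:add_element(N, Up));
        {nodedown, N}    -> loop(ordsets:del_element(N, Up));
        {is_up, From, N} -> From ! {is_up, ordsets:is_element(N, Up)},
                            loop(Up)
    end.
```

This only narrows the window; a node can still die between the check and the call, so a catch-and-retry around the dispatch is still needed as a backstop.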
>> Jon Brisbin
>> riak-users mailing list
>> riak-users at lists.basho.com
> Sean Cribbs <sean at basho.com>
> Software Engineer
> Basho Technologies, Inc.
sam at lenary.co.uk
+44 (0)7891 993 664