replacing node results in error with diag

Max Vernimmen m.vernimmen at comparegroup.eu
Wed Oct 1 02:33:38 EDT 2014


Hi Sargun,

The debug output can be found here: https://gist.github.com/anonymous/7e82fa3a62595fbd2cc7
Your suggested command does indeed resolve the problem nicely, which saves me a lot of restarting. Thanks for your help!

Best regards,


Max Vernimmen

> -----Original Message-----
> From: Sargun Dhillon [mailto:sargun at sargun.me]
> Sent: Tuesday, 30 September 2014 21:57
> To: Max Vernimmen
> Cc: riak-users at lists.basho.com
> Subject: Re: replacing node results in error with diag
> 
> So, I don't have a ton of experience with Riaknostic, but from a
> casual glance at the source code, it appears that Riaknostic caches
> some node-local data about the ring (see:
> https://github.com/basho/riaknostic/blob/2.0.0/src/riaknostic_node.erl#L192-L208).
> You should be able to unset this by attaching to a node with "riak attach"
> and running application:unset_env(riaknostic, local_stats). It would be
> nice to get a dump of your local env first for debugging purposes,
> though; you can get that via
> io:format("Local env: ~p~n", [application:get_all_env(riaknostic)]).
> (Include the trailing period when you type it.)
> 
> If that clears one node, you can do it on all of your nodes by issuing
> rpc:multicall(application, unset_env, [riaknostic, local_stats]). on
> one node.
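The shell steps described above can be collected into one session, as one might paste them after "riak attach". This is a sketch assuming riaknostic is loaded on the node and that local_stats is the cached key behind the stale reference, as the linked source suggests; it has not been checked against a live 2.0.0 cluster. Note that rpc:multicall/3 only reaches nodes currently connected to the node you run it on.

```erlang
%% Run inside the Erlang shell opened by "riak attach" on one cluster member.

%% 1. Dump riaknostic's application env first, for debugging; prints it and returns ok.
io:format("Local env: ~p~n", [application:get_all_env(riaknostic)]).

%% 2. Drop the cached local_stats entry on this node only; returns ok.
application:unset_env(riaknostic, local_stats).

%% 3. Or drop it on this node and every node it is connected to.
%%    Returns {Results, BadNodes}; BadNodes lists nodes that did not reply.
rpc:multicall(application, unset_env, [riaknostic, local_stats]).
```

Running `riak-admin diag` again afterwards should show whether the reference to the replaced node is gone.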
> 
> On Tue, Sep 30, 2014 at 12:39 PM, Max Vernimmen
> <m.vernimmen at comparegroup.eu> wrote:
> > Hi,
> >
> > Today I finished upgrading from 2.0.0-pre20 to 2.0.0-1. Once that was done I did
> > a node replace according to the instructions at
> > http://docs.basho.com/riak/latest/ops/running/nodes/replacing/
> >
> > Once the replacement was done, our monitoring notified us about a problem
> > with the cluster. Our monitoring runs ‘riak-admin diag’, and each of the nodes
> > is now giving the output I’ve posted here:
> > https://gist.github.com/anonymous/a3133333a07b0cd1da1c
> >
> > The node referenced in the diag output is the replaced node, which
> > is no longer in the cluster. I confirmed the ring was settled; the replaced
> > node is no longer listed in the cluster's web interface, nor is it
> > in the `riak-admin status` output. Only restarting the riak service on
> > each of the nodes resolves the problem. Restarting only one node
> > fixes the diag status for that node alone.
> >
> > To me it seems like some state is left behind on the cluster nodes after a
> > node is replaced, causing the `riak-admin diag` command to fail. Has anyone
> > else seen this? Would this classify as a bug, or did I simply do something
> > wrong? :)
> >
> > Best regards,
> >
> > Max Vernimmen
> >
> > _______________________________________________
> > riak-users mailing list
> > riak-users at lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >

