Upgrading 0.14.2 cluster to 1.2

Sebastian Cohnen sebastian.cohnen at gmail.com
Mon Aug 13 06:24:32 EDT 2012


Hey Joe,

the upgrade itself appears to be fine on stage. But I think I might have an issue with the capability negotiation.

Server A thinks that Server B is a legacy node. Server B thinks the same of Server A.

On Server A (running riak-admin member_status):

(legacy)   50.0%      --      'riak at SERVER_B'
valid      50.0%      --      'riak at SERVER_A'

and on Server B:

valid      50.0%      --      'riak at SERVER_B'
(legacy)   50.0%      --      'riak at SERVER_A'
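To compare the two views quickly, the rows can be pulled apart with a small script. This is only a sketch that assumes the row layout shown above (status, ring percentage, pending, quoted node name); the function names are made up for illustration and are not part of riak-admin.

```python
import re

# Matches rows like: (legacy)   50.0%      --      'riak at SERVER_B'
# The node name is taken verbatim from between the single quotes, so it does
# not matter whether it is written "riak at HOST" (as in this mail) or riak@HOST.
ROW = re.compile(
    r"^\s*(?P<status>\(?\w+\)?)\s+(?P<ring>[\d.]+)%\s+(?P<pending>\S+)\s+'(?P<node>[^']+)'\s*$"
)


def parse_member_status(text):
    """Return (status, ring_percent, node) tuples for every matching row."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            status = m.group("status").strip("()")  # "(legacy)" -> "legacy"
            rows.append((status, float(m.group("ring")), m.group("node")))
    return rows


def legacy_nodes(text):
    """Names of the nodes that this view of the ring marks as legacy."""
    return [node for status, _, node in parse_member_status(text) if status == "legacy"]
```

Feeding it the output from Server A lists only SERVER_B as legacy, and the output from Server B lists only SERVER_A, which is the mismatch described here.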


riak-admin transfers does not work properly either. Each node thinks that the other node is down.
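For context, the mixed-version rules Joe described in the message quoted further down can be written as a small model. It only encodes what that mail says and is not taken from Riak's code; the function names are hypothetical, and the assumption that 1.0/1.1 nodes can always report transfers is my reading of his description.

```python
def version_tuple(version):
    """Turn '0.14.2' into (0, 14, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def transfers_usable(node_version, cluster_versions):
    """Whether 'riak-admin transfers' is expected to work on a node.

    Model of the rules described in the quoted mail only (assumption),
    not derived from Riak's source.
    """
    node = version_tuple(node_version)
    cluster = [version_tuple(v) for v in cluster_versions]
    if node < (1, 0):
        # 0.14.2 transfers fails if any 1.0+ node is in the cluster.
        return all(v < (1, 0) for v in cluster)
    if node >= (1, 2):
        # 1.2 transfers fails if any pre-1.2 node is in the cluster.
        return all(v >= (1, 2) for v in cluster)
    # 1.0 and 1.1 nodes are described as the ones that still work when mixed.
    return True
```

By these rules neither node can report transfers in a mixed 0.14.2 and 1.2 cluster, while a cluster where every node runs 1.2 should be able to. That is why the behaviour above looks like a negotiation problem to me rather than the expected mixed-cluster limitation.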

riak-admin ring_status just prints: "Currently in legacy gossip mode."


Restarting the nodes did not have an effect. Any ideas?


Best

Sebastian


On 09.08.2012, at 23:14, Sebastian Cohnen <sebastian.cohnen at gmail.com> wrote:

> Hey Joe,
> 
> thanks for your detailed description of the problem.
> 
> I already assumed that this is not necessarily an indicator of problems. I just wanted to make sure I'm not missing anything important. ring_status just tells me "Currently in legacy gossip mode.", but member_status looks very informative.
> 
> It's getting a bit too late (I'm on CEST) to continue to work on the migration testing, but I'll continue tomorrow.
> 
> 
> Thanks again for your help!
> 
> Sebastian
> 
> On 09.08.2012, at 22:47, Joseph Blomstedt <joe at basho.com> wrote:
> 
>> Yes, this makes sense unfortunately. 'riak-admin transfers' isn't
>> going to work for you in a mixed 0.14.2 and 1.2 cluster.
>> 
>> Between 0.14.2 and 1.0, the entire cluster system was revamped. One
>> consequence of this change was that 'riak-admin transfers' would only
>> work on the 1.0+ nodes in the cluster, not any of the 0.14.2 nodes. At
>> the time, this wasn't a major issue because you could just use the
>> command on the right nodes and get the information you needed until
>> all nodes were eventually upgraded.
>> 
>> For Riak 1.2, 'riak-admin transfers' has been changed again. This
>> time, in a mixed cluster 'riak-admin transfers' only works on the
>> older nodes, not the Riak 1.2 nodes. For example, in a mixed 1.1 and
>> 1.2 cluster, you can only use riak-admin transfers on the 1.1 nodes
>> until all have been upgraded.
>> 
>> Unfortunately, the combination doesn't work out well for you in this
>> case. Riak 0.14.2 transfers fails if there are any 1.0+ nodes in the
>> cluster, and Riak 1.2 transfers fails if there are any <1.2 nodes in
>> the cluster.  Both are true, and therefore neither version of Riak
>> can properly give you transfer information.
>> 
>> Of course, the lack of being able to monitor transfers doesn't mean
>> things aren't actually working. Running 'riak-admin member_status' and
>> 'riak-admin ring_status' on the newer nodes should provide enough
>> detail about what's going on to see if your cluster is moving along.
>> 
>> Regards,
>> Joe
>> 
>> 
>> On Thu, Aug 9, 2012 at 1:25 PM, Sebastian Cohnen
>> <sebastian.cohnen at gmail.com> wrote:
>>> I forgot to mention that I also ran "riak_core_node_watcher:service_up(riak_pipe, self())." on the 0.14.2 node (got that from here: http://wiki.basho.com/Rolling-Upgrades.html)
>>> 
>>> On 09.08.2012, at 22:16, Sebastian Cohnen <sebastian.cohnen at gmail.com> wrote:
>>> 
>>>> Hey all,
>>>> 
>>>> looks like I'm already stuck :-/
>>>> 
>>>> I'm trying to test the upgrade on a stage cluster (with 2 nodes). What I did so far:
>>>> * downloaded 1.2
>>>> * stopped riak
>>>> * backed up /var/lib/riak/ring and /etc/riak
>>>> * installed 1.2
>>>> * changed app.config and vm.args (just node name, ring creation size, config for our multi-backends)
>>>> * started riak again
>>>> 
>>>> riak-admin status looked fine, ring membership is fine, both nodes answer requests. As hinted by Jon, I attached to the riak console and ran riak_core_capability:all(). As far as I can tell, everything looks okay here too.
>>>> 
>>>> What is not working is riak-admin transfers: it fails on both nodes. For the stage situation this is not a big deal, but for production it would be a potential problem.
>>>> 
>>>> I've pasted the output of "riak_core_capability:all()." and command output of riak-admin transfers here: https://gist.github.com/3307714
>>>> 
>>>> Is there anything I can do about that?
>>>> 
>>>> 
>>>> Best
>>>> 
>>>> Sebastian
>>>> 
>>>> 
>>>> PS: What's interesting is that I think that I saw a similar behavior while trying to upgrade to 1.1.4 a few days ago. I have to double check that though.
>>>> 
>>>> On 09.08.2012, at 14:08, Sebastian Cohnen <sebastian.cohnen at gmail.com> wrote:
>>>> 
>>>>> I'm actually thinking about taking the risk. We only have a small 3-node cluster with ~50GB of data with relatively little traffic (and we don't have any 2i, nor do we use search or MR).
>>>>> 
>>>>> I'll back up the data files, the ring state and everything else I find and give it a try. If anything strange happens, we roll back and do the additional 1.1.4 step.
>>>>> 
>>>>> Thanks for the information and help so far!
>>>>> 
>>>>> On 08.08.2012, at 19:57, Jon Meredith <jmeredith at basho.com> wrote:
>>>>> 
>>>>>> Only test coverage.  We didn't run direct testing to 0.14.2 - we also deliberately made the decision not to remove some older code that would have broken 0.14 upgrades until the next major release.
>>>>>> 
>>>>>> It all depends on your risk tolerance - we didn't make any file format changes to bitcask so your data should be safe.  If you wanted to try it, I would take a backup of the ring directory in case you had to downgrade the node again for any reason.
>>>>>> 
>>>>>> On the newly upgraded node you could run riak_core_capability:all(). on the riak console, that would double-check that the settings matched the required rolling upgrade settings, and make sure you do a diff of your app.config/vm.args against the new package to check there aren't any settings missing.
>>>>>> 
>>>>>> Jon.
>>>>>> 
>>>>>> On Wed, Aug 8, 2012 at 11:39 AM, Sebastian Cohnen <sebastian.cohnen at gmail.com> wrote:
>>>>>> I'm curious, are there any special reasons for your recommendation?
>>>>>> 
>>>>>> On 08.08.2012, at 19:38, Jon Meredith <jmeredith at basho.com> wrote:
>>>>>> 
>>>>>>> I would recommend going 0.14.2 -> 1.1.4 -> 1.2, making sure you follow the pre-1.0 upgrade instructions on http://wiki.basho.com/Rolling-Upgrades.html
>>>>>>> 
>>>>>>> Once you do the upgrade to 1.2, the capabilities system will kick in and the old legacy settings mentioned in the rolling upgrade will no longer be used (if you need to you can override them with the new capability override mechanism).
>>>>>>> 
>>>>>>> Jon.
>>>>>>> 
>>>>>>> On Wed, Aug 8, 2012 at 10:23 AM, Nathan Wilken <wilken at asu.edu> wrote:
>>>>>>> Is an intermediate upgrade recommended?  0.14.2 --> 1.0/1.1 --> 1.2?
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> From: riak-users-bounces at lists.basho.com [riak-users-bounces at lists.basho.com] on behalf of Sean Cribbs [sean at basho.com]
>>>>>>> Sent: Wednesday, August 08, 2012 6:35 AM
>>>>>>> To: Sebastian Cohnen
>>>>>>> Cc: riak-users at lists.basho.com
>>>>>>> Subject: Re: Upgrading 0.14.2 cluster to 1.2
>>>>>>> 
>>>>>>> Sebastian,
>>>>>>> 
>>>>>>> While it might work, we did not specifically test upgrades from 0.14.2, only 1.0 and 1.1.
>>>>>>> 
>>>>>>> On Wed, Aug 8, 2012 at 7:08 AM, Sebastian Cohnen <sebastian.cohnen at gmail.com> wrote:
>>>>>>> Hey list,
>>>>>>> 
>>>>>>> is it a good idea to upgrade a small (3 node) cluster straight from 0.14.2 to 1.2? Especially with Riak 1.2's capability negotiation, it feels like the upgrade process should be much simpler now. We don't run any M/R jobs currently and we are only using bitcask right now.
>>>>>>> 
>>>>>>> 
>>>>>>> Best
>>>>>>> 
>>>>>>> Sebastian
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> riak-users mailing list
>>>>>>> riak-users at lists.basho.com
>>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Sean Cribbs <sean at basho.com>
>>>>>>> Software Engineer
>>>>>>> Basho Technologies, Inc.
>>>>>>> http://basho.com/
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Jon Meredith
>>>>>>> Platform Engineering Manager
>>>>>>> Basho Technologies, Inc.
>>>>>>> jmeredith at basho.com
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Jon Meredith
>>>>>> Platform Engineering Manager
>>>>>> Basho Technologies, Inc.
>>>>>> jmeredith at basho.com
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
>> 
>> -- 
>> Joseph Blomstedt <joe at basho.com>
>> Senior Software Engineer
>> Basho Technologies, Inc.
>> http://www.basho.com/
> 




