How to cold (re)boot a cluster with already existing node data

Sargun Dhillon sargun at sargun.me
Mon Jun 6 16:04:21 EDT 2016


Two suggestions:
1. Use Riak EE and run two rings. When you do an update, copy one ring over
to the other side after you do a "cold reboot" (see the sketch below).
2. Use the Riak Mesos Framework. Mesos is like K8s, but it has stateful
storage primitives. (Link: https://github.com/basho-labs/riak-mesos)
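
For the first option, one way to copy data between the two rings is Riak EE's
multi-datacenter replication (riak-repl). A rough sketch, where the cluster
names "primary"/"standby", the address 10.60.0.5, and the cluster-manager
port 9080 are placeholders for your own setup:

# on a node in the ring that kept its data
$ riak-repl clustername primary
# on a node in the freshly cold-rebooted ring
$ riak-repl clustername standby
# back on the surviving ring: connect to the other side and copy the data over
$ riak-repl connect 10.60.0.5:9080
$ riak-repl fullsync enable standby
$ riak-repl fullsync start standby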

On Mon, Jun 6, 2016 at 10:37 AM, Jan-Philip Loos <maxdaten at gmail.com> wrote:

>
>
> On Mon, 6 Jun 2016 at 16:52 Alex Moore <amoore at basho.com> wrote:
>
>> Hi Jan,
>>
>> When you update the Kubernetes nodes, do you have to do them all at once
>> or can they be done in a rolling fashion (one after another)?
>>
>
> Thanks for your reply,
>
> sadly this is not possible. Kubernetes with GKE just tears all nodes down,
> creates new nodes with the new Kubernetes version, and reschedules all
> services on these nodes. So after an upgrade, all Riak nodes are stand-alone
> (when starting after deleting /var/lib/riak/ring).
>
> Greetings
>
> Jan
>
>
>> If you can do them rolling-wise, you should be able to:
>>
>> For each node, one at a time:
>> 1. Shut down Riak
>> 2. Shutdown/restart/upgrade Kubernetes
>> 3. Start Riak
>> 4. Use `riak-admin cluster force-replace` to rename the old node name to
>> the new node name (see the sketch below)
>> 5. Repeat on remaining nodes.
>>
>> This is covered in the "Rename Multi-Node Clusters
>> <http://docs.basho.com/riak/kv/2.1.4/using/cluster-operations/changing-cluster-info/#rename-multi-node-clusters>"
>> doc.
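>>
>> A rough sketch of step 4, assuming the node previously named riak@10.44.2.8
>> came back up as riak@10.52.1.3 (both addresses here are only placeholders
>> for your actual old/new pod IPs):
>>
>> $ riak-admin cluster force-replace riak@10.44.2.8 riak@10.52.1.3
>> $ riak-admin cluster plan
>> $ riak-admin cluster commit
>> # then check that the ring shows the new name
>> $ riak-admin member-status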
>>
>> As for your current predicament, have you created any new
>> buckets/changed bucket props in the default namespace since you restarted?
>> Or have you only done regular operations since?
>>
>> Thanks,
>> Alex
>>
>>
>> On Mon, Jun 6, 2016 at 5:25 AM Jan-Philip Loos <maxdaten at gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> we are using Riak in a Kubernetes cluster (on GKE). Sometimes it's
>>> necessary to reboot the complete cluster to update the Kubernetes nodes.
>>> This results in a complete shutdown of the Riak cluster, and the Riak
>>> nodes are rescheduled with a new IP. So how can I handle this situation?
>>> How can I form a new Riak cluster out of the old nodes with new names?
>>>
>>> The /var/lib/riak directory is persisted. I had to delete the
>>> /var/lib/riak/ring folder (after saving the old ring state in a tar),
>>> otherwise "riak start" crashed with this message:
>>>
>>> {"Kernel pid
>>>> terminated",application_controller,"{application_start_failure,riak_core,{{shutdown,{failed_to_start_child,riak_core_broadcast,{'EXIT',{function_clause,[{orddict,fetch,['
>>>> riak at 10.44.2.8
>>>> ',[]],[{file,\"orddict.erl\"},{line,72}]},{riak_core_broadcast,init_peers,1,[{file,\"src/riak_core_broadcast.erl\"},{line,616}]},{riak_core_broadcast,start_link,0,[{file,\"src/riak_core_broadcast.erl\"},{line,116}]},{supervisor,do_start_child,2,[{file,\"supervisor.erl\"},{line,310}]},{supervisor,start_children,3,[{file,\"supervisor.erl\"},{line,293}]},{supervisor,init_children,2,[{file,\"supervisor.erl\"},{line,259}]},{gen_server,init_it,6,[{file,\"gen_server.erl\"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,239}]}]}}}},{riak_core_app,start,[normal,[]]}}}"}
>>>> Crash dump was written to: /var/log/riak/erl_crash.dump
>>>> Kernel pid terminated (application_controller)
>>>> ({application_start_failure,riak_core,{{shutdown,{failed_to_start_child,riak_core_broadcast,{'EXIT',{function_clause,[{orddict,fetch,['
>>>> riak at 10.44.2.8',
>>>
>>>
>>> Then I formed a new cluster via join & plan & commit.
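>>>
>>> (Roughly: on each of the other nodes,
>>>
>>> $ riak-admin cluster join riak@10.52.1.3
>>>
>>> and then, on any one node,
>>>
>>> $ riak-admin cluster plan
>>> $ riak-admin cluster commit
>>>
>>> where riak@10.52.1.3 is just a placeholder for whichever node the others
>>> joined.)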
>>>
>>> But now I discovered a problem with incomplete and inconsistent
>>> partitions (repeated key listings return different counts):
>>>
>>> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
>>> 3064
>>>
>>> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
>>> 2987
>>>
>>> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
>>> 705
>>>
>>> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
>>> 3064
>>>
>>> Is there a way to fix this? I guess this is caused by the missing old
>>> ring-state?
>>>
>>> Greetings
>>>
>>> Jan
>>>

