How to cold (re)boot a cluster with already existing node data

DeadZen deadzen at deadzen.com
Mon Jun 6 17:10:09 EDT 2016


I wasn't referring to a cluster replace. A node name/reip change can be
done on all offline nodes before starting them.
They still have a cluster if you don't delete the ring data.
Having done that, you actually deleted the cluster (but not the data), when
all that had occurred was an IP address change, such that the node it's
looking for in the dict can't be found:

orddict,fetch,['riak@10.44.2.8'...
You'd want a reip of the old riak@10.xx to riak@10.44.2.8, IIRC.

then the node would boot as if nothing happened.
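
A rough sketch, with riak@10.xx standing in for whatever the old node name
was:

$ riak stop                                  # all nodes offline first
$ riak-admin reip riak@10.xx riak@10.44.2.8  # rewrite the ring in place
$ riak start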

Personally, I think this state could be detected and a friendlier message
shown: "I can't find x, have you recently transferred to a new node or
IP?", etc.


On Monday, June 6, 2016, Sargun Dhillon <sargun at sargun.me> wrote:

> Two suggestions:
> 1. Use Riak-EE, and have two rings. When you do an update, copy over one
> ring to the other side after you do a "cold reboot"
> 2. Use the Riak Mesos Framework. Mesos is like K8s, but it has stateful
> storage primitives. (Link: https://github.com/basho-labs/riak-mesos)
>
> On Mon, Jun 6, 2016 at 10:37 AM, Jan-Philip Loos <maxdaten at gmail.com> wrote:
>
>>
>>
>> On Mon, 6 Jun 2016 at 16:52 Alex Moore <amoore at basho.com> wrote:
>>
>>> Hi Jan,
>>>
>>> When you update the Kubernetes nodes, do you have to do them all at once
>>> or can they be done in a rolling fashion (one after another)?
>>>
>>
>> Thanks for your reply,
>>
>> Sadly this is not possible. Kubernetes on GKE just tears all nodes
>> down, creates new nodes with the new Kubernetes version, and reschedules
>> all services on them. So after an upgrade, all Riak nodes are
>> stand-alone (when started after deleting /var/lib/riak/ring).
>>
>> Greetings
>>
>> Jan
>>
>>
>>> If you can do them rolling-wise, you should be able to:
>>>
>>> For each node, one at a time:
>>> 1. Shut down Riak.
>>> 2. Shutdown/restart/upgrade Kubernetes.
>>> 3. Start Riak.
>>> 4. Use `riak-admin cluster force-replace` to rename the old node name to
>>> the new node name (rough commands below).
>>> 5. Repeat on the remaining nodes.
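>>>
>>> Something like this for step 4, with placeholder node names:
>>>
>>> $ riak-admin cluster force-replace riak@<old-ip> riak@<new-ip>
>>> $ riak-admin cluster plan
>>> $ riak-admin cluster commit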
>>>
>>> This is covered in the "Rename Multi-Node Clusters
>>> <http://docs.basho.com/riak/kv/2.1.4/using/cluster-operations/changing-cluster-info/#rename-multi-node-clusters>"
>>> doc.
>>>
>>> As for your current predicament, have you created any new buckets or
>>> changed bucket props in the default namespace since you restarted? Or
>>> have you only done regular operations since?
>>>
>>> Thanks,
>>> Alex
>>>
>>>
>>> On Mon, Jun 6, 2016 at 5:25 AM Jan-Philip Loos <maxdaten at gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> we are using Riak in a Kubernetes cluster (on GKE). Sometimes it's
>>>> necessary to reboot the complete cluster to update the Kubernetes nodes.
>>>> This results in a complete shutdown of the Riak cluster, and the Riak
>>>> nodes are rescheduled with new IPs. How can I handle this situation? How
>>>> can I form a new Riak cluster out of the old nodes with new names?
>>>>
>>>> The /var/lib/riak directory is persisted. I had to delete the
>>>> /var/lib/riak/ring folder (after saving the old ring state in a tar),
>>>> otherwise "riak start" crashed with this message:
>>>>
>>>> {"Kernel pid
>>>>> terminated",application_controller,"{application_start_failure,riak_core,{{shutdown,{failed_to_start_child,riak_core_broadcast,{'EXIT',{function_clause,[{orddict,fetch,['
>>>>> riak at 10.44.2.8 <javascript:_e(%7B%7D,'cvml','riak at 10.44.2.8');>
>>>>> ',[]],[{file,\"orddict.erl\"},{line,72}]},{riak_core_broadcast,init_peers,1,[{file,\"src/riak_core_broadcast.erl\"},{line,616}]},{riak_core_broadcast,start_link,0,[{file,\"src/riak_core_broadcast.erl\"},{line,116}]},{supervisor,do_start_child,2,[{file,\"supervisor.erl\"},{line,310}]},{supervisor,start_children,3,[{file,\"supervisor.erl\"},{line,293}]},{supervisor,init_children,2,[{file,\"supervisor.erl\"},{line,259}]},{gen_server,init_it,6,[{file,\"gen_server.erl\"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,239}]}]}}}},{riak_core_app,start,[normal,[]]}}}"}
>>>>> Crash dump was written to: /var/log/riak/erl_crash.dump
>>>>> Kernel pid terminated (application_controller)
>>>>> ({application_start_failure,riak_core,{{shutdown,{failed_to_start_child,riak_core_broadcast,{'EXIT',{function_clause,[{orddict,fetch,['
>>>>> riak at 10.44.2.8 <javascript:_e(%7B%7D,'cvml','riak at 10.44.2.8');>',
>>>>
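>>>> Roughly what I ran (the backup filename is just my choice):
>>>>
>>>> $ riak stop
>>>> $ tar -czf /var/lib/riak/ring-backup.tar.gz /var/lib/riak/ring
>>>> $ rm -rf /var/lib/riak/ring
>>>> $ riak start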
>>>>
>>>> Then I formed a new cluster via join & plan & commit.
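>>>>
>>>> Roughly (join run on each joining node, plan & commit once; the seed
>>>> node name is a placeholder):
>>>>
>>>> $ riak-admin cluster join riak@<seed-node-ip>
>>>> $ riak-admin cluster plan
>>>> $ riak-admin cluster commit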
>>>>
>>>> But now I have discovered problems with incomplete and inconsistent
>>>> partitions; repeated key listings return different counts:
>>>>
>>>> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
>>>> 3064
>>>>
>>>> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
>>>> 2987
>>>>
>>>> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
>>>> 705
>>>>
>>>> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
>>>> 3064
>>>>
>>>> Is there a way to fix this? I guess it is caused by the missing old
>>>> ring state?
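>>>>
>>>> (If it helps: this is what I would check first, in case handoff is
>>>> still in flight after the join:)
>>>>
>>>> $ riak-admin member-status   # all nodes valid members of one ring?
>>>> $ riak-admin ring-status     # has the ring converged?
>>>> $ riak-admin transfers       # pending partition transfers?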
>>>>
>>>> Greetings
>>>>
>>>> Jan
>>>>