How to cold (re)boot a cluster with already existing node data

Alex Moore amoore at basho.com
Mon Jun 6 10:52:38 EDT 2016


Hi Jan,

When you update the Kubernetes nodes, do you have to do them all at once, or
can they be done in a rolling fashion (one after another)?

If you can do them in a rolling fashion, the following should work.

For each node, one at a time:
1. Shut down Riak
2. Shut down/restart/upgrade Kubernetes
3. Start Riak
4. Use `riak-admin cluster force-replace` to replace the old node name with
the new node name
5. Repeat on remaining nodes.

This is covered in the "Renaming Multi-node clusters" doc:
<http://docs.basho.com/riak/kv/2.1.4/using/cluster-operations/changing-cluster-info/#rename-multi-node-clusters>
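
As a rough sketch of the per-node steps above, here is a small dry-run
helper that only prints the commands for one node and executes nothing. The
function name `print_rename_steps` and the new node name `riak@10.44.3.9`
are made up for illustration. The `nodename` edit in riak.conf and the
plan/commit steps are my assumptions about how staged cluster changes are
applied, so please check them against the linked doc before running
anything.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for renaming ONE node, runs nothing.
# print_rename_steps and the example node names are hypothetical.
print_rename_steps() {
    old_node="$1"   # node name before the reschedule, e.g. riak@10.44.2.8
    new_node="$2"   # node name after the reschedule (made-up example below)

    if [ -z "$old_node" ] || [ -z "$new_node" ]; then
        echo "usage: print_rename_steps OLD_NODE NEW_NODE" >&2
        return 1
    fi

    echo "riak stop"
    echo "# shut down / upgrade / restart the Kubernetes node here"
    echo "# assumption: set nodename = $new_node in riak.conf before starting"
    echo "riak start"
    echo "riak-admin cluster force-replace $old_node $new_node"
    # Assumption: force-replace is a staged change that needs plan + commit.
    echo "riak-admin cluster plan"
    echo "riak-admin cluster commit"
}

# Example call with one real name from the crash log and one invented name.
print_rename_steps "riak@10.44.2.8" "riak@10.44.3.9"
```

Reviewing the printed list per node before running it by hand keeps the
rolling procedure explicit and avoids touching more than one node at a time.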

As for your current predicament, have you created any new buckets or changed
bucket props in the default namespace since you restarted? Or have you only
done regular operations since?

Thanks,
Alex


On Mon, Jun 6, 2016 at 5:25 AM Jan-Philip Loos <maxdaten at gmail.com> wrote:

> Hi,
>
> we are using Riak in a Kubernetes cluster (on GKE). Sometimes it's
> necessary to reboot the complete cluster to update the Kubernetes nodes.
> This results in a complete shutdown of the riak cluster and the riak-nodes
> are rescheduled with a new IP. So how can I handle this situation? How can
> I form a new riak cluster out of the old nodes with new names?
>
> The /var/lib/riak directory is persisted. I had to delete the
> /var/lib/riak/ring folder (I saved the old ring state in a tar), because
> otherwise "riak start" crashed with this message:
>
> {"Kernel pid terminated",application_controller,"{application_start_failure,riak_core,{{shutdown,{failed_to_start_child,riak_core_broadcast,{'EXIT',{function_clause,[{orddict,fetch,['riak@10.44.2.8',[]],[{file,\"orddict.erl\"},{line,72}]},{riak_core_broadcast,init_peers,1,[{file,\"src/riak_core_broadcast.erl\"},{line,616}]},{riak_core_broadcast,start_link,0,[{file,\"src/riak_core_broadcast.erl\"},{line,116}]},{supervisor,do_start_child,2,[{file,\"supervisor.erl\"},{line,310}]},{supervisor,start_children,3,[{file,\"supervisor.erl\"},{line,293}]},{supervisor,init_children,2,[{file,\"supervisor.erl\"},{line,259}]},{gen_server,init_it,6,[{file,\"gen_server.erl\"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,239}]}]}}}},{riak_core_app,start,[normal,[]]}}}"}
> Crash dump was written to: /var/log/riak/erl_crash.dump
> Kernel pid terminated (application_controller) ({application_start_failure,riak_core,{{shutdown,{failed_to_start_child,riak_core_broadcast,{'EXIT',{function_clause,[{orddict,fetch,['riak@10.44.2.8',
>
>
> Then I formed a new cluster via join & plan & commit.
>
> But now I have discovered a problem with incomplete and inconsistent
> partitions:
>
> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
> 3064
>
> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
> 2987
>
> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
> 705
>
> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
> 3064
>
> Is there a way to fix this? I guess this is caused by the missing old
> ring-state?
>
> Greetings
>
> Jan