How to cold (re)boot a cluster with already existing node data

Jan-Philip Loos maxdaten at gmail.com
Sun Jun 5 22:18:53 EDT 2016


Hi,

we are running Riak in a Kubernetes cluster (on GKE). Sometimes it is
necessary to reboot the whole cluster to update the Kubernetes nodes.
This shuts down the entire Riak cluster, and when the Riak nodes are
rescheduled they come back with new IPs (and therefore new node names). How
should I handle this situation? How can I form a new Riak cluster out of the
old nodes under their new names?

The /var/lib/riak directory is persisted. I had to delete the
/var/lib/riak/ring folder (after saving the old ring state in a tar
archive), because otherwise "riak start" crashed with this message:

{"Kernel pid terminated",application_controller,"{application_start_failure,riak_core,{{shutdown,{failed_to_start_child,riak_core_broadcast,{'EXIT',{function_clause,[{orddict,fetch,['riak@10.44.2.8',[]],[{file,\"orddict.erl\"},{line,72}]},{riak_core_broadcast,init_peers,1,[{file,\"src/riak_core_broadcast.erl\"},{line,616}]},{riak_core_broadcast,start_link,0,[{file,\"src/riak_core_broadcast.erl\"},{line,116}]},{supervisor,do_start_child,2,[{file,\"supervisor.erl\"},{line,310}]},{supervisor,start_children,3,[{file,\"supervisor.erl\"},{line,293}]},{supervisor,init_children,2,[{file,\"supervisor.erl\"},{line,259}]},{gen_server,init_it,6,[{file,\"gen_server.erl\"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,239}]}]}}}},{riak_core_app,start,[normal,[]]}}}"}
Crash dump was written to: /var/log/riak/erl_crash.dump
Kernel pid terminated (application_controller) ({application_start_failure,riak_core,{{shutdown,{failed_to_start_child,riak_core_broadcast,{'EXIT',{function_clause,[{orddict,fetch,['riak@10.44.2.8',
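Saving the old ring state before deleting it, as described above, can be done along these lines. This is a minimal sketch, assuming the node has already been stopped and that the data directory is /var/lib/riak; the function name backup_ring and the archive naming scheme are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: archive the ring directory, then remove it so a node
# whose node name changed can start with a fresh ring.
# Assumption: the Riak node is stopped before this runs.
set -eu

backup_ring() {
  data_dir="${1:-/var/lib/riak}"
  # Timestamped default name so repeated runs do not overwrite older backups.
  archive="${2:-$data_dir/ring-backup-$(date +%Y%m%d%H%M%S).tar.gz}"

  if [ ! -d "$data_dir/ring" ]; then
    echo "no ring directory in $data_dir" >&2
    return 1
  fi

  # Store the ring relative to the data dir; only delete it if tar succeeded.
  tar -czf "$archive" -C "$data_dir" ring || return 1
  rm -rf "$data_dir/ring"

  # Print the archive path so callers can record where the old ring went.
  echo "$archive"
}
```

Typical use would be `riak stop`, then `backup_ring /var/lib/riak`, then `riak start`. The old ring can later be restored with `tar -xzf <archive> -C /var/lib/riak` if needed.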


Then I formed a new cluster with join, plan and commit.
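The join/plan/commit sequence can be sketched with the riak-admin cluster subcommands. This is a minimal sketch, assuming one node acts as the seed and every other node stages a join to it; the function names stage_join and plan_and_commit, the RIAK_ADMIN override and the node name riak@10.44.0.5 are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around riak-admin for (re)forming a cluster.
# RIAK_ADMIN can be overridden (e.g. with "echo") for a dry run.
set -eu
RIAK_ADMIN="${RIAK_ADMIN:-riak-admin}"

stage_join() {
  # Run on each non-seed node: stage a join to the seed node given as $1.
  "$RIAK_ADMIN" cluster join "$1"
}

plan_and_commit() {
  # Run once, on any node: print the staged changes, then apply them.
  "$RIAK_ADMIN" cluster plan
  "$RIAK_ADMIN" cluster commit
}
```

For example, `stage_join riak@10.44.0.5` on every node except the seed, followed by a single `plan_and_commit`. Reviewing the plan output before committing is the point of keeping the two steps separate.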

But now I have discovered a problem with incomplete and inconsistent
partitions. Listing the keys of the same bucket several times returns a
different count almost every time:

$ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
3064

$ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
2987

$ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
705

$ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
3064
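To make the inconsistency easier to track, the repeated key listing can be scripted. This is a minimal sketch, assuming curl and jq are installed (as in the commands above) and reusing the same jq filter; the function names fetch_counts and check_consistent and the default of four runs are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: repeat a key listing and check whether every run
# returned the same number of keys.
set -eu

fetch_counts() {
  # Print one key count per run, using the same jq filter as above.
  url="$1"
  runs="${2:-4}"
  i=0
  while [ "$i" -lt "$runs" ]; do
    curl -Ss "$url" | jq '.[] | length'
    i=$((i + 1))
  done
}

check_consistent() {
  # Read counts from stdin; succeed only if all of them are identical.
  distinct=$(sort -u | wc -l | tr -d ' ')
  if [ "$distinct" -eq 1 ]; then
    echo "consistent"
  else
    echo "inconsistent ($distinct distinct counts)"
    return 1
  fi
}
```

Usage: `fetch_counts "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" 10 | check_consistent`. Keeping the comparison in its own function means it can also be fed counts collected by hand, such as the four values above.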

Is there a way to fix this? I suspect it is caused by the missing old
ring state.

Greetings

Jan


More information about the riak-users mailing list