How to cold (re)boot a cluster with already existing node data

DeadZen deadzen at deadzen.com
Mon Jun 6 10:33:33 EDT 2016


This OmniTI article might be helpful:
https://omniti.com/seeds/migrating-riak-do-it-live

As for fixing this specific error: IIRC it can be done by renaming the node
in the ring to match your new node name. Renaming the node will make that
orddict lookup succeed. There's a supplied admin utility for that.
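A minimal sketch of that rename, assuming the utility meant here is
`riak-admin reip` (the old/new node names below are examples, not taken from
this thread):

```shell
# Stop the node before touching the persisted ring.
riak stop

# First point nodename in riak.conf (or -name in vm.args)
# at the new node name, then rewrite the name stored in the
# saved ring files:
#   riak-admin reip <old_nodename> <new_nodename>
riak-admin reip riak@10.44.2.8 riak@10.44.3.5   # example names

riak start
```

On a cluster that is still running, `riak-admin cluster replace <old> <new>`
followed by the usual plan/commit is generally the safer route than editing
the ring in place.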


On Sunday, June 5, 2016, Jan-Philip Loos <maxdaten at gmail.com> wrote:

> Hi,
>
> we are using Riak in a Kubernetes cluster (on GKE). Sometimes it's
> necessary to reboot the complete cluster to update the Kubernetes nodes.
> This results in a complete shutdown of the riak cluster and the riak-nodes
> are rescheduled with a new IP. So how can I handle this situation? How can
> I form a new riak cluster out of the old nodes with new names?
>
> The /var/lib/riak directory is persisted. I had to delete the
> /var/lib/riak/ring folder, otherwise "riak start" crashed with this message
> (but I saved the old ring state in a tar first):
>
> {"Kernel pid
>> terminated",application_controller,"{application_start_failure,riak_core,{{shutdown,{failed_to_start_child,riak_core_broadcast,{'EXIT',{function_clause,[{orddict,fetch,['riak@10.44.2.8',[]],[{file,\"orddict.erl\"},{line,72}]},{riak_core_broadcast,init_peers,1,[{file,\"src/riak_core_broadcast.erl\"},{line,616}]},{riak_core_broadcast,start_link,0,[{file,\"src/riak_core_broadcast.erl\"},{line,116}]},{supervisor,do_start_child,2,[{file,\"supervisor.erl\"},{line,310}]},{supervisor,start_children,3,[{file,\"supervisor.erl\"},{line,293}]},{supervisor,init_children,2,[{file,\"supervisor.erl\"},{line,259}]},{gen_server,init_it,6,[{file,\"gen_server.erl\"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,239}]}]}}}},{riak_core_app,start,[normal,[]]}}}"}
>> Crash dump was written to: /var/log/riak/erl_crash.dump
>> Kernel pid terminated (application_controller)
>> ({application_start_failure,riak_core,{{shutdown,{failed_to_start_child,riak_core_broadcast,{'EXIT',{function_clause,[{orddict,fetch,['riak@10.44.2.8',
>
>
> Then I formed a new cluster via join & plan & commit.
>
> But now I discovered problems with incomplete and inconsistent
> partitions:
>
> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
>
> 3064
>
> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
>
> 2987
>
> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
>
> 705
>
> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
>
> 3064
>
> Is there a way to fix this? I guess this is caused by the missing old
> ring state?
>
> Greetings
>
> Jan
>


More information about the riak-users mailing list