WARNING: Not all replicas will be on distinct nodes

Martin Sumner martin.sumner at infinityworks.com
Thu Dec 14 16:27:19 EST 2017


Daniel,

See this post (
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2017-August/019488.html)
and the links in it for more details on the issues with the core claim
algorithm.  The fix is in the pending 2.2.5 release, to which Russell is
adding the finishing touches at the moment.

However, the fix may not immediately resolve your problem - it is about
preventing this situation, not necessarily about resolving it once it has
been created.  Also, the issue we saw that leads to this would not (I
think) be triggered by adding a single node - unless the cluster already
had the problem.  So it is possible that, although you are seeing the
warning now, you have had the issue since you originally created the
cluster, and this change is simply carrying it forward.  For instance,
going from nothing straight to a 6-node cluster with a ring size of 128
would create this problem.

As a workaround there is the core claim v3 algorithm, which can be turned
on to see whether it offers a better cluster plan without violations.  I
can't remember right now exactly how to switch to the v3 claim algorithm
though - google is letting me down - but my best recollection is sketched
below.
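
From memory the switch is made in the riak_core section of advanced.config
- treat the following as a rough sketch rather than a recipe, as I haven't
been able to verify the exact keys (wants_claim_fun / choose_claim_fun) or
the v3 function names just now:

%% advanced.config (sketch - confirm the keys against your riak_core version)
[
 {riak_core, [
     %% replace the default (v2) claim functions with the v3 versions
     {wants_claim_fun,  {riak_core_claim, wants_claim_v3}},
     {choose_claim_fun, {riak_core_claim, choose_claim_v3}}
 ]}
].

As far as I recall advanced.config is only read at start-up, so the node
you run the plan from would need a restart before re-planning.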

Ultimately, this may not be such a crisis.  The warning is thrown whenever
the cluster cannot guarantee a "target_n_val" of 4.  So if you have an
n_val of 3, you're not necessarily at risk of data loss.  To know for sure
you will have to look at your ring via riak attach (see bullet point 2 in
http://docs.basho.com/riak/kv/2.2.3/using/running-a-cluster/#add-a-second-node-to-your-cluster).
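
Something along these lines in riak attach will print the ring ownership
and then flag any run of consecutive partitions that repeats a node - a
rough sketch only, not tested: the pretty_print call is the one from the
linked docs, the window check is mine and assumes the default target_n_val
of 4.

%% inside `riak attach`, one expression per line
{ok, Ring} = riak_core_ring_manager:get_my_ring().
riak_core_ring:pretty_print(Ring, [legend]).
%% owners of the 128 partitions, in ring order
Owners = [Node || {_Idx, Node} <- riak_core_ring:all_owners(Ring)].
TargetN = 4.
%% wrap the list so windows spanning the end of the ring are checked too
Wrapped = Owners ++ lists:sublist(Owners, TargetN - 1).
Windows = [lists:sublist(Wrapped, I, TargetN) || I <- lists:seq(1, length(Owners))].
%% any window with fewer than TargetN distinct nodes is a violation
[W || W <- Windows, length(lists:usort(W)) < TargetN].

Repeating the check with a window of 3 rather than 4 (use f/1 in the shell
to re-bind the variables) tells you whether any actual n_val 3 preflist
repeats a node - that is the case where losing a single physical node
could cost you all copies of some keys.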

If you can figure out the violations from your ring, you may be able to
resolve them by having the node that carries the violations leave the
cluster and then re-adding it - a sketch of the commands follows.
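
For example, something like the following (a sketch only - riak29 here
stands in for whichever node the check above shows as the repeated owner,
and I would let each round of transfers finish, watching riak-admin
transfers, before moving on):

# on any current member: have the offending node leave
riak-admin cluster leave riak@riak29.internal
riak-admin cluster plan
riak-admin cluster commit

# once handoff finishes and the node has exited, restart riak on it,
# then on that node stage a re-join and check the new plan for the warning
riak-admin cluster join riak@riak20.internal
riak-admin cluster plan
riak-admin cluster commit

If the re-join plan still shows the warning you are no worse off, as you
can riak-admin cluster clear rather than commit.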

Sorry, I'm a bit rushed - but I hope this helps get you started.

Martin





On 14 December 2017 at 19:49, Daniel Miller <dmiller at dimagi.com> wrote:

> I have a 6 node cluster (now 7) with ring size 128. On adding the most
> recent node I got the WARNING: Not all replicas will be on distinct nodes.
> After the initial plan I ran the following sequence many times, but always
> got the same plan output:
>
> sudo riak-admin cluster clear && \
> sleep 10 && \
> sudo service riak start && \
> sudo riak-admin wait-for-service riak_kv && \
> sudo riak-admin cluster join riak at hqriak20.internal && \
> sudo riak-admin cluster plan
>
>
> The plan looked the same every time, and I eventually committed it because
> the cluster capacity is running low:
>
>
> Success: staged join request for 'riak at riak29.internal' to 'riak at riak20.internal'
> =============================== Staged Changes ================================
> Action         Details(s)
> -------------------------------------------------------------------------------
> join           'riak at riak29.internal'
> -------------------------------------------------------------------------------
>
>
> NOTE: Applying these changes will result in 1 cluster transition
>
> ###############################################################################
>                          After cluster transition 1/1
> ###############################################################################
>
> ================================= Membership ==================================
> Status     Ring    Pending    Node
> -------------------------------------------------------------------------------
> valid      17.2%     14.1%    'riak at riak20.internal'
> valid      17.2%     14.8%    'riak at riak21.internal'
> valid      16.4%     14.1%    'riak at riak22.internal'
> valid      16.4%     14.1%    'riak at riak23.internal'
> valid      16.4%     14.1%    'riak at riak24.internal'
> valid      16.4%     14.8%    'riak at riak28.internal'
> valid       0.0%     14.1%    'riak at riak29.internal'
> -------------------------------------------------------------------------------
> Valid:7 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
>
> WARNING: Not all replicas will be on distinct nodes
>
> Transfers resulting from cluster changes: 18
>   2 transfers from 'riak at riak28.internal' to 'riak at riak29.internal'
>   3 transfers from 'riak at riak21.internal' to 'riak at riak29.internal'
>   3 transfers from 'riak at riak23.internal' to 'riak at riak29.internal'
>   3 transfers from 'riak at riak24.internal' to 'riak at riak29.internal'
>   4 transfers from 'riak at riak20.internal' to 'riak at riak29.internal'
>   3 transfers from 'riak at riak22.internal' to 'riak at riak29.internal'
>
>
> My understanding is that if some replicas are not on distinct nodes then I
> may have permanent data loss if a single physical node is lost (please let
> me know if that is not correct). Questions:
>
> How do I diagnose which node(s) have duplicate replicas?
> What can I do to fix this situation?
>
> Thanks!
> Daniel
>
>
> P.S. I am unable to get anything useful out of `riak-admin diag`. It
> appears to be broken on the version of Riak I'm using (2.2.1). Here's the
> output I get:
>
> $ sudo riak-admin diag
> RPC to 'riak at hqriak20.internal' failed:
>     {'EXIT',
>      {undef,
>       [{lager,get_loglevels,[],[]},
>        {riaknostic,run,1,[{file,"src/riaknostic.erl"},{line,118}]},
>        {rpc,'-handle_call_call/6-fun-0-',5,[{file,"rpc.erl"},{line,205}]}]}}
>
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
