Uneven distribution of partitions in RIAK cluster

Semov, Raymond rsemov at ebay.com
Tue Nov 29 15:42:55 EST 2016

Thank you for the response! I would love to consider the switch from claim_v2 to claim_v3, but since it’s experimental I’d rather not; I’m dealing with a RIAK cluster that is in production.
What I will end up doing is (after our team cleans up all the junk in the cluster) have a node leave the cluster and then rejoin. That’ll fix the fragmentation that will happen after the old data purge as well.

From: Drew Pirrone-Brusse <dpirrone-brusse at basho.com>
Date: Monday, November 14, 2016 at 10:05 AM
To: "Semov, Raymond" <rsemov at ebay.com>
Cc: "riak-users at lists.basho.com" <riak-users at lists.basho.com>
Subject: Re: Uneven distribution of partitions in RIAK cluster

Hi Ray,

Riak's partition distribution is automatically calculated using our nondeterministic `claim` algorithm. That system is able to re-balance clusters, but it is typically only run during membership operations: joining, leaving, or replacing nodes. The uneven partition distribution won't self-heal unless you add a new node to this cluster.

We can force a re-balance of this sort of uneven distribution by temporarily switching from `claim_v2` to `claim_v3` and triggering a membership recalculation. `claim_v3` is still an experimental system that is much more aggressive about avoiding preflist violations and lumpy claims, without much regard for limiting the scope of membership changes. With `claim_v2`, the addition of a new node to an existing cluster will almost always only involve moving partitions off of existing nodes and onto the new node. With `claim_v3`, it's somewhat common to see partitions also being moved between existing nodes in order to prevent lumpy claims.

These unpredictable spikes in membership changes have caused serious problems for our customers in the past, and they are nearly impossible to plan for, so we don't advise using `claim_v3` for the majority of operations.

To enable `claim_v3` and trigger a re-balance of the ring:

1. Enable the use of `claim_v3` by opening a `riak attach` session on any node in this cluster and running the snippets below:

    rpc:multicall(application, set_env, [riak_core, wants_claim_fun, {riak_core_claim, wants_claim_v3}]).
    rpc:multicall(application, set_env, [riak_core, choose_claim_fun, {riak_core_claim, choose_claim_v3}]).

(Please note, the `.`s are syntactically significant in Erlang, and you can exit `attach` sessions with `ctrl+g, q, enter`.)

2. Determine which node is currently the Claimant by running `riak-admin ring-status` on any node in the cluster. Look for the line similar to `Claimant: 'dev2 at'`.

3. Stop the claimant. In this case I would run `riak stop` on dev2 at.

4. Trigger the election of a new claimant by marking the current claimant DOWN in the ring. In this case, I would run `riak-admin down dev2 at` on any active node in this cluster.

5. Verify the reelection with `riak-admin ring-status` (checking to make sure the claimant has changed), and restart the node that was previously stopped.

At this time the rebalance should have occurred and membership transfers started.

6. To disable `claim_v3`, open another `riak attach` session on any node in this cluster and run the snippets below:

    rpc:multicall(application, set_env, [riak_core, wants_claim_fun, {riak_core_claim, default_wants_claim}]).
    rpc:multicall(application, set_env, [riak_core, choose_claim_fun, {riak_core_claim, default_choose_claim}]).

This can be done while the transfers are in-flight. The new plan will have already been injected into the ring.
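The claimant checks in steps 2 and 5 above could be scripted roughly as follows. This is only a sketch: the function names and the `RIAK_ADMIN` override variable are hypothetical helpers, not part of Riak, and it assumes the `ring-status` output contains a single-quoted line of the form `Claimant: 'name@host'`, as described in step 2.

```shell
# Hypothetical helpers; these names are illustrative and not part of Riak.

# Print the claimant node name found in `riak-admin ring-status` output
# supplied on stdin. Assumes a line such as:  Claimant:  'name@host'
claimant_from_status() {
    awk -F"'" '/Claimant:/ && !found { print $2; found = 1 }'
}

# Succeed only if the cluster now reports a claimant that differs from "$1".
# RIAK_ADMIN is an assumed override for dry runs; it defaults to riak-admin.
claimant_changed() {
    old_claimant="$1"
    new_claimant="$("${RIAK_ADMIN:-riak-admin}" ring-status | claimant_from_status)"
    [ -n "$new_claimant" ] && [ "$new_claimant" != "$old_claimant" ]
}
```

For example, record the claimant before step 3 with `old=$(riak-admin ring-status | claimant_from_status)`, then after step 4 run `claimant_changed "$old" && echo "new claimant elected"` before restarting the stopped node.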

I hope this helps.
Best regards,

On Fri, Nov 11, 2016 at 2:13 PM, Semov, Raymond <rsemov at ebay.com> wrote:
I have a 5-node cluster with 12 partitions in 4 of the nodes and 16 partitions in node #5. That is causing dangerously high disk utilization in that node. I plowed thru the documentation and Googled the hell out of it but I can’t find info on how to rebalance the extra 4 partitions onto the 4 underutilized nodes. The docs say the cluster balances itself but that’s apparently not the case here. Can anyone give any suggestions?
I run RIAK version 1.4.8 on Linux kernel 3.13.
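
As a quick check of the numbers in the question: 4 x 12 + 16 = 64 partitions across 5 nodes. The sketch below works out what a perfectly even split would look like. The function name is hypothetical, and it assumes the only goal is an even count per node; the actual claim algorithm also weighs other constraints, so treat the result as a target rather than a guarantee.

```shell
# Hypothetical helper: print the most even split of a ring across nodes.
# $1 = total number of partitions (ring size), $2 = number of nodes.
ideal_claims() {
    ring_size="$1"
    nodes="$2"
    base=$((ring_size / nodes))    # partitions every node gets
    extra=$((ring_size % nodes))   # nodes that must take one more
    printf '%s node(s) with %s partitions, %s node(s) with %s partitions\n' \
        "$extra" "$((base + 1))" "$((nodes - extra))" "$base"
}
```

Calling `ideal_claims 64 5` gives four nodes with 13 partitions and one node with 12, compared with the current 12/12/12/12/16.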

riak-users mailing list
riak-users at lists.basho.com
