Uneven distribution of partitions in RIAK cluster

Drew Pirrone-Brusse dpirrone-brusse at basho.com
Mon Nov 14 13:05:44 EST 2016

Hi Ray,

Riak's partition distribution is calculated automatically by our
nondeterministic `claim` algorithm. That algorithm is able to re-balance
clusters, but it typically only runs during membership operations: joining,
leaving, or replacing nodes. The uneven partition distribution won't
self-heal unless you add a new node to this cluster.
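If it helps to see the imbalance directly, here is a minimal sketch for a
`riak attach` session that reads the local node's copy of the ring and counts
the partitions each node owns. It assumes the Riak 1.4-era
`riak_core_ring_manager:get_my_ring/0` and `riak_core_ring:all_owners/1`
functions. The counts Ray reported add up to 64 (4 x 12 + 16), so on five nodes
a balanced claim would give each node 12 or 13 partitions.

```erlang
%% Sketch for a `riak attach` session; f(). clears earlier shell bindings
%% so the matches below cannot fail against stale values.
f().
%% This node's current view of the ring.
{ok, Ring} = riak_core_ring_manager:get_my_ring().
%% all_owners/1 returns a list of {PartitionIndex, OwnerNode} pairs.
Owners = riak_core_ring:all_owners(Ring).
%% Tally partitions per owner into an orddict of [{Node, Count}].
lists:foldl(fun({_Index, Node}, Acc) -> orddict:update_counter(Node, 1, Acc) end,
            orddict:new(), Owners).
%% Total ring size, for comparison with the per-node counts.
riak_core_ring:num_partitions(Ring).
```

This only reads the ring, so it is safe to run at any point, including
while transfers are in progress.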

We can force a re-balance of this sort of uneven distribution by
temporarily switching from `claim_v2` to `claim_v3` and triggering a
membership recalculation. `claim_v3` is still experimental: it is much more
aggressive about avoiding preflist violations and lumpy claims, with little
regard for limiting the scope of membership changes. With `claim_v2`,
adding a new node to an existing cluster almost always involves only moving
partitions off existing nodes and onto the new node. With `claim_v3`, it's
fairly common to also see partitions moved between existing nodes in order
to prevent lumpy claims.

These unpredictable spikes in membership changes have caused serious
problems for our customers in the past, and they are nearly impossible to
plan for, so we don't advise using `claim_v3` for the majority of
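To get a feel for how small a well-behaved rebalance could be, the
arithmetic can be sketched as a shell fun. `Balance` is a hypothetical helper,
not part of Riak: given per-node partition counts, it gives each node a target
of `RingSize div N` or one more (handing the extra partitions to the currently
largest owners) and returns the minimum number of partitions that must move.
It ignores preflist constraints, so a real claim, and `claim_v3` in
particular, may move more than this lower bound.

```erlang
%% Hypothetical helper (not part of Riak). Counts is a list of per-node
%% partition counts, e.g. [16, 12, 12, 12, 12]. Run f(). first if names
%% such as Sorted or Targets are already bound in your shell.
Balance = fun(Counts) ->
    RingSize = lists:sum(Counts),
    NodeCount = length(Counts),
    Base = RingSize div NodeCount,
    Extra = RingSize rem NodeCount,
    %% Pair the largest current owners with the larger targets; this
    %% pairing minimises the total distance from the current layout.
    Sorted = lists:reverse(lists:sort(Counts)),
    Targets = lists:duplicate(Extra, Base + 1)
              ++ lists:duplicate(NodeCount - Extra, Base),
    Pairs = lists:zip(Sorted, Targets),
    %% Every partition above its node's target has to be handed off.
    Moves = lists:sum([C - T || {C, T} <- Pairs, C > T]),
    {Pairs, Moves}
end.
```

With the counts Ray reported, `Balance([16, 12, 12, 12, 12]).` returns
`{[{16,13},{12,13},{12,13},{12,13},{12,12}],3}`: in the best case only three
partitions need to leave the overloaded node.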

To enable `claim_v3` and trigger a re-balance of the ring,

1. Enable the use of `claim_v3` by opening a `riak attach` session on any
node in this cluster and running the snippets below,

    %% Switch every connected node to the experimental claim_v3 functions.
    rpc:multicall(application, set_env,
        [riak_core, wants_claim_fun, {riak_core_claim, wants_claim_v3}]).
    rpc:multicall(application, set_env,
        [riak_core, choose_claim_fun, {riak_core_claim, choose_claim_v3}]).

(Please note that the trailing `.`s are syntactically significant in Erlang.
You can exit `attach` sessions with `ctrl+g, q, enter`.)

2. Determine which node is currently the Claimant by running
`riak-admin ring-status` on any node in the cluster. Look for a line
similar to `Claimant: 'dev2 at'`.

3. Stop the claimant. In this case I would run `riak stop` on dev2 at

4. Trigger the election of a new claimant by marking the current claimant
DOWN in the ring. In this case, I would run `riak-admin down dev2 at`
on any active node in this cluster.

5. Verify the re-election with `riak-admin ring-status` (checking that the
claimant has changed), and restart the node that was previously the
claimant.
At this time the rebalance should have occurred and membership transfers

6. To disable `claim_v3`, open another `riak attach` session on any node in
this cluster and run the snippets below,

    %% Switch every connected node back to the default claim functions.
    rpc:multicall(application, set_env,
        [riak_core, wants_claim_fun, {riak_core_claim, default_wants_claim}]).
    rpc:multicall(application, set_env,
        [riak_core, choose_claim_fun, {riak_core_claim, default_choose_claim}]).

This can be done while the transfers are still in flight, because the new
plan has already been injected into the ring.
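To confirm which claim functions are active (for example between steps 1
and 2, and again after step 6), each node can be asked directly from a
`riak attach` session. This is a sketch using the standard
`application:get_env/2` through `rpc:multicall/4`; the first element of the
returned tuple holds one answer per reachable node, and the second lists any
nodes that could not be reached.

```erlang
%% Ask every connected node for its current claim function settings.
%% While claim_v3 is enabled, each answer should be
%% {ok,{riak_core_claim,wants_claim_v3}} and {ok,{riak_core_claim,choose_claim_v3}};
%% after step 6 they should name default_wants_claim and default_choose_claim.
rpc:multicall(application, get_env, [riak_core, wants_claim_fun]).
rpc:multicall(application, get_env, [riak_core, choose_claim_fun]).
%% The node this node's ring currently records as claimant
%% (an alternative to reading `riak-admin ring-status`).
riak_core_ring:claimant(element(2, riak_core_ring_manager:get_my_ring())).
```

None of these calls change any state, so they can be repeated as often as
needed while following the steps.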

I hope this helps.
Best regards,

On Fri, Nov 11, 2016 at 2:13 PM, Semov, Raymond <rsemov at ebay.com> wrote:

> I have a 5-node cluster with 12 partitions in 4 of the nodes and 16
> partitions in node #5. That is causing dangerously high disk utilization in
> that node. I plowed thru the documentation and Googled the hell out of it
> but I can’t find info on how to rebalance the extra 4 partitions on the 4
> underutilized nodes. The docs say the cluster balances itself but that’s
> apparently not the case here. Can anyone give any suggestions?
> I run RIAK version 1.4.8 on Linux kernel 3.13
> Ray
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com