Cluster rebalancing

Chaim Solomon chaim at
Sun Jul 6 01:13:01 EDT 2014

I don't think I was quite clear in what I asked for.

I am not asking for the ability to influence the hashing algorithm. That
would be a mess.
But I would like to be able to have more influence on the distribution of
vnodes on the nodes - and that is something that RIAK already does.

So a command to bump a vnode off a particular node or reduce the number of
vnodes on a node or set the target percentage on a node would be nice. It
seems like the current algorithm already does something similar - but I
didn't see how one can influence that.

The other issue was that I would suggest taking the disk space into
If you have nodes that have different storage then balancing the data
equally between nodes may not be the best option.
It may be better to take the available disk space into consideration and
move vnodes to nodes that have free space if a node runs low on space.

One simple use case would be expanding a cluster with newer nodes (that
have more storage) and being able to utilise that storage.

Another would be to be able to distribute larger partitions more evenly -
in particular if the size per partition is not evenly distributed.

Chaim Solomon

On Thu, Jul 3, 2014 at 8:51 PM, Tom Santero <tsantero at> wrote:

> responses inline
> On Thu, Jul 3, 2014 at 2:45 AM, Chaim Solomon <chaim at>
> wrote:
>> Hi,
>> I'm running a 2.0.0b cluster (small) and have been running out of space
>> on one node.
>> I had expected that adding a node would lead to freeing up of space on
>> other nodes - but it's not working too fast.
> Keep in mind that the speed of transfers is bound by the bandwidth
> available on the network as well as the speed at which you can actually
> read the data off disk. Once the transfers complete you should see the disk
> freed.
>> I would suggest to add to RIAK a way to have the distribution algorithm
>> take free space into consideration and to move data to empty nodes fast.
>> Another issue is that adding the node moved most nodes from 25% to 18.8% -
>> but one stayed on 25% in the planner.
> The algorithm Riak uses to determine vnode placement is non-deterministic;
> if you don't like any given staged vnode distribution I might suggest you
> run riak-admin cluster clear to undo any staged changed and attempt to add
> the node again, until you're content with the new plan.
>> And I would also suggest adding some way to force a rebalancing of the
>> cluster to force nodes to take up more load if they don't have enough or
>> hand off load to others.
> The hashing algorithm used by Riak to determine object placement in the
> ring is uniform--over time and with a greater number of total keys you'll
> start to see a smoother distribution across all partitions.
> On the fly rebalancing would be incredibly expensive, especially for users
> who have lots of nodes and petabytes of data stored in Riak. Ad-hoc
> partition handoff would most likely be brittle and error-prone, given the
> unreliability of the network.
> In my humble opinion the engineers at Basho work harder than most other
> distributed systems developers, considering all the edge cases where
> systems can fail unexpectedly; I say this not to boost their egos, but
> rather to point out that their approach has the effect of making Riak more
> robust and resilient than most other distributed datastores. But such
> resiliency isn't free, and for these guarantees every user must pay the
> price. Riak might not be the fastest database, and it may even underutilize
> that really expensive hardware you might throw at it...but i'll be damned
> if it doesn't lie to me, lose my data or pretend that failures like network
> partitions don't happen.
>> Chaim Solomon
>> _______________________________________________
>> riak-users mailing list
>> riak-users at
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <>

More information about the riak-users mailing list