Cluster rebalancing

Thomas Santero tsantero at gmail.com
Sun Jul 6 09:09:28 EDT 2014


Hi Chaim,

Inline

> On Jul 6, 2014, at 1:13 AM, Chaim Solomon <chaim at itcentralstation.com> wrote:
> 
> I don't think I was quite clear in what I asked for.

My apologies for misunderstanding your previous query.

> 
> I am not asking for the ability to influence the hashing algorithm. That would be a mess.
> But I would like to be able to have more influence on the distribution of vnodes on the nodes - and that is something that RIAK already does.
> 
> So a command to bump a vnode off a particular node or reduce the number of vnodes on a node or set the target percentage on a node would be nice. It seems like the current algorithm already does something similar - but I didn't see how one can influence that.
> 
> The other issue was that I would suggest taking the disk space into consideration.
> If you have nodes that have different storage then balancing the data equally between nodes may not be the best option. 
> It may be better to take the available disk space into consideration and move vnodes to nodes that have free space if a node runs low on space.

What you refer to here would be nice, and is something referred to as "weighted claim." I know it's been discussed a bit in the past. Perhaps someone from Basho can chime in and let us know if it's on the roadmap for a future release?

> 
> One simple use case would be expanding a cluster with newer nodes (that have more storage) and being able to utilise that storage. 
> 
> Another would be to be able to distribute larger partitions more evenly - in particular if the size per partition is not evenly distributed.
> 
> Chaim Solomon
> 
> 
> 
>> On Thu, Jul 3, 2014 at 8:51 PM, Tom Santero <tsantero at gmail.com> wrote:
>> responses inline
>> 
>> 
>>> On Thu, Jul 3, 2014 at 2:45 AM, Chaim Solomon <chaim at itcentralstation.com> wrote:
>>> Hi,
>>> 
>>> I'm running a 2.0.0b cluster (small) and have been running out of space on one node. 
>>> I had expected that adding a node would lead to freeing up of space on other nodes - but it's not working too fast.
>> 
>> Keep in mind that the speed of transfers is bound by the bandwidth available on the network as well as the speed at which you can actually read the data off disk. Once the transfers complete you should see the disk freed. 
>>  
>>> 
>>> I would suggest to add to RIAK a way to have the distribution algorithm take free space into consideration and to move data to empty nodes fast. Another issue is that adding the node moved most nodes from 25% to 18.8% - but one stayed on 25% in the planner.
>> 
>> The algorithm Riak uses to determine vnode placement is non-deterministic; if you don't like any given staged vnode distribution I might suggest you run riak-admin cluster clear to undo any staged changed and attempt to add the node again, until you're content with the new plan. 
>>  
>>> 
>>> And I would also suggest adding some way to force a rebalancing of the cluster to force nodes to take up more load if they don't have enough or hand off load to others.
>> 
>> The hashing algorithm used by Riak to determine object placement in the ring is uniform--over time and with a greater number of total keys you'll start to see a smoother distribution across all partitions. 
>> 
>> On the fly rebalancing would be incredibly expensive, especially for users who have lots of nodes and petabytes of data stored in Riak. Ad-hoc partition handoff would most likely be brittle and error-prone, given the unreliability of the network.
>> 
>> In my humble opinion the engineers at Basho work harder than most other distributed systems developers, considering all the edge cases where systems can fail unexpectedly; I say this not to boost their egos, but rather to point out that their approach has the effect of making Riak more robust and resilient than most other distributed datastores. But such resiliency isn't free, and for these guarantees every user must pay the price. Riak might not be the fastest database, and it may even underutilize that really expensive hardware you might throw at it...but i'll be damned if it doesn't lie to me, lose my data or pretend that failures like network partitions don't happen. 
>>  
>>> 
>>> Chaim Solomon
>>> 
>>> 
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20140706/3ca192f1/attachment.html>


More information about the riak-users mailing list