grourk at dropcam.com
Wed Apr 13 15:13:34 EDT 2011
I am experimenting with different ring sizes for a Riak cluster that will initially be 3-5 nodes but will grow to a large number of nodes, like 1000 or more. I'm also using the multiple disks per node setup discussed in other thread on this list. Because of that, we'll need to have at least 1 partition per disk on each node. So if we plan to grow to 1000 12 disk nodes, we'll need to have at least 12,000 partitions. Since ring_creation_size needs to be a power of 2, this means we should start with 2^14 = 16,384.
When I do this on a single node, it actually starts up just fine and uses about 1GB of memory. And that single-node cluster seems to operate normally, although CPU usage is much higher than when with a much smaller ring of 1024. But as soon as I try to join a second node with the first, it ramps up memory usage and ultimately crashes. This happens even when I tone it down to ring_creation_size = 4096.
These test machines don't have much memory (2 GB), but this behavior seems strange. Why would a single node cluster handle many thousands of partitions, but as soon as you join a second node resource usage on both machines goes through the roof? This is before there's even any data stored.
A more general question is what is the resource impact of each vnode? They share the keydir.. Do they each have their own copy of the ring state? Or other data structures?
I'd also love to hear any other thoughts on growing into a cluster this large. :)
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the riak-users