large ring_creation_size

Joseph Blomstedt joe at basho.com
Wed May 16 22:14:53 EDT 2012


We've generally recommend 1024, although I think there are a few users
running with 2048. It is unfortunate that ring size is currently a
static parameter that you need to size properly from the onset. There
are plans to support dynamic resizing of the ring, but those changes
won't land any sooner than Q3 this year.

The "Too many processes" message means that you exhausted the process
limit of the Erlang VM. By default, the Erlang VM has a limit of 32768
processes. "Processes" in Erlang are very lightweight, and Riak (or
any Erlang application for that matter) uses processes liberally.
Thus, each vnode is probably using ~5-20 processes (the vnode process,
the vnode proxy process, and the async worker pool all come to mind).
With 4096 vnodes, you're likely using most/all of this default limit.

Of course, you can increase the limit. You just need to add "+P
number" to your vm.args file. The maximum value you can set is
134217727. Feel free to try increasing the limit and see how things
go.

Alexander brings up good points about too many nodes per box
contending for a limit set of system resources. Each VM instance will
use up a portion of RAM, a set of TCP sockets, and numerous file
descriptors for backends. However, depending on the hardware, and how
you've setup/tuned your OS, you may have plenty of capacity to run 12
nodes x 4096 partitions per box.

There are a few Riak specific issues that are important to consider.
The ring size directly impacts the amount of gossip traffic in your
cluster. Architectural changes introduced in Riak 1.0 led to a
practical limit of ~1024 partitions due to an issue with gossip
traffic overloading the cluster. This was partially mitigated in
1.0.3, and resolved by Riak 1.1. Thus, overload should no longer be a
critical issue, but it will take longer for a cluster with a larger
ring to add/remove nodes, and the clustering system will eat up more
network bandwidth. If this is a problem or not will again depend on
your hardware and network.

The ring size also correlates directly to the number of open files.
Each vnode runs a separate backend instance. So, with 4096, you would
have 4096 open Bitcask/LevelDB databases open per node. Each instance
opening multiple files. And each instance would content for disk IOPs.
Granted, once you join all your nodes together, you will have your
vnodes spread across each Riak node, so you should expect only ~34
vnode per node (as you calculated) with only 34 backend instances per
disk. This is standard territory and no big deal. The only thing to
consider is how a large set of node failures may impact you, as
fallback traffic sent to secondary nodes would spin up additional
vnodes on each secondary.

The final point to consider in this design is that Riak does nothing
to guarantee replica placement outside of trying to ensure that
replica are on different nodes. Riak doesn't know that 12 of your
nodes are on the same physical server. Therefore, Riak may assign all
of your replicas to the same physical machine. Having disjoint disks
will protect you against a disk failure, but if an entire server goes
offline or has a catastrophic hardware failure, you risk losing all
replicas for a portion of your keyspace. Deliberately choosing to
reduce the safe guards Riak provides doesn't seem like the best idea,
but if this risk is something your usage can tolerate, then feel free
to give it a shot and see how things go.

-Joe

On Wed, May 16, 2012 at 6:07 PM, Alexander Sicular <siculars at gmail.com> wrote:
> There are reasonable limits. And also for architecture. If I understand you correctly, yours is not reasonable. Although you have jbod's on each physical server does not mean you should run "12 riak nodes on a given physical node." there are other resource limits than just disk io. In this instance you are running into an os limitation in the number of file descriptors, aka max open files including sockets.
>
> If you absolutely want to do it your way, which I absolutely do not recommend, I would run many virtual machines on top of each physical device from its own disk. But like I said, even then I wouldn't.
>
> Constructively, I think the largest recommended ring size is 1024 in practice.
>
> -Alexander
>
> @siculars on twitter
> http://siculars.posterous.com
>
> Sent from my iRotaryPhone
>
> On May 16, 2012, at 20:47, Sam Lang <samlang at gmail.com> wrote:
>
>>
>> Hello,
>>
>> I have a 10 node cluster with 12 disks/devices per node.  I have setup riak in a jbod environment with separate riak nodes per device, resulting in 120 riak nodes total.  Based on the documentation, I set the ring_creation_size to 4096, which gives me ~34 partitions/node and allows me to expand the cluster in the future.  With this value, soon after starting the 12 riak nodes on a given physical node, I get "Too many processes" errors and the riak nodes fail.  Decreasing the value of ring_creation_size to 1024 (giving me ~8 partitions per riak node) prevents the riak nodes from failing, but then expanding the cluster becomes a problem (30 physical nodes would result in only ~3 partitions per node).
>>
>> Is there a reasonable limit on the value of ring_creation_size?
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



-- 
Joseph Blomstedt <joe at basho.com>
Software Engineer
Basho Technologies, Inc.
http://www.basho.com/




More information about the riak-users mailing list