Production deployment requirements for memory backend storage
cvoiselle at basho.com
Wed Apr 12 15:00:17 EDT 2017
Thanks for your interest in Riak. I will copy your questions into this email for reference and answer them inline.
1. What is the “platform_data_dir” used for when memory is used as storage backend? Is it only needed for active anti-entropy and cluster metadata? Do I need to persist this data, i.e. if a node goes down and restarts in this configuration, is persistence of data in “platform_data_dir” required?
As you have pointed out, the platform_data_dir contains more than just the actual data stored in the cluster. There are three folders that must be persisted for a node to remain a member of a cluster and to not create issues with the sizes of the vector clocks internal to the objects. They are:
ring - The binary files that describe the cluster and the vnode ownership mappings. Deleting this folder will cause the node to start up and create a new default ring. This default ring will allocate 100% of the partitions to that node. This is non-fatal and is resolved by rejoining the node to the cluster. This extra work can be avoided by persisting the ring file properly.
cluster_meta - This folder contains the properties for bucket types and typed custom buckets.
kv_vnode - This folder contains generated actor-ids for each Riak vnode. The routine loss of this directory will cause orphaned vnode actor-ids to potentially accumulate in objects’ vclocks.
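As a minimal sketch of how you might verify this after a restart, the following Python checks which of the three folders named above are present under a given platform_data_dir. The function name missing_subdirs is hypothetical and is not part of Riak or its tooling; the check only looks for directories and does not validate their contents.

```python
from pathlib import Path

# The three folders this reply says must be persisted across restarts.
REQUIRED_SUBDIRS = ("ring", "cluster_meta", "kv_vnode")


def missing_subdirs(platform_data_dir):
    """Return the required folder names that are absent under platform_data_dir."""
    base = Path(platform_data_dir)
    return [name for name in REQUIRED_SUBDIRS if not (base / name).is_dir()]
```

Pass the value of platform_data_dir from your own configuration. An empty list means all three folders are present; any names returned are folders that were lost or never created.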
Active anti-entropy is a process to prevent bit-rot in long-lived data. Since your questions concern ephemeral data, we would recommend that it be disabled, because the overhead of creating and maintaining the trees makes no sense for ephemeral data.
2. What is the minimum memory requirement of an empty Riak node in this configuration?
On a sample node that I brought up, an empty Riak KV 2.2.3, the beam.smp process was using 1.5 GB of RAM with an empty memory backend and AAE disabled.
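If you want to repeat this measurement on your own hardware, one rough approach on Linux is to read the resident set size from /proc/<pid>/status for the beam.smp process. The sketch below assumes a Linux host; the function names are hypothetical, and you would need to look up the pid of beam.smp yourself.

```python
import re


def rss_bytes_from_status(status_text):
    """Extract the resident set size in bytes from /proc/<pid>/status text.

    Returns None when the text has no VmRSS line.
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        return None
    # /proc reports VmRSS in units of 1024 bytes.
    return int(match.group(1)) * 1024


def rss_bytes(pid):
    """Read the resident set size of a running process (Linux only)."""
    with open("/proc/%d/status" % pid) as handle:
        return rss_bytes_from_status(handle.read())
```

The number you see will vary with Riak version, ring size, and configuration, so treat the 1.5 GB figure above as one observation and not a guaranteed minimum.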
3. What is the minimum disk and CPU requirement of a Riak node in this configuration?
There are a few variables that dictate how much actual disk throughput you will use in a Riak cluster that only uses the memory backend: logging overhead, ring changes, and cluster metadata changes.
Logging throughput is determined by the general health of the cluster and is minimal in clusters that are well-behaved. The logfiles themselves have configurable size caps and set numbers of rotations (by default 5 logs capped at 50 MB for each file). There are some other logfiles that are not managed by lager, and they can grow beyond these expected limits. If you are building nodes optimized for storage, you will want to monitor the size of this folder and trim it as appropriate.
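To put numbers on the defaults mentioned above, 5 rotations capped at 50 MB each gives roughly 250 MB for each rotated log. Here is a small sketch for watching the log folder against a budget you choose. The function names are hypothetical, the log folder path is whatever your node is configured to use, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```python
import os


def rotated_log_cap_bytes(rotations=5, cap_mb=50):
    """Upper bound for one rotated log, using the defaults quoted above."""
    return rotations * cap_mb * 1024 * 1024


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # The file may have been rotated away while we were walking.
                pass
    return total


def over_budget(path, budget_bytes):
    """True when the folder at path is larger than budget_bytes."""
    return dir_size_bytes(path) > budget_bytes
```

Because some logfiles are not managed by lager, the budget should be set by what your disk can tolerate and not by the lager caps alone.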
The ring is a data structure that is used to hold information about the cluster’s membership, the node capabilities, MDC replication configuration, and the legacy custom bucket metadata. In stable clusters that use no custom buckets, the impact of writes to the ring is negligible; however, there are certain antipatterns involving the creation of a large number of buckets with custom properties in the “default” bucket type that will bloat the ring file and result in a large amount of ring gossip.
Finally, Riak bucket types and their properties, as well as the custom bucket properties of typed buckets, are stored in cluster-metadata. This backend is a dets-based store that uses hashtree comparisons to maintain consistency across members of the cluster. This backend’s storage also depends on the amount and speed with which you create metadata within your cluster.
There is more generically-applicable information about [cluster capacity planning] <http://docs.basho.com/riak/kv/2.2.3/setup/planning/cluster-capacity/> in the Riak KV documentation.
Thanks again for your interest,
Sr. Product Manager, Riak KV/Clients
[cluster capacity planning] - http://docs.basho.com/riak/kv/2.2.3/setup/planning/cluster-capacity/
> On Apr 10, 2017, at 3:30 PM, Neeraj Poddar <N.Poddar at F5.com> wrote:
> I wanted to understand the production requirements for using Riak as a non-persistent ephemeral data store. In particular the following questions relate to using Riak with “memory” configured as storage backend:
> 1. What is the “platform_data_dir” used for when memory is used as storage backend? Is it only needed for active anti-entropy and cluster metadata? Do I need to persist this data i.e. if a node goes down and restarts in this configuration, is persistence of data in “platform_data_dir” required.
> 2. What is the minimum memory requirement of an empty Riak node in this configuration?
> 3. What is the minimum disk and CPU requirement of a Riak node in this configuration?
> Neeraj Poddar