Pervasive replication

Alexander Sicular siculars at
Mon Oct 4 23:52:34 EDT 2010

Hi Andrew,

Distribution in Riak is based on the concept of the vnode (virtual
node). The vnode is the unit that "N" replicas refers to: N counts
vnodes, not machines. The total number of vnodes in a cluster is
set when the cluster is created. The number of vnodes a pnode
(physical node) owns depends on how many pnodes are in the cluster.
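
A minimal sketch of that relationship, assuming a simple round-robin
assignment of vnodes to pnodes. The function name and the pnode names
are made up for illustration, and the real claim logic in riak_core
is more involved than this:

```python
from collections import Counter


def assign_vnodes(ring_size, pnodes):
    """Hypothetical round-robin claim: vnode i goes to pnode i mod len(pnodes)."""
    return [pnodes[i % len(pnodes)] for i in range(ring_size)]


# 64 vnodes (the default mentioned in this thread) over three pnodes.
owners = assign_vnodes(64, ["pnode_a", "pnode_b", "pnode_c"])
per_pnode = Counter(owners)

# pnode_a ends up with 22 vnodes, the other two with 21 each.
for name in sorted(per_pnode):
    print(name, per_pnode[name])
```

Adding a fourth pnode to the same 64-vnode ring would drop each
pnode's share to 16.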

Here are some quick points re distribution/replication (which are  
different concepts) in Riak as it stands today:

-64 vnodes default. Must be considered when first creating a cluster.

-a simple rule of thumb is minimum 10 vnodes per pnode with number of  
vnodes being a power of 2 (64, 128, 256, ...)

-homogeneous vnode distribution across pnodes. The original Dynamo
paper calls for a heterogeneous distribution based on the relative
performance of pnodes. I believe there are some open tickets on this.

-homogeneous pnodes. Any pnode can answer any request. If the pnode
does not own the data in its share of vnodes it will fetch the data
from one that does and then return it.

-"N" replicas does not guarantee N copies on N pnodes. It guarantees  
you copies on N-1 pnodes. That is one of the reasons the minimum  
recommended setup is a cluster of 3 pnodes.

-there is no "eventual propagation"; there is "eventual
consistency" (the "C" in the CAP theorem). Once you issue a write
command it gets written as fast as possible to all the vnodes that it
should be written to, barring network partitions (the "P" in CAP), in
which case there are fallback vnodes.
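
To illustrate the point above that N replicas do not always land on N
pnodes, here is a rough sketch. It assumes round-robin vnode
ownership and that the N replicas go to N consecutive vnodes on the
ring. Both are simplifications, and the function names are
hypothetical:

```python
def preference_list(first_vnode, n, owners):
    """Owners of n consecutive vnodes, wrapping around the end of the ring."""
    ring_size = len(owners)
    return [owners[(first_vnode + k) % ring_size] for k in range(n)]


def min_distinct_pnodes(ring_size, pnode_count, n):
    """Worst case over the ring: fewest distinct pnodes holding the n replicas."""
    # Assumption: round-robin ownership, vnode i belongs to pnode i mod pnode_count.
    owners = [i % pnode_count for i in range(ring_size)]
    return min(
        len(set(preference_list(v, n, owners))) for v in range(ring_size)
    )


# 64 does not divide evenly by 3, so the wrap-around at the end of the
# ring puts two neighbouring vnodes on the same pnode.
print(min_distinct_pnodes(64, 3, 3))  # 2, i.e. N-1 pnodes
print(min_distinct_pnodes(64, 4, 3))  # 3, the ring divides evenly
```

Under these assumptions the N-1 case only shows up where the ring
wraps around, which fits with 3 pnodes being the recommended minimum
for N=3.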

In order to accomplish what I think you want to accomplish you would
have to hash the key in the same manner that Riak does, then translate
that hash to its vnode in the keyspace, then map that vnode to a
pnode, then open a connection to that pnode and get your data. Having
said that, I think it is quite possible if you become familiar with
riak_core, the module that governs vnodes, handoffs, gossip and (I
believe) hashing, and either wrap or extend it to taste.
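
The hash, vnode, pnode chain could be sketched roughly as follows.
This is not Riak's actual code path: it assumes a 160-bit SHA-1
keyspace split into equal partitions, hashes a plain "bucket/key"
byte string (Riak hashes an Erlang-encoded term, so the values here
will not match a real cluster), and uses a made-up round-robin
ownership table. All function names are hypothetical:

```python
import hashlib

RING_TOP = 2 ** 160  # size of the SHA-1 output space


def key_hash(bucket, key):
    """Hash bucket and key to an integer position on the ring.

    Assumption: plain byte concatenation. A real client would have to
    reproduce Riak's own encoding to land on the same position.
    """
    digest = hashlib.sha1(bucket + b"/" + key).digest()
    return int.from_bytes(digest, "big")


def vnode_index(hash_value, ring_size):
    """Index of the equal-width partition that contains hash_value."""
    partition_width = RING_TOP // ring_size
    # Clamp so ring sizes that do not divide 2**160 still stay in range.
    return min(hash_value // partition_width, ring_size - 1)


def locate(bucket, key, owners):
    """Return (vnode index, owning pnode) for a bucket/key pair."""
    v = vnode_index(key_hash(bucket, key), len(owners))
    return v, owners[v]


# Hypothetical ownership table: 64 vnodes spread round-robin over 3 pnodes.
pnodes = ["pnode_a", "pnode_b", "pnode_c"]
owners = [pnodes[i % len(pnodes)] for i in range(64)]

vnode, pnode = locate(b"users", b"andrew", owners)
print(vnode, pnode)
```

The last step, opening a connection to that pnode, is left out. The
ownership table would also have to be refreshed whenever pnodes join
or leave, which is the part riak_core's gossip handles.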

On the other hand, bear in mind that Riak, as it is configured by
default, is a disk store (although you could configure memory-only
backends, which I highly discourage). Ultimately your latency will be
I/O bound. You may do better to shard Redis if ultra-low latency is
your primary concern. A possible approach could be replicated clusters
across sites. Bitcask is basically a write-once, append-only log that
could theoretically be replicated in real time over a SAN.

Additionally, open source Riak does not support multi-site clusters.
Enterprise Riak does and may have what you need for multi-site, but I
doubt it can guarantee a physical copy of the data in each site.
Currently the only benefit to having your data on more vnodes is for
the distributed map phase of a map/reduce query, which, I have no
doubt, will reach a point of diminishing returns (interesting
research project though).

Further, with regard to systems of Dynamo pedigree, it should be noted
that Dynamo is principally concerned with maximally available systems
(the "A" in CAP), specifically write availability. Conflicts are not
resolved at the persistence layer but rather at the application layer
(under certain configurations).
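
As a rough illustration of resolving a conflict at the application
layer: suppose a read hands back several conflicting versions
(siblings) of a record. The function below is hypothetical and merges
them with a set union, in the spirit of the shopping cart example
from the Dynamo paper. It is only one possible policy, and how a
client actually exposes siblings is not shown here:

```python
def resolve_siblings(siblings):
    """Merge conflicting versions of a set-valued record by union.

    Assumption: each sibling is a set of items. A union never loses an
    addition, but an item removed in one version can reappear.
    """
    merged = set()
    for value in siblings:
        merged |= value
    return merged


# Three conflicting versions of the same cart, written during a partition.
cart = resolve_siblings([{"book"}, {"book", "pen"}, {"lamp"}])
print(sorted(cart))
```

The application would then write the merged value back so the
replicas converge on it.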

Good luck,

@siculars on twitter

Sent from my iPhone

On Oct 4, 2010, at 22:35, Andrew Cooper  
<andrew.stephen.cooper at> wrote:

> Hello everyone,
> The Dynamo replication scheme allows one to specify how many nodes a
> change should be propagated to. But what if I want a change to be
> (eventually) propagated to each and every machine in the cluster?
> I can't simply specify a fixed number of nodes to replicate to,
> because, in the general case, this is time-varying. My use case is a
> multi-datacenter cluster with a low read latency requirement for
> certain records (thus those records need to be present in each
> datacenter, if not even on each machine).
> Is this currently possible, or would it require a patch? Or would this
> be too difficult to implement with the current architecture? This
> seems to be a problem common to all databases using the Dynamo
> replication scheme... or is it? Thoughts?
> Thanks,
> Andrew
