siculars at gmail.com
Mon Oct 4 23:52:34 EDT 2010
Distribution in Riak is based on the concept of the vnode (virtual
node). The vnode is the unit that "N" replicas are counted against.
At cluster creation time there is a fixed upper limit on the number
of vnodes in the cluster, and the number of vnodes a pnode (physical
node) owns depends on how many pnodes are in the cluster.
Here are some quick points re distribution/replication (which are
different concepts) in Riak as it stands today:
-64 vnodes by default. This must be considered when first creating a
cluster.
-a simple rule of thumb is a minimum of 10 vnodes per pnode, with the
number of vnodes being a power of 2 (64, 128, 256, ...)
-homogeneous vnode distribution across pnodes. The original Dynamo
paper calls for a heterogeneous distribution based on the relative
performance of pnodes. I believe there are some open tickets on this.
-homogeneous pnodes. Any pnode can answer any request. If the pnode
does not own the data in its share of vnodes, it will fetch the data
from one that does and then return it.
-"N" replicas does not guarantee N copies on N pnodes. It guarantees
you copies on N-1 pnodes. That is one of the reasons the minimum
recommended setup is a cluster of 3 pnodes.
-there is no "eventual propogation" there is "eventual
consistency" (the "C" in CAP theorem). Once you issue a write command
it gets written as fast as possible to all the vnodes that it should
be written to save network partitions (the "P" in CAP) in which case
there are fallback vnodes.
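To make the vnode/pnode split and the N-1 point concrete, here is a
rough Python sketch. The partition math and the round-robin claim are
my simplifications for illustration, not riak_core's actual
implementation:

```python
import hashlib

NUM_PARTITIONS = 64          # default ring size; must be a power of 2
RING_TOP = 2 ** 160          # Riak hashes keys into the SHA-1 keyspace

def owner_partition(bucket: bytes, key: bytes) -> int:
    # Hash bucket+key onto the ring; each partition owns an equal slice.
    h = int.from_bytes(hashlib.sha1(bucket + key).digest(), "big")
    return h // (RING_TOP // NUM_PARTITIONS)

def preflist(bucket: bytes, key: bytes, n: int = 3):
    # The N vnodes for a key: the owning partition plus the next n-1
    # partitions clockwise around the ring.
    p = owner_partition(bucket, key)
    return [(p + i) % NUM_PARTITIONS for i in range(n)]

def round_robin_claim(pnodes):
    # Simplified even claim of partitions to physical nodes.
    return {i: pnodes[i % len(pnodes)] for i in range(NUM_PARTITIONS)}

# With only 2 pnodes, an N=3 preflist necessarily reuses a pnode:
claim = round_robin_claim(["node_a", "node_b"])
owners = {claim[p] for p in preflist(b"bucket", b"key")}
# len(owners) == 2 here, even though N = 3
```

That last example is why 3 pnodes is the recommended minimum: with
fewer pnodes than N, some vnodes in a preflist must land on the same
physical machine.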
In order to accomplish what I think you want to accomplish, you would
have to hash the key in the same manner that Riak does, translate
that hash to its vnode in the keyspace, map that vnode to a pnode,
then open a connection to that pnode and get your data. Having said
that, I think it is quite possible by becoming familiar with
riak_core (the module that governs hashing (I believe), vnodes,
handoff and gossip) and either wrapping or extending it to taste.
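A hedged sketch of that client-side routing in Python, assuming you
can obtain the cluster's current claim (partition -> pnode) out of
band; the `route` function and the claim table here are hypothetical,
not a riak_core API:

```python
import hashlib

NUM_PARTITIONS = 64
RING_TOP = 2 ** 160

def route(bucket: bytes, key: bytes, claim: dict) -> str:
    """Return the pnode that owns the key's primary vnode.

    `claim` maps partition index -> pnode name, mirroring the ring
    state that riak_core gossips; how you fetch and refresh it is up
    to you (and is the hard part in practice)."""
    h = int.from_bytes(hashlib.sha1(bucket + key).digest(), "big")
    partition = h // (RING_TOP // NUM_PARTITIONS)
    return claim[partition]

# Hypothetical claim table for a 4-node cluster:
claim = {i: "riak@node%d" % (i % 4) for i in range(NUM_PARTITIONS)}
target = route(b"users", b"alice", claim)
# open your connection to `target` and issue the get there
```

The routing itself is trivial; keeping your copy of the claim in sync
with the real ring as nodes join, leave, and hand off is where
wrapping riak_core would earn its keep.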
On the other hand, bear in mind that Riak, as it is configured by
default, is a disk store (although you could configure memory-only
backends - which I highly discourage). Ultimately your latency will
be i/o bound. You may do better to shard Redis if ultra-low latency
is your primary concern. A possible approach could be replicated
clusters across sites. Bitcask is basically an append-only log that
could theoretically be replicated in real time over a SAN.
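To illustrate why that replication idea is plausible, here is a toy
Python model of the Bitcask storage scheme. This is only the shape of
the idea (append-only writes, an in-memory "keydir" mapping keys to
log positions), not Bitcask's on-disk format, which also carries CRCs
and timestamps:

```python
import io
import struct

class TinyCask:
    """Toy append-only log in the spirit of Bitcask."""

    def __init__(self):
        self.log = io.BytesIO()   # stand-in for the active data file
        self.keydir = {}          # key -> (value offset, value size)

    def put(self, key: bytes, value: bytes) -> None:
        # Every write is an append; the keydir always points at the
        # latest copy, so replaying the log from a replica works too.
        offset = self.log.seek(0, io.SEEK_END)
        header = struct.pack(">II", len(key), len(value))
        self.log.write(header + key + value)
        self.keydir[key] = (offset + len(header) + len(key), len(value))

    def get(self, key: bytes) -> bytes:
        # One seek + one read per lookup.
        offset, size = self.keydir[key]
        self.log.seek(offset)
        return self.log.read(size)
```

Because the data file only ever grows at the tail, shipping the new
bytes to another site is conceptually just tailing the file.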
Additionally, open source Riak does not support multi-site clusters.
Enterprise Riak does, and may have what you need re multisite, but I
doubt it offers a guarantee of having a physical copy of the data in
each site. Currently the only benefit to having your data on more
vnodes is the distributed map phase of a map/reduce query, which, I
have no doubt, will reach a point of diminishing returns (interesting
research project though).
Further, with regard to systems of Dynamo pedigree, it should be
noted that Dynamo is principally concerned with maximally available
systems (the "A" in CAP), particularly write availability. Conflicts
are not resolved at the persistence layer but rather at the
application layer (under certain configurations).
@siculars on twitter
Sent from my iPhone
On Oct 4, 2010, at 22:35, Andrew Cooper
<andrew.stephen.cooper at gmail.com> wrote:
> Hello everyone,
> The Dynamo replication scheme allows one to specify how many nodes a
> change should be propagated to. But what if I want a change to be
> (eventually) propagated to each and every machine in the cluster?
> I can't simply specify a fixed number of nodes to replicate to,
> because, in the general case, this is time-varying. My use case is a
> multi-datacenter cluster with a low read latency requirement for
> certain records (thus those records need to be present in each
> datacenter, if not even on each machine).
> Is this currently possible, or would it require a patch? Or would this
> be too difficult to implement with the current architecture? This
> seems to be a problem common to all databases using the Dynamo
> replication scheme... or is it? Thoughts?
> riak-users mailing list
> riak-users at lists.basho.com