question on replicas distribution
justin at basho.com
Wed Oct 14 22:27:47 EDT 2009
On Wed, Oct 14, 2009 at 10:04 PM, stewart mackenzie <setori88 at gmail.com> wrote:
> You guys have instead opted for a scenario where the addressing of the
> replica to a node is hashed/baked into the actual key. ....
This is the essence of consistent hashing: use of the same addressing
domain for objects and their locations, and a well-known hash function
into that domain.
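A minimal sketch of that idea in Python (the function name `chash` and the bucket/key formatting are illustrative, not Riak's actual implementation; Riak does use SHA-1's 160-bit output space for its ring):

```python
# Objects and node positions share one address space (SHA-1's 160-bit
# integer range), and a well-known hash function maps a bucket/key
# pair into that space.
import hashlib

RING_SIZE = 2 ** 160  # size of the SHA-1 output space

def chash(bucket, key):
    """Hash a bucket/key pair to a ring position in [0, 2^160)."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
    return int.from_bytes(digest, "big")

# Any client that knows the bucket and key computes the same position,
# with no coordination and no lookup table.
pos = chash("users", "stewart")
assert 0 <= pos < RING_SIZE
```

Because the function is deterministic and well known, every node (and every client) independently agrees on where a given key lives on the ring.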
> What happens if a first/fresh replica xyz stored on node 1. (and later
> replicated to nodes 2, 3 and 4.) Now the addressing in the key points to
> node 1. If node 1 goes down. How does the addressing get the value from 2, 3
> or 4.
The first thing to think about is N (a bucket's n_val parameter): the
number of replicas of a given document that should be stored. It
defaults to 3 and is configurable per bucket. So, in the typical case,
a value is stored on 3 different nodes immediately upon the first put.
(There is no "later replicated": all replicas are of equal interest,
importance, and timing.)
In addition to a well-known hash function we have a well-known and
simple function for finding the N preferred storage locations
corresponding to a given hash value. This can be seen in
riak_ring:filtered_preflist/3, and in most cases it boils down to the
next N ring positions that are owned by distinct nodes.
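The "next N ring positions owned by distinct nodes" rule can be sketched like so (this is an illustrative toy, not Riak's actual riak_ring:filtered_preflist/3; the partition count Q, the node names, and the round-robin claim are all assumptions made for the example):

```python
# Toy preference-list computation: the ring is split into Q equal
# partitions, each owned by a node, and the preflist for a hash is
# the next N partitions whose owners are distinct nodes.
Q = 8                                   # partitions in this toy ring
NODES = ["node1", "node2", "node3", "node4"]
RING_SIZE = 2 ** 160

# Assume partitions were claimed round-robin, as on a fresh ring.
OWNERS = [NODES[i % len(NODES)] for i in range(Q)]

def preflist(hash_value, n=3):
    """Return the n (partition_index, owner) pairs that should hold the value."""
    start = (hash_value * Q // RING_SIZE + 1) % Q   # first partition past the hash
    result, seen = [], set()
    i = start
    while len(result) < n:
        owner = OWNERS[i]
        if owner not in seen:           # skip a partition whose node is already chosen
            seen.add(owner)
            result.append((i, owner))
        i = (i + 1) % Q
    return result
```

With round-robin ownership, N consecutive partitions already have distinct owners, so the preflist is simply the next N partitions; the distinct-node filter only matters when ownership is less evenly spread.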
As a result, all N replicas of a given document are findable based
simply on the document's bucket and key, plus the current state of the
ring in terms of partition ownership -- without any "routing" or need
to ask any other node where the replicas are.
(There are some interesting failover cases when a node is down or when
ownership is in flux, but this email message is already long enough.)
> I need to hit the books again! now where did I put that amazon dynamo paper
> again? ;)
The Dynamo paper is actually a great place to start for this topic,
yes. Go for it.