Riak behavior under stress

Kirill A. Korinskiy catap+riak at catap.ru
Thu Nov 26 02:36:02 EST 2009

Hello Justin,

At Wed, 25 Nov 2009 16:12:32 -0500,
Justin Sheehy <justin at basho.com> wrote:
> On Wed, Nov 25, 2009 at 3:29 PM, Lev Walkin <vlm at lionet.info> wrote:
> > However, in a test case with N=3 and R=1, when we bring down
> > one of the three nodes, the Riak cluster returns with a timeout,
> > {error,timeout} instead of returning the answer available on the
> > two nodes which are still alive.
> That is not the usual or expected response.  If you are seeing this in
> practice, I'd be interested to see more about your configuration.

For example, as a test case, try to get data that does not exist (which
should return {error, notfound}) while one of the three nodes is down.
The Riak cluster returns {error, timeout} every time. I see only one
explanation for it: Riak waits either for R good ({ok, Data}) responses
or for (N - R) + 1 failed responses, right?
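The rule described above can be replayed in a small simulation. This is a hypothetical `get_outcome` helper written for illustration, not Riak's get FSM code; the reply strings and the `"timeout"` result (standing in for the client giving up after waiting) are assumptions.

```python
from typing import Iterable


def get_outcome(replies: Iterable[str], n: int, r: int) -> str:
    """Hypothetical model of the read quorum rule discussed in this thread.

    Each reply is "ok" or "notfound". The read succeeds once R replies are
    "ok"; it fails with "notfound" once (N - R) + 1 replies are "notfound",
    because at that point only R - 1 replicas remain that could say "ok".
    If the replies run out before either threshold is reached, the caller
    keeps waiting, which this model reports as "timeout".
    """
    needed_failures = (n - r) + 1
    ok = notfound = 0
    for reply in replies:
        if reply == "ok":
            ok += 1
            if ok >= r:
                return "ok"
        else:
            notfound += 1
            if notfound >= needed_failures:
                return "notfound"
    return "timeout"


# N=3, R=1, key absent, one node down and not replaced: only 2 replies arrive.
print(get_outcome(["notfound", "notfound"], n=3, r=1))  # timeout
# Same read, but a third node answers in place of the down one.
print(get_outcome(["notfound"] * 3, n=3, r=1))  # notfound
```

Under this model the timeout disappears as soon as a third replica, for example a fallback node, sends its notfound reply.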

> > Riak's source code uses the N and R values to determine the number
> > of nodes on which to store data (N) and which should be expected
> > to return an answer when asked (R). The behavior that puzzles me
> > is that it awaits (R) positive answers and (N-R)+1 negative ones
> > from the cluster.
> Note that it will always send those messages to "up" nodes, meaning
> that if a node is down at the time of message sending it will not
> attempt to get a reply from it.

OK, in the open-source edition Riak has two sets of nodes to send a
request to: Targets and Fallbacks. If some target node is down, Riak
sends the data to another, non-target node instead, right?
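The Targets/Fallbacks split asked about above can be pictured with a simplified preference-list model. The ring layout (a list of partition owners), the modulo mapping from a SHA-1 hash to a partition, and the `preference_list` helper are all assumptions for illustration; Riak's actual ring code differs in detail.

```python
import hashlib


def preference_list(key: bytes, ring: list[str], n: int,
                    up: set[str]) -> list[tuple[int, str, str]]:
    """Hypothetical sketch: choose N partitions for `key`, using fallbacks.

    `ring[i]` is the node that owns partition i. The N partitions following
    the key's hash are the targets; later partitions on the ring act as
    fallbacks and are used only when a target's owner is down.
    """
    start = int.from_bytes(hashlib.sha1(key).digest(), "big") % len(ring)
    order = [(start + i) % len(ring) for i in range(len(ring))]
    targets, rest = order[:n], order[n:]
    # Candidate fallbacks, in ring order, whose owning node is up.
    fallbacks = [p for p in rest if ring[p] in up]
    chosen = []
    for p in targets:
        if ring[p] in up:
            chosen.append((p, ring[p], "target"))
        elif fallbacks:
            fb = fallbacks.pop(0)
            chosen.append((fb, ring[fb], "fallback"))
    return chosen


# 8 partitions spread over three nodes; node "b" is down.
ring = [["a", "b", "c"][i % 3] for i in range(8)]
for entry in preference_list(b"Key", ring, n=3, up={"a", "c"}):
    print(entry)
```

In this model a read still reaches three partitions while one node is down, because the down node's slots are taken over by partitions owned by nodes that are up.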

> >        b) Since one Riak node is unavailable, there are not 3
> >        nodes available which can confirm data unavailability,
> >        therefore it returns with an {error, timeout}.
> This should only occur if the node in question actually goes down
> during (not before) the request.  In the usual case {ok,Data} will be
> returned in reply to such a request.
> > Question: is this expected behavior? I would presume that Riak
> > should either allow N=3,R=1 requests to be satisfied even when
> > one node dies (and, ideally, when two out of three nodes die),
> > or the documentation needs to be updated to highlight the fact
> > that R=1 is unusable in practice. Could someone clarify this?
> I have just verified this by setting up a three-node cluster, storing
> a document in a bucket with n_val of 3, then taking down one of the
> three nodes.  A subsequent get of that document with different
> R-values:
> R=1 returned immediately with the document
> R=2 returned immediately with the document
> R=3 returned immediately with notfound, as the third replica was unavailable
> In other words, the behavior you describe is neither what is expected
> nor what I see in practice.  Is your question based on a running
> cluster?  If so, can you elaborate on exactly how you are
> causing that behavior?  I would like to help you to resolve any
> problems you are seeing.

Can you verify your answer for the case where Riak should answer
{error, notfound}? I think C:get(<<"Table">>, <<"Key">>, 1) returns
{error, timeout} instead of {error, notfound} if one of the three nodes
is down.

> > Riak's source code and documentation make references to
> > Merkle trees, used to exchange information about the hash trees.
> > The documentation and marketing material suggests that Riak can
> > automatically synchronize the data in certain conditions.
> There are two ways in which Riak uses merkle trees.
> The first use is to reconcile documents stored under hinted-handoff.
> If you store a document when some of the nodes are down, it will be
> stored at a node other than the "ideal" one.  When the ideal node
> comes back online, the nodes handling those documents that were stored
> in the interim will exchange merkle trees with the returning node in
> order to determine which documents to use in bringing it up to date.
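The merkle-tree comparison described above can be sketched as a two-level hash tree: a root hash over per-segment hashes. The segment count, the use of SHA-1, and the helper names are assumptions, and in a real exchange the two nodes would send each other only hashes, whereas this sketch builds both trees locally for brevity.

```python
import hashlib

NUM_SEGMENTS = 16  # hypothetical fan-out of the tree's leaf level


def _h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def build_tree(store: dict[str, bytes]) -> tuple[bytes, list[bytes], list[dict[str, bytes]]]:
    """Return (root hash, per-segment hashes, per-segment key->value-hash maps)."""
    segments: list[dict[str, bytes]] = [{} for _ in range(NUM_SEGMENTS)]
    for key, value in store.items():
        segments[_h(key.encode())[0] % NUM_SEGMENTS][key] = _h(value)
    leaves = [
        _h(b"".join(k.encode() + b"\0" + v for k, v in sorted(seg.items())))
        for seg in segments
    ]
    return _h(b"".join(leaves)), leaves, segments


def keys_to_sync(local: dict[str, bytes], remote: dict[str, bytes]) -> set[str]:
    """Keys whose values differ, or that exist on only one side."""
    root_l, leaves_l, segs_l = build_tree(local)
    root_r, leaves_r, segs_r = build_tree(remote)
    if root_l == root_r:
        return set()  # identical roots: nothing to exchange
    diff: set[str] = set()
    for i in range(NUM_SEGMENTS):
        if leaves_l[i] != leaves_r[i]:  # descend only into differing segments
            a, b = segs_l[i], segs_r[i]
            diff |= {k for k in a.keys() | b.keys() if a.get(k) != b.get(k)}
    return diff


# A fallback node stored writes while the "ideal" node was down.
fallback_copy = {"k1": b"v1", "k2": b"v2-new", "k3": b"v3"}
returning_node = {"k1": b"v1", "k2": b"v2"}
print(sorted(keys_to_sync(fallback_copy, returning_node)))  # ['k2', 'k3']
```

Comparing the roots first means two replicas that are already in sync exchange a single hash; only segments whose hashes differ are examined key by key.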

How does Riak use a node other than the "ideal" one to update a
document in the cluster?

wbr, Kirill
