Riak replication and quorum

Sean Cribbs sean at basho.com
Fri May 13 12:06:06 EDT 2011


Peter,

You've hit on a major feature of Riak: staying available in the face of network and hardware failures.

When a node is down, other nodes (ones that do not "own" the replicas for a given key) pick up the slack and serve read and write requests on behalf of the downed node.  This means that while one or more nodes are down, you can still write a key to the cluster and read it back while satisfying quorum.  The standard quorum treats fallback nodes as just as valid as non-fallbacks (we're also in the process of implementing a way for you to be stricter about that, if you so desire).  In your test, the two surviving nodes stand in for the two rebooted ones, so enough replicas still respond to meet the quorum of 3.  When the downed nodes return, writes that were sent to fallbacks are returned to their proper owners via hinted handoff.
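
To make the arithmetic concrete, here is a minimal Python sketch of the behavior described above. It is a simplified model, not Riak's actual code or API: the function names, the round-robin choice of fallback, and the primaries_only flag are all made up for illustration (the flag only stands in for the stricter mode mentioned above).

```python
def quorum(n_val):
    """Majority of n_val replicas: floor(n_val / 2) + 1."""
    return n_val // 2 + 1


def replica_responders(primaries, up_nodes):
    """Return one (node, is_fallback) pair per replica.

    A primary that is up answers for itself. If a primary is down,
    some node that is still up stands in for it as a fallback.
    The round-robin choice here is an assumption of this sketch.
    """
    up = sorted(up_nodes)
    if not up:
        return []  # nobody is left to answer
    responders = []
    next_fallback = 0
    for owner in primaries:
        if owner in up_nodes:
            responders.append((owner, False))
        else:
            responders.append((up[next_fallback % len(up)], True))
            next_fallback += 1
    return responders


def request_succeeds(primaries, up_nodes, needed, primaries_only=False):
    """True if at least `needed` replicas can answer.

    primaries_only is a hypothetical switch: when set, answers
    from fallbacks do not count toward the quorum.
    """
    responders = replica_responders(primaries, up_nodes)
    if primaries_only:
        responders = [r for r in responders if not r[1]]
    return len(responders) >= needed


# The scenario from the question: 4 nodes, n_val=4, two nodes rebooted.
nodes = ["node1", "node2", "node3", "node4"]
up = {"node1", "node2"}
needed = quorum(4)  # 3

print(request_succeeds(nodes, up, needed))                       # True
print(request_succeeds(nodes, up, needed, primaries_only=True))  # False
```

With fallbacks counted, all four replicas answer even though only two machines are up, which is why the requests keep succeeding. Counting only the owning nodes gives two answers, short of the quorum of 3.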

This feature lets an application built on Riak stay available (even if in a degraded state) despite multiple failures. We consider this A Good Thing.

Sean Cribbs <sean at basho.com>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On May 13, 2011, at 11:13 AM, Peter Fales wrote:

> 
> I'm a Riak newbie, trying to get some familiarity with the system by
> running some tests on Amazon EC2.  I'm seeing some behavior that I don't
> understand...
> 
> I've set up a test where I create a 4-node cluster using 4 EC2 machines.
> I've created a bucket with n_val=4, r=quorum, and w=quorum.   For
> n_val=4, the quorum should be 3, so I thought I would have to have at
> least 3 nodes in service for my read and write operations to succeed.
> During my test, I start sending read/write requests to two of the nodes
> (and I see the CPU load go up on all four nodes, so I know they are
> talking to each other).  Then I reboot the other two nodes.  At that 
> point, I was expecting the reads and writes to start failing, but in 
> fact I usually don't see any problems.  (sometimes the query that is 
> in progress at the time may fail or timeout, but if I establish a new
> connection to the server, and start sending read/write requests again,
> those requests will go through, even with only two of the 4 nodes in service)
> 
> I suspect I'm just missing something obvious, but I don't understand how
> I can run with just two nodes.  What am I missing?
> 
> -- 
> Peter Fales
> Alcatel-Lucent
> Member of Technical Staff
> 1960 Lucent Lane
> Room: 9H-505
> Naperville, IL 60566-7033
> Email: Peter.Fales at alcatel-lucent.com
> Phone: 630 979 8031
> 
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




