have a new node take over the role of a downed, unrecoverable node?

Sean Cribbs sean at basho.com
Sat Oct 16 23:57:44 EDT 2010

On Oct 16, 2010, at 9:52 PM, Leonid Riaboshtan wrote:

> >> However, you'll need to reip on all machines.
> Hmm, isn't stuff like that should be treated automaticly by Riak? I mean I have a cluster where nodes leave, nodes come. And after each come/leave I need to do something to nodes in entire cluster to entroduce/remove new/old node and repartion the data?

There are two kinds of "coming" and "leaving" -- temporary and permanent.  Temporary absences, yes, are handled automatically with things like hinted handoff and read repair.  Permanent absences -- as in the case of EC2 instance outage -- are things that need to be handled by a competent system administrator or developer.  The truth is, changes to the ring state are expensive - it needs to be gossiped around, data needs to shift around the cluster.  If Riak were automatically making those operational decisions for you, performance and stability would suffer.

As a side note, the needing to reip on all machines is a problem with the `reip` command, not the core gossip functionality.

> And question sounds rather strange to me, what is the node role in system where all nodes are equal? It's everywhere said that  Riak will automatically re-balance data as nodes join and leave the cluster. It's not the case when node becomes unreachable and cluster would repartion data to keep it solid (like keeping n_val for keys)? 

Yes, n_val is still respected -- fallbacks take over for the missing node(s), even in an extended outage.  In the quoted sentence, we're talking about more permanent membership changes.

> Or something else should watch for nodes states and tell cluster that node is down? 
> It's also said that:
> The ring state is shared around the cluster by means of a "gossip protocol". Whenever a node changes its claim on the ring, it announces its change via this protocol. It also periodically re-announces what it knows about the ring, in case any nodes missed previous updates.

Actually, Erlang's built-in networking takes care of a lot of the checking for node availability; connected nodes will periodically send heartbeat messages to one another.  If a node becomes unavailable, it is removed from preflists and fallbacks take over.  The gossip protocol is for propagating changes to the ring state around the cluster. If the ring were frequently unstable (temporary partitions/failures affecting membership), you'd have a lot of trouble performing normal operations.

As I said above, the key difference here is between temporary and permanent failures.

> Isn't cluster checking on unavailable nodes that way too?
> I'm not offending anyone, just trying to make things more clear for myself.

Not offended, your questions are important to answer and demonstrate a lack of clarity in our documentation.  Any suggestions you have on how to clear that up would be appreciated!

Sean Cribbs <sean at basho.com>
Developer Advocate
Basho Technologies, Inc.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20101016/fc4dd614/attachment.html>

More information about the riak-users mailing list