riak_core questions

Dmitry Demeshchuk demeshchuk at gmail.com
Thu Jul 28 12:22:00 EDT 2011


Hi, Justin.

Unfortunately, the application has to behave like that, though I don't
like it myself. During any outage we must be completely sure that all
the vnodes are up, and we cannot use any kind of replication in our
case. This is the only way to ensure that the same resource is never
accessed from different nodes at the same time, while still being able
to access any resource even when some nodes are down.
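To make the constraint concrete, here is a minimal Python sketch of the policy described above. It is not riak_core code. The names `partition_for`, `reassign` and `owner_of` are hypothetical, and the round-robin handover is an assumption, not how riak_core's claim algorithm assigns partitions. Every partition has exactly one owner, and partitions owned by down nodes are handed to surviving nodes. As a result, no resource is served by two nodes at once and every resource stays reachable.

```python
import hashlib
from itertools import cycle


def partition_for(resource: str, num_partitions: int) -> int:
    """Hash a resource key onto one of the ring's partitions."""
    digest = hashlib.sha1(resource.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_partitions


def reassign(owners: list, up_nodes: set) -> list:
    """Hand every partition owned by a down node to some up node.

    Each partition keeps exactly one owner (no replicas), so a resource is
    never served by two nodes at once, yet stays reachable while any node is up.
    """
    live = sorted(up_nodes)
    if not live:
        raise RuntimeError("no nodes are up")
    spare = cycle(live)  # hypothetical round-robin handover of orphaned partitions
    return [owner if owner in up_nodes else next(spare) for owner in owners]


def owner_of(resource: str, owners: list) -> str:
    """Return the single node allowed to touch `resource`."""
    return owners[partition_for(resource, len(owners))]


owners = ["a", "b", "c", "a", "b", "c"]
new_owners = reassign(owners, {"a", "c"})
print(new_owners)  # ['a', 'a', 'c', 'a', 'c', 'c']
print(owner_of("user:42", new_owners))
```

This is only safe if a node that looks down really is down. A node that is merely unreachable would keep serving partitions that have already been handed to someone else.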

In the case of a network partition, which is much less likely, we'll
simply shut down all the smaller partitions and keep only the largest
one running until the outage is resolved manually.
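One caveat: a node cut off by a partition can't see how the unreachable nodes are split among themselves. It therefore can't tell locally whether its own side is the largest. A common substitute, swapped in here in place of the exact "largest side" rule, is a strict-majority check against the last known membership. At most one side can pass it, although in a three-way split it is possible that none does. A minimal Python sketch (`keep_running` is a hypothetical name):

```python
def keep_running(visible_nodes: set, cluster_members: set) -> bool:
    """Decide locally whether this side of a partition should stay up.

    visible_nodes: nodes this node can currently reach, including itself.
    cluster_members: full membership as of the last agreed ring.
    A side survives only if it reaches a strict majority of the members,
    so two disjoint sides can never both survive.
    """
    reachable = visible_nodes & cluster_members
    return 2 * len(reachable) > len(cluster_members)


members = {"a", "b", "c", "d", "e"}
print(keep_running({"a", "b", "c"}, members))  # True
print(keep_running({"d", "e"}, members))       # False
```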

By "master node" I mean the node that is used when joining new nodes
with riak-admin (as far as I remember, only one node can be used for
this). I believe it's the one returned by riak_core_ring:owner_node/1.
Or am I wrong, and when we call riak_core_gossip:send_ring(RingNode,
node()) we can use any node in the cluster as RingNode?

Thank you.

On Thu, Jul 28, 2011 at 5:20 PM, Justin Sheehy <justin at basho.com> wrote:
> Hi, Dmitry.
>
> A couple of suggestions...
>
> The reason that you're not seeing an easy way to automatically add nodes to the cluster when they come up, or remove them when they go down, is that we strongly recommend against such behavior.
>
> The idea is that intentional (administrative) outages are very different in nature from unintentional and potentially transitory outages. We have explicit administrative commands such as "join" and "leave" for the administrative cases, making it very easy to add hosts to or remove hosts from a cluster. When a node is unreachable, you often can't automatically tell whether it is a host problem or a network problem, or whether it is a long-term or short-term outage. This is why mechanisms such as quorums and hinted handoff exist: to ensure proper operation of the cluster as a whole throughout such outages. Consider the case where you have a network problem such that several of your nodes lose visibility to each other for brief and distinct periods of time. If nodes are auto-added and auto-removed, you will have quite a bit of churn and potentially a very harmful feedback loop. Instead of auto-adding and auto-removing, consider using things like riak_core_node_watcher to decide which nodes to interact with on a per-operation basis.
>
> I'm also not sure what you mean by "if the master node goes down" since in most riak_core applications there is no master node. Of course you can create such a mechanism if you need it, but (e.g.) Riak KV and the accompanying applications do not have any notion of a master node and thus do not have any such concern.
>
> I hope that this is useful.
>
> Best regards,
>
> -Justin



-- 
Best regards,
Dmitry Demeshchuk




More information about the riak-users mailing list