Riak Join is Screwed Up

Joseph Blomstedt joe at basho.com
Thu May 3 07:41:57 EDT 2012


Yes, 'riak-admin join' is designed to join a node to a cluster, not a
cluster to another cluster. The behavior was different before Riak
1.0 (last September), but it changed as part of a set of changes to
make Riak's clustering layer more deterministic.

The short answer to "why is this the case" is that Riak is a
network-partition-tolerant distributed database without central
coordination. If you have two separate clusters, both of which are
receiving incoming client requests, and you try to join the two
clusters together during a network partition, there are very few
guarantees of deterministic behavior.

If clients can talk to all nodes in both clusters, but only some
subset of nodes in each cluster can talk to each other, you end up
with clients talking to different nodes that have different beliefs
about what the cluster membership/topology is. This is even worse if
you have three clusters and you concurrently issue a join between
clusters 1 and 2 and between clusters 2 and 3 during a similar
partition scenario.
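
To make the partition scenario concrete, here is a toy sketch in
Python. It is not Riak code: the node names, the merge_clusters
function, and the "union the views" merge rule are all assumptions
made up for illustration. It only shows how a cluster-to-cluster
merge that reaches some nodes but not others leaves nodes with
different membership views.

```python
# Toy model, not Riak code: each node holds its own view of membership.
def merge_clusters(views, reachable):
    """Naively merge views, but only across links that currently work."""
    new_views = {node: set(members) for node, members in views.items()}
    for a, b in reachable:
        union = views[a] | views[b]
        new_views[a] |= union
        new_views[b] |= union
    return new_views


# Cluster 1 = {a1, a2}, cluster 2 = {b1, b2} (hypothetical names).
views = {
    "a1": {"a1", "a2"},
    "a2": {"a1", "a2"},
    "b1": {"b1", "b2"},
    "b2": {"b1", "b2"},
}

# During the partition, only a1 and b1 can exchange membership state.
after = merge_clusters(views, reachable=[("a1", "b1")])
for node in sorted(after):
    print(node, sorted(after[node]))
```

In this sketch a1 and b1 end up believing in a four-node cluster,
while a2 and b2 still believe in their original two-node clusters. A
client talking to a1 and another talking to a2 would see different
topologies.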

The more specific answer is that Riak provides certain guarantees by
ensuring that cluster changes logically appear to be totally ordered
events. The histories of two completely independent clusters are
disjoint and cannot always be merged. This is solved for the
single-node case by relaxing the constraint on total ordering: the
cluster you are joining still maintains a logically monotonic history,
while the node joining the cluster simply gives up its history.
Specifically, the joining node asks the larger cluster for its cluster
information, and then atomically overwrites its own view with that of
the cluster.
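
As a rough illustration of that rule, here is a small Python model.
It is a sketch under stated assumptions, not how Riak is implemented:
the Node class, the join function, and the list-of-strings "history"
are hypothetical stand-ins. The only things taken from this thread
are the rule itself (a single node adopts the cluster's view and
drops its own) and the error text.

```python
class Node:
    def __init__(self, name):
        self.name = name
        # A fresh node is a cluster of one with its own short history.
        self.members = {name}
        self.history = [f"{name} started"]


def join(joiner, target, nodes):
    """joiner adopts the view of target's cluster and drops its own."""
    if len(joiner.members) > 1:
        # Same message as the riak-admin output quoted in this thread.
        raise RuntimeError("This node is already a member of a cluster")
    # The cluster's history is only appended to, so it stays ordered.
    members = target.members | {joiner.name}
    history = target.history + [f"{joiner.name} joined"]
    # Every member, including the joiner, ends up with the same view.
    # The joiner's previous history is overwritten, not merged.
    for name in members:
        nodes[name].members = set(members)
        nodes[name].history = list(history)


nodes = {name: Node(name) for name in "ABC"}
a, b, c = nodes["A"], nodes["B"], nodes["C"]

join(a, b, nodes)            # A joins B: fine, A was alone
try:
    join(b, c, nodes)        # B is already in {A, B}: rejected
except RuntimeError as err:
    print("B -> C failed:", err)
join(c, b, nodes)            # C joins {A, B}: fine, C was alone

print(sorted(c.members))
print(c.history)
```

This mirrors the sequence Rebecca describes below: A to B followed by
B to C fails on the second step, while A to B followed by C to B
succeeds. Note that C's own "C started" entry is gone after the join;
only the cluster's history survives.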

Regards,
Joe


On Thu, May 3, 2012 at 12:05 AM, Rebecca Meritz
<rebecca.meritz at klarna.com> wrote:
> I figured out a way to fix the problem. But I don't understand why the
> solution works.
>
> Before, A sent a join request to B and then B sent one to C, and the
> second request failed.
>
> Now A sends a join request to B and then C sends one to B, and the
> second request succeeds.
>
> Must a node that is the only member of its cluster always ask to join the
> larger cluster for the request to succeed? Why is this?
>
> Thanks,
> Rebecca
>
> On Thu, May 3, 2012 at 8:41 AM, Rebecca Meritz <rebecca.meritz at klarna.com>
> wrote:
>>
>> I'm testing a script that joins my riak ring on all machines. There is no
>> data in the database yet, but I have repeatedly joined the ring and separated
>> it while working on the script that sets up the environment on my machines.
>> The ring is now in a bizarre state:
>>
>> [rebecca]$ riak-admin member_status
>> Attempting to restart script through sudo -u riak
>> ================================= Membership ==================================
>> Status     Ring    Pending    Node
>> -------------------------------------------------------------------------------
>> valid      50.0%      --      'riak at XX.XXX.XX.10'
>> valid      50.0%      --      'riak at XX.XXX.XX.12'
>> -------------------------------------------------------------------------------
>> Valid:2 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
>> [pay at payment-testing-gw2 ~]$ riak-admin join riak at XX.XXX.XX.14
>> Attempting to restart script through sudo -u riak
>> Failed: This node is already a member of a cluster
>> [pay at payment-testing-gw2 ~]$ riak-admin force-remove riak at XX.XXX.XX.14
>> Attempting to restart script through sudo -u riak
>> Failed: "riak at XX.XXX.XX.14" is not a member of the cluster.
>>
>> I cannot join a new member nor can I remove it.
>>
>> I've tried stopping them all and getting them to leave; if they wouldn't
>> leave, I forced their removal. I stopped them all. I even deleted the whole
>> old ring file before restarting.
>>
>> How can I fix this situation? What causes the above error?
>>
>> Thanks,
>> Rebecca



-- 
Joseph Blomstedt <joe at basho.com>
Software Engineer
Basho Technologies, Inc.
http://www.basho.com/



