Whole cluster times out if one node is gone

Jay Adkisson j4yferd at gmail.com
Mon Nov 29 13:27:37 EST 2010


Hm, that's curious.  Are you rebooting the physical machine?  When you
reboot one of the nodes, what happens to HTTP calls to that node?  Do they
immediately error, or do they hang indefinitely?

In the meantime, I'll add some logging so I can see whether I'm timing out
on the writes as well, and I'll see what happens with different keys.
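
A minimal sketch of what such a logging probe might look like, assuming
Python 3 and Riak's HTTP interface. The `probe` and `run` names, the
5-second timeout, and the injectable `opener` argument are illustrative
assumptions, not anything taken from this thread. It times each GET and
records whether the request succeeded, returned an HTTP error such as a 503,
failed immediately, or hung until the timeout.

```python
import socket
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """Issue one GET and return (outcome, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as resp:
            resp.read()
            outcome = "ok %d" % resp.status
    except urllib.error.HTTPError as exc:
        # The node answered, but with an error status (e.g. 503).
        outcome = "http error %d" % exc.code
    except (socket.timeout, TimeoutError):
        # The request hung until our client-side timeout fired.
        outcome = "timeout"
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            outcome = "timeout"
        else:
            # Typically an immediate failure such as connection refused.
            outcome = "connection error: %s" % exc.reason
    except OSError as exc:
        outcome = "connection error: %s" % exc
    return outcome, time.monotonic() - start


def run(urls, rounds, interval=1.0, timeout=5.0):
    """Probe every URL once per round and print one log line per request."""
    for _ in range(rounds):
        for url in urls:
            outcome, elapsed = probe(url, timeout)
            print("%s %s %.2fs %s"
                  % (time.strftime("%H:%M:%S"), url, elapsed, outcome))
        time.sleep(interval)
```

For example, `run(["http://127.0.0.1:8098/riak/test/1?r=1"], rounds=120)`
would poll one key for about two minutes (the host and port here are
assumed). Passing several URLs with different keys, or pointing at different
nodes, should show whether the hang is specific to one key or one node.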

Thanks,
--Jay

On Mon, Nov 29, 2010 at 10:02 AM, Dan Reverri <dan at basho.com> wrote:

> Hi Jay,
>
> I'm not able to reproduce the behavior you are seeing. Here is what I am
> doing to try to reproduce the issue:
> 1. Setup a 4 node cluster
> 2. Continuously write a new object to Riak every 0.5 second
> 3. Continuously read a known object (GET riak/test/1) from Riak every 0.5
> second
> 4. Reboot one of the nodes
>
> The reads and writes continue working normally when rebooting the node.
>
> Do you see timeouts while writing objects to Riak?
> Can you try reading other objects from Riak during the reboot (i.e.
> different keys)?
>
> Thanks,
> Dan
>
> Daniel Reverri
> Developer Advocate
> Basho Technologies, Inc.
> dan at basho.com
>
>
> On Mon, Nov 29, 2010 at 9:39 AM, Jay Adkisson <j4yferd at gmail.com> wrote:
>
>> Hey Dan/Sean,
>>
>> Thanks for the response.  sasl-error.log on node A is completely empty,
>> and I see this pattern in erlang.log:
>>
>> ===== ALIVE Tue Nov 23 12:46:57 PST 2010
>>
>> ===== Tue Nov 23 12:57:36 PST 2010
>>
>> =ERROR REPORT==== 23-Nov-2010::12:57:36 ===
>>  ** Node 'riak@<node D>' not responding **
>> ** Removing (timedout) connection **
>>
>> =INFO REPORT==== 23-Nov-2010::12:58:41 ===
>> Starting handoff of partition riak_kv_vnode
>> 251195593916248939066258330623111144003363405824 to 'riak@<node D>'
>>
>> =INFO REPORT==== 23-Nov-2010::12:58:41 ===
>> Handoff of partition riak_kv_vnode
>> 251195593916248939066258330623111144003363405824 to 'riak@<node D>'
>> completed: sent 1 objects in 0.02 seconds
>>
>> =INFO REPORT==== 23-Nov-2010::12:59:18 ===
>> Starting handoff of partition riak_kv_vnode
>> 707914855582156101004909840846949587645842325504 to 'riak@<node D>'
>>
>> =INFO REPORT==== 23-Nov-2010::12:59:18 ===
>> Handoff of partition riak_kv_vnode
>> 707914855582156101004909840846949587645842325504 to 'riak@<node D>'
>> completed: sent 5 objects in 0.03 seconds
>>
>> =INFO REPORT==== 23-Nov-2010::12:59:20 ===
>> Starting handoff of partition riak_kv_vnode
>> 525227150915793236229449236757414210188850757632 to 'riak@<node D>'
>>
>> <handoffs, etc...>
>>
>> This is my testing process: I'm doing an initial load into riak of small
>> image files between 1 and 150K, throttled to two images per second, with
>> W=1.  In a different terminal, I'm running a wget every second against node
>> A of one particular image I already know to be in the cluster, again with
>> R=1.  I'm using R,W=1 because I figured that would reduce the chance of
>> timing out, and with my data pattern, nothing I write to the cluster will
>> ever change, so I really don't need to wait for a quorum.
>>
>> In response to Sean,
>>
>>> 1) Riak detects node outage the same way any Erlang system does - when a
>>> message fails to deliver, or the heartbeat maintained by epmd fails.  The
>>> default timeout in epmd is 1 minute, which is probably why you're seeing it
>>> take 1 minute to be detected.
>>>
>> Thanks, this is enlightening.
>>
>>> 2) If it takes too long (the vnode is overloaded, perhaps, or is just
>>> starting up as a hint partition) to retrieve from any node, the request can
>>> time out.
>>>
>> That makes sense, but I still wonder why this happens even when the quorum
>> is already met by the machines that are responding normally?
>>
>>
>>> 3) You could probably configure epmd to timeout sooner, but then you
>>> become more vulnerable to temporary partitions. YMMV
>>>
>> I may try that - it might be a good fit with my data pattern.
>>
>> Thanks again,
>> --Jay
>>
>>
>> On Mon, Nov 29, 2010 at 4:44 AM, David Smith <dizzyd at basho.com> wrote:
>>
>>> On Tue, Nov 23, 2010 at 3:33 PM, Jay Adkisson <j4yferd at gmail.com> wrote:
>>> > (many profuse apologies to Dan - hit "reply" instead of "reply all")
>>> > Alrighty, I've done a little more digging.  When I throttle the writes
>>> > heavily (2/sec) and set R and W to 1 all around, the cluster works just
>>> > fine after I restart the node for about 15-20 seconds.  Then the read
>>> > request hangs for about a minute, until node D disappears from
>>> > connected_nodes in riak-admin status, at which point it returns the
>>> > desired value (although sometimes I get a 503):
>>>
>>> Are you seeing any error messages in log/erlang.log.* or
>>> log/sasl-error.log?
>>>
>>> Can you expound on your use case a little -- are you doing a large
>>> insert, or just a random read/write mix? Did you pre-populate the
>>> dataset? Why are you using r=1, instead of relying on quorum for
>>> reads?
>>>
>>> How are you running the riak-admin status to measure the 15-20 seconds?
>>>
>>> Thanks.
>>>
>>> D.
>>>
>>
>>
>