Whole cluster times out if one node is gone

Alexander Sicular siculars at gmail.com
Mon Nov 29 13:42:41 EST 2010


You may have already mentioned which client you are using (the thread is
deep already), but I would think that this is a client implementation
problem, something like connection pooling. Try calling curl from a sleep
loop in a shell script and see what happens.
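If a script is easier to leave running, here is a rough Python stand-in for the curl-in-a-sleep-loop test. It opens a fresh connection for every GET (urllib does no connection pooling) and gives each request its own timeout. It then reports whether the request succeeded, failed immediately, or hung until the timeout, which is the distinction Jay asks about below. The URL, the port 8098, and the names `probe`/`watch` are assumptions for illustration, so point it at whichever node you are testing.

```python
import socket
import time
import urllib.error
import urllib.request

# Hypothetical target: node A's HTTP interface, assumed to be on port 8098.
DEFAULT_URL = "http://127.0.0.1:8098/riak/test/1"


def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """Make one fresh GET request and classify how it ended.

    Returns (outcome, seconds). Outcome is "ok", "http NNN", "timeout"
    (the request hung past `timeout`), or "error" (it failed fast, e.g.
    connection refused).
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as resp:
            resp.read()
        outcome = "ok"
    except urllib.error.HTTPError as e:  # server answered with an error status
        outcome = f"http {e.code}"
    except urllib.error.URLError as e:   # connect-phase failures arrive wrapped
        outcome = "timeout" if isinstance(e.reason, socket.timeout) else "error"
    except socket.timeout:               # timed out while reading the response
        outcome = "timeout"
    except OSError:                      # e.g. connection reset mid-response
        outcome = "error"
    return outcome, time.monotonic() - start


def watch(url=DEFAULT_URL, interval=1.0, timeout=5.0, count=None):
    """Probe `url` every `interval` seconds, forever if count is None."""
    sent = 0
    while count is None or sent < count:
        outcome, elapsed = probe(url, timeout)
        print(f"{time.strftime('%H:%M:%S')}  {outcome:<9} {elapsed:6.2f}s", flush=True)
        sent += 1
        time.sleep(max(0.0, interval - elapsed))
```

Run something like `watch("http://<node A>:8098/riak/test/1", timeout=10)` while rebooting a node. A run of "timeout" lines lasting about a minute would point at the server side. Clean results here while your client hangs would point back at the client.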

-Alexander

On Mon, Nov 29, 2010 at 13:27, Jay Adkisson <j4yferd at gmail.com> wrote:
> Hm, that's curious.  Are you rebooting the physical machine?  When you
> reboot one of the nodes, what happens to HTTP calls to that node?  Do they
> immediately error, or do they hang indefinitely?
> In the meantime, I'll add some logging so I can see whether I'm timing out
> on the writes as well, and I'll see what happens with different keys.
> Thanks,
> --Jay
>
> On Mon, Nov 29, 2010 at 10:02 AM, Dan Reverri <dan at basho.com> wrote:
>>
>> Hi Jay,
>> I'm not able to reproduce the behavior you are seeing. Here is what I am
>> doing to try to reproduce the issue:
>> 1. Set up a 4-node cluster
>> 2. Continuously write a new object to Riak every 0.5 seconds
>> 3. Continuously read a known object (GET riak/test/1) from Riak every 0.5
>> seconds
>> 4. Reboot one of the nodes
>> The reads and writes continue working normally when rebooting the node.
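Steps 2-4 can be approximated with two loops running side by side, one writing and one reading every 0.5 seconds, while you reboot a node by hand. In this sketch, `write_new` and `read_known` are hypothetical placeholders, for example a PUT of a fresh key and a GET of riak/test/1. They should each carry their own request timeout. The harness only counts how many calls succeeded or raised.

```python
import threading
import time
from collections import Counter


def _every(task, interval, stop, tally):
    """Call task() every `interval` seconds until `stop` is set."""
    while not stop.is_set():
        started = time.monotonic()
        try:
            task()
            tally["ok"] += 1
        except Exception:
            tally["failed"] += 1
        # Event.wait doubles as a sleep that ends early once stop is set.
        stop.wait(max(0.0, interval - (time.monotonic() - started)))


def reproduce(write_new, read_known, duration, interval=0.5):
    """Run writer and reader loops for `duration` seconds; return their tallies."""
    stop = threading.Event()
    # One Counter per thread, so no counter is ever shared between threads.
    tallies = {"write": Counter(), "read": Counter()}
    threads = [
        threading.Thread(target=_every, args=(write_new, interval, stop, tallies["write"])),
        threading.Thread(target=_every, args=(read_known, interval, stop, tallies["read"])),
    ]
    for t in threads:
        t.start()
    time.sleep(duration)  # reboot one node by hand during this window
    stop.set()
    for t in threads:
        t.join()  # a task with no timeout of its own could block here
    return tallies
```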
>> Do you see timeouts while writing objects to Riak?
>> Can you try reading other objects from Riak during the reboot (i.e.
>> different keys)?
>> Thanks,
>> Dan
>> Daniel Reverri
>> Developer Advocate
>> Basho Technologies, Inc.
>> dan at basho.com
>>
>>
>> On Mon, Nov 29, 2010 at 9:39 AM, Jay Adkisson <j4yferd at gmail.com> wrote:
>>>
>>> Hey Dan/Sean,
>>>
>>> Thanks for the response.  sasl-error.log on node A is completely empty,
>>> and I see this pattern in erlang.log:
>>> ===== ALIVE Tue Nov 23 12:46:57 PST 2010
>>> ===== Tue Nov 23 12:57:36 PST 2010
>>> =ERROR REPORT==== 23-Nov-2010::12:57:36 ===
>>> ** Node 'riak@<node D>' not responding **
>>> ** Removing (timedout) connection **
>>> =INFO REPORT==== 23-Nov-2010::12:58:41 ===
>>> Starting handoff of partition riak_kv_vnode
>>> 251195593916248939066258330623111144003363405824 to 'riak@<node D>'
>>> =INFO REPORT==== 23-Nov-2010::12:58:41 ===
>>> Handoff of partition riak_kv_vnode
>>> 251195593916248939066258330623111144003363405824 to 'riak@<node D>'
>>> completed: sent 1 objects in 0.02 seconds
>>> =INFO REPORT==== 23-Nov-2010::12:59:18 ===
>>> Starting handoff of partition riak_kv_vnode
>>> 707914855582156101004909840846949587645842325504 to 'riak@<node D>'
>>> =INFO REPORT==== 23-Nov-2010::12:59:18 ===
>>> Handoff of partition riak_kv_vnode
>>> 707914855582156101004909840846949587645842325504 to 'riak@<node D>'
>>> completed: sent 5 objects in 0.03 seconds
>>> =INFO REPORT==== 23-Nov-2010::12:59:20 ===
>>> Starting handoff of partition riak_kv_vnode
>>> 525227150915793236229449236757414210188850757632 to 'riak@<node D>'
>>> <handoffs, etc...>
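To line up a log like this against the client-side timings, a small parser can help. It lets you measure, for instance, the gap between the 12:57:36 "not responding" report and the first handoff at 12:58:41. The sketch below assumes only the report-header format visible in the excerpt. The function names are hypothetical.

```python
import re
from datetime import datetime

# Report headers look like: =INFO REPORT==== 23-Nov-2010::12:58:41 ===
HEADER = re.compile(r"=(\w+) REPORT==== (\d{1,2}-\w{3}-\d{4}::\d{2}:\d{2}:\d{2}) ===")
HANDOFF_DONE = re.compile(r"completed: sent (\d+) objects in ([\d.]+) seconds")


def parse_reports(text):
    """Split erlang.log text into (level, timestamp, body) tuples.

    Each body runs up to the next report header, with line wraps collapsed.
    """
    heads = list(HEADER.finditer(text))
    reports = []
    for i, head in enumerate(heads):
        end = heads[i + 1].start() if i + 1 < len(heads) else len(text)
        body = " ".join(text[head.end():end].split())
        when = datetime.strptime(head.group(2), "%d-%b-%Y::%H:%M:%S")
        reports.append((head.group(1), when, body))
    return reports


def handoff_totals(reports):
    """Return (completed handoffs, objects sent) from parsed reports."""
    done = [m for m in (HANDOFF_DONE.search(body) for _, _, body in reports) if m]
    return len(done), sum(int(m.group(1)) for m in done)
```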
>>> This is my testing process: I'm doing an initial load into Riak of small
>>> image files (between 1 and 150 KB), throttled to two images per second,
>>> with W=1.  In a different terminal, I'm running a wget every second against
>>> node A for one particular image I already know to be in the cluster, this
>>> time with R=1.  I'm using R=W=1 because I figured that would reduce the
>>> chance of timing out, and with my data pattern nothing I write to the
>>> cluster will ever change, so I really don't need to wait for a quorum.
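For reference, R=1 reads and W=1 writes are set as per-request query parameters on Riak's HTTP interface of this era: `/riak/<bucket>/<key>?r=1` on a GET and `?w=1` on a PUT. The sketch below only builds such requests with Python's standard library. The host, bucket, key, and helper names are hypothetical, and you would send a request with `urllib.request.urlopen(req, timeout=...)`.

```python
import urllib.request
from urllib.parse import quote, urlencode


def object_url(base, bucket, key, **params):
    """Build <base>/riak/<bucket>/<key>, appending params such as r=1 or w=1."""
    path = f"/riak/{quote(bucket, safe='')}/{quote(key, safe='')}"
    query = urlencode(sorted(params.items()))
    return base.rstrip("/") + path + (f"?{query}" if query else "")


def get_request(base, bucket, key, r=1):
    """GET asking the coordinating node to answer once r replicas reply."""
    return urllib.request.Request(object_url(base, bucket, key, r=r))


def put_request(base, bucket, key, body, content_type, w=1):
    """PUT asking the coordinating node to acknowledge once w replicas accept it."""
    return urllib.request.Request(
        object_url(base, bucket, key, w=w),
        data=body,
        method="PUT",
        headers={"Content-Type": content_type},
    )
```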
>>> In response to Sean,
>>>>
>>>> 1) Riak detects node outage the same way any Erlang system does - when a
>>>> message fails to deliver, or the heartbeat maintained by epmd fails.  The
>>>> default timeout in epmd is 1 minute, which is probably why you're seeing it
>>>> take 1 minute to be detected.
>>>
>>> Thanks, this is enlightening.
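Here are some numbers for point 1. As far as I know, the timer involved is the Erlang kernel's `net_ticktime` setting (default 60 seconds) on the distribution connection between nodes. epmd itself only maps node names to ports. Erlang's kernel documentation says an unresponsive node is detected somewhere between TickTime - TickTime/4 and TickTime + TickTime/4 seconds. At the default, that is 45 to 75 seconds, which fits the roughly one-minute hang in this thread. A tiny helper for that arithmetic follows; the function name is just for illustration.

```python
def detection_window(net_ticktime=60.0):
    """Seconds an unresponsive peer can go unnoticed: (earliest, latest).

    Peers are checked every net_ticktime/4 seconds, so detection lands
    between net_ticktime - net_ticktime/4 and net_ticktime + net_ticktime/4.
    """
    quarter = net_ticktime / 4
    return net_ticktime - quarter, net_ticktime + quarter


for tick in (60, 20):
    earliest, latest = detection_window(tick)
    print(f"net_ticktime={tick}: peer dropped after {earliest:.0f}-{latest:.0f}s of silence")
```

Lowering the value shortens the hang. On Riak this would presumably be a `-kernel net_ticktime 20` line in each node's vm.args, but check your release. The trade-off is Sean's point 3: any node that goes quiet for longer than the lower bound (15 seconds at a tick time of 20) may be disconnected.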
>>>>
>>>> 2) If it takes too long (the vnode is overloaded, perhaps, or is just
>>>> starting up as a hint partition) to retrieve from any node, the request can
>>>> time out.
>>>
>>> That makes sense, but why does this happen even when the quorum is
>>> already met by the machines that are responding normally?
>>>
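That question is a fair one. Below is a generic sketch (not Riak's own code) of what R=1 is meant to buy. The read goes to every replica at once and returns as soon as R replies are in, so one hung replica should not hold it up. If reads still stall for the whole disconnect window, the wait may be happening somewhere other than in collecting replica replies. The replicas here are hypothetical zero-argument callables.

```python
import queue
import threading
import time


def quorum_read(replicas, r, timeout):
    """Ask every replica at once and return the first r replies.

    A replica that hangs or raises simply never counts toward r.
    Raises TimeoutError if fewer than r replies arrive within `timeout`.
    """
    if not 1 <= r <= len(replicas):
        raise ValueError("r must be between 1 and len(replicas)")
    replies = queue.Queue()

    def ask(replica):
        try:
            replies.put(replica())
        except Exception:
            pass  # a failed replica contributes no reply

    for replica in replicas:
        # Daemon threads, so a hung replica cannot keep the process alive.
        threading.Thread(target=ask, args=(replica,), daemon=True).start()

    deadline = time.monotonic() + timeout
    results = []
    while len(results) < r:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"got {len(results)} of {r} replies in {timeout}s")
        try:
            results.append(replies.get(timeout=remaining))
        except queue.Empty:
            pass  # loop around; the deadline check above raises
    return results
```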
>>>>
>>>> 3) You could probably configure epmd to timeout sooner, but then you
>>>> become more vulnerable to temporary partitions. YMMV
>>>
>>> I may try that - it might be a good fit with my data pattern.
>>> Thanks again,
>>> --Jay
>>>
>>> On Mon, Nov 29, 2010 at 4:44 AM, David Smith <dizzyd at basho.com> wrote:
>>>>
>>>> On Tue, Nov 23, 2010 at 3:33 PM, Jay Adkisson <j4yferd at gmail.com> wrote:
>>>> > (many profuse apologies to Dan - hit "reply" instead of "reply all")
>>>> > Alrighty, I've done a little more digging.  When I throttle the writes
>>>> > heavily (2/sec) and set R and W to 1 all around, the cluster works just
>>>> > fine for about 15-20 seconds after I restart the node.  Then the read
>>>> > request hangs for about a minute, until node D disappears from
>>>> > connected_nodes in riak-admin status, at which point it returns the
>>>> > desired value (although sometimes I get a 503):
>>>>
>>>> Are you seeing any error messages in log/erlang.log.* or
>>>> log/sasl-error.log?
>>>>
>>>> Can you expound on your use case a little -- are you doing a large
>>>> insert, or just a random read/write mix? Did you pre-populate the
>>>> dataset? Why are you using r=1, instead of relying on quorum for
>>>> reads?
>>>>
>>>> How are you running the riak-admin status to measure the 15-20 seconds?
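One repeatable way to take that measurement is to poll `riak-admin status` on a surviving node and time how long the rebooted node stays listed in `connected_nodes`. This is only a sketch. The output layout it parses (one `key : value` pair per line, with `connected_nodes` printed as an Erlang-style list of quoted names) is an assumption, as are the helper names.

```python
import re
import subprocess
import time

# Assumed layout: "key : value" per line; connected_nodes like ['riak@a','riak@b'].
STAT_LINE = re.compile(r"^\s*(\w+)\s*:\s*(.*?)\s*$")


def parse_status(text):
    """Turn riak-admin status output into a {key: raw value} dict."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line)
        if m:
            stats[m.group(1)] = m.group(2)
    return stats


def connected_nodes(text):
    """Extract the quoted node names listed under connected_nodes."""
    return re.findall(r"'([^']+)'", parse_status(text).get("connected_nodes", ""))


def time_until_gone(node, interval=1.0):
    """Poll riak-admin status and return the seconds until `node` drops out."""
    start = time.monotonic()
    while True:
        out = subprocess.run(["riak-admin", "status"], capture_output=True,
                             text=True, check=True).stdout
        if node not in connected_nodes(out):
            return time.monotonic() - start
        time.sleep(interval)
```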
>>>>
>>>> Thanks.
>>>>
>>>> D.
>>>
>>
>
>
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>



