Riak-CS issues when Riak endpoint fails-over to new server
mkessler at basho.com
Wed Jan 4 05:42:42 EST 2017
As far as I know, Riak CS has none of the more advanced retry capabilities
that Riak KV has. However, the design of CS seems to assume that each CS
instance talks to a KV node co-located on the same host. To achieve high
availability in CS deployments, HAProxy is often deployed in front of the
CS nodes. Could you please let me know if this is an option for your setup?
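
Until the deployment changes, the 503 responses described in the quoted
message could be softened on the client side with a retry loop. This is only
a minimal sketch, not anything Riak CS provides: `ServiceUnavailable`,
`call_with_retry` and the `request` callable are hypothetical names, and it
assumes the client code can turn a 503 response into an exception.

```python
import time


class ServiceUnavailable(Exception):
    """Hypothetical error raised by the request callable on an HTTP 503."""


def call_with_retry(request, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call request() until it succeeds or the attempts run out.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between tries.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return request()
        except ServiceUnavailable:
            if attempt == attempts - 1:
                raise  # out of retries: surface the 503 to the caller
            sleep(base_delay * (2 ** attempt))
```

With a pool of a couple of hundred stale connections, a handful of retries
may still not be enough, so this reduces the visible errors rather than
fixing the underlying reconnect behaviour.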
On 4 January 2017 at 01:04, Toby Corkindale <toby at dryft.net> wrote:
> Hello all,
> Now that we're all back from the end-of-year holidays, I'd like to bump
> this question up.
> I feel like this has been a long-standing problem with Riak CS not
> handling dropped TCP connections.
> Last time the cause was haproxy dropping idle TCP connections after too
> long, but we solved that at the haproxy end.
> This time, it's harder -- we're failing over to a different Riak backend,
> so the TCP connections between Riak CS and Riak PBC *have* to go down, but
> Riak CS just doesn't handle it well at all.
> Is there a trick to configuring it better?
> On Thu, 22 Dec 2016 at 16:48 Toby Corkindale <toby at dryft.net> wrote:
>> We've been seeing some issues with Riak CS for a while in a specific
>> situation. Maybe you can advise if we're doing something wrong?
>> Our setup has redundant haproxy instances in front of a cluster of riak
>> nodes, for both HTTP and PBC. The haproxy instances share a floating IP.
>> Only one node holds the IP, but if it goes down, another takes it up.
>> Our Riak CS nodes are configured to talk to the haproxy on that floating
>> IP.
>> The problem occurs if the floating IP moves from one haproxy to another.
>> Suddenly we see a flurry of errors in riak-cs log files.
>> This is presumably because it was holding open TCP connections, and the
>> new haproxy instance doesn't know anything about them, so they get a TCP
>> RESET and are shut down.
>> The problem is that riak-cs doesn't try to reconnect and retry
>> immediately; instead it just throws a 503 error back to the client, who
>> then retries, but Riak-CS has a pool of a couple of hundred connections to
>> cycle through, all of which throw the error!
>> Does this sound like it is a likely description of the fault?
>> Do you have any ways to mitigate this issue in Riak CS when using TCP
>> load balancing above Riak PBC?
Client Services Engineer
Basho Technologies Limited
Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431