Java Riak client can't handle a Riak node failure?

Dmitri Zagidulin dzagidulin at basho.com
Wed Oct 7 09:33:52 EDT 2015


Yeah, definitely find out what the sysadmin's experience was with the load
balancer. It could have just been a misconfiguration or something.

And yes, that's the documentation page I recommend -
http://docs.basho.com/riak/latest/ops/advanced/configs/load-balancing-proxy/
Just set up HAProxy, and point your Java clients to its IP.

The drawbacks to load-balancing on the Java client side (yes, the cluster
object) instead of a standalone load balancer like HAProxy are the
following:

1) Adding a node means code changes (or at the very least, config file
changes) rolled out to all your clients, which turns out to be a pretty
serious hassle. HAProxy, by contrast, allows you to add or remove nodes
without changing any Java code or config files.

2) Performance. We've run many tests to compare performance, and
client-side load balancing results in significantly lower throughput than
you'd have using HAProxy (or nginx). (Specifically, you actually want to
use the 'leastconn' load balancing algorithm with HAProxy, instead of round
robin.)

3) The health check on the client side (so that the Java load balancer can
tell when a remote node is down) is much less intelligent than what a
dedicated load balancer would provide. With something like HAProxy, you
should be able to take down nodes with no ill effects for the client code.
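
To make the 'leastconn' vs. round robin distinction in point 2 concrete,
here is a minimal Java sketch of the two selection strategies. It is purely
illustrative: it is not HAProxy's implementation and not part of the Riak
Java client, and the names (LeastConnSketch, acquire, release, the riakN
addresses) are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not HAProxy code and not part of the Riak Java client.
class LeastConnSketch {
    /** A backend node plus a count of requests currently in flight to it. */
    static final class Node {
        final String address;
        int active = 0;
        Node(String address) { this.address = address; }
    }

    private final List<Node> nodes = new ArrayList<>();
    private int roundRobinIndex = 0;

    LeastConnSketch(List<String> addresses) {
        for (String address : addresses) {
            nodes.add(new Node(address));
        }
    }

    /** Round robin: cycles through the nodes and ignores how busy each one is. */
    synchronized Node acquireRoundRobin() {
        Node chosen = nodes.get(roundRobinIndex);
        roundRobinIndex = (roundRobinIndex + 1) % nodes.size();
        chosen.active++;
        return chosen;
    }

    /** Least connections: picks the node with the fewest in-flight requests (ties go to the earliest node). */
    synchronized Node acquire() {
        Node best = nodes.get(0);
        for (Node candidate : nodes) {
            if (candidate.active < best.active) {
                best = candidate;
            }
        }
        best.active++;
        return best;
    }

    /** Call when the request finishes so the node's count drops again. */
    synchronized void release(Node node) {
        node.active--;
    }
}
```

With least connections, a node that is slow to answer keeps its in-flight
count high and so gets picked less often, whereas round robin keeps sending
it the same share of requests regardless.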

Now, if you load balance on the client side and you take a node down, it's
not supposed to stop working completely. (I'm not sure why it's failing for
you; we can investigate, but it'll be easier to just use a load balancer.)
It should throw an error or two, but then start working again (on the
retry).
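
The "error, then success on the retry" behaviour described above can be
sketched roughly as follows. This is a generic failover loop under my own
assumptions, not the Riak Java client's actual retry code; FailoverSketch,
withFailover and the node names are hypothetical.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only: not the Riak Java client's retry logic.
class FailoverSketch {
    /**
     * Runs the operation against each node in order and returns the first
     * successful result. If every node fails, throws with the last error as cause.
     */
    static <T> T withFailover(List<String> nodes, Function<String, T> operation) {
        RuntimeException last = null;
        for (String node : nodes) {
            try {
                return operation.apply(node);
            } catch (RuntimeException e) {
                last = e; // this node failed; remember why and try the next one
            }
        }
        throw new IllegalStateException("all nodes failed", last);
    }
}
```

A caller would pass its list of node addresses plus a function that performs
the read or write against a single node. A downed node then costs one failed
attempt instead of taking the whole client down.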

Dmitri

On Wed, Oct 7, 2015 at 2:45 PM, Vanessa Williams <
vanessa.williams at thoughtwire.ca> wrote:

> Hi Dmitri, thanks for the quick reply.
>
> It was actually our sysadmin who tried the load balancer approach and had
> no success, late last evening. However I haven't discussed the gory details
> with him yet. The failure he saw was at the application level (i.e. failure
> to read a key), but I don't know a) how he set up the LB or b) what the
> Java exception was, if any. I'll find that out in an hour or two and report
> back.
>
> I did find this article just now:
>
>
> http://docs.basho.com/riak/latest/ops/advanced/configs/load-balancing-proxy/
>
> So I suppose we'll give those suggestions a try this morning.
>
> What is the drawback to having the client connect to all 4 nodes (the
> cluster client, I assume you mean?) My understanding from reading articles
> I've found is that one of the nodes going away causes that client to fail
> as well. Is that what you mean, or are there other drawbacks as well?
>
> If there's anything else you can recommend, or links other than the one
> above you can point me to, it would be much appreciated. We expect both
> node failure and deliberate node removal for upgrade, repair, replacement,
> etc.
>
> Regards,
> Vanessa
>
> On Wed, Oct 7, 2015 at 8:29 AM, Dmitri Zagidulin <dzagidulin at basho.com>
> wrote:
>
>> Hi Vanessa,
>>
>> Riak is definitely meant to run behind a load balancer. (Or, in the worst
>> case, to be load-balanced on the client side. That is, all clients connect
>> to all 4 nodes.)
>>
>> When you say "we did try putting all 4 Riak nodes behind a load-balancer
>> and pointing the clients at it, but it didn't help." -- what do you mean
>> exactly, by "it didn't help"? What happened when you tried using the load
>> balancer?
>>
>>
>>
>> On Wed, Oct 7, 2015 at 1:57 PM, Vanessa Williams <
>> vanessa.williams at thoughtwire.ca> wrote:
>>
>>> Hi all, we are still (for a while longer) using Riak 1.4 and the
>>> matching Java client. The client(s) connect to one node in the cluster
>>> (since that's all it can do in this client version). The cluster itself has
>>> 4 nodes (sorry, we can't use 5 in this scenario). There are 2 separate
>>> clients.
>>>
>>> We've tried both n_val=3 and n_val=4. We achieve consistency-by-writes
>>> by setting w=all. Therefore, we only require one successful read (r=1).
>>>
>>> When all nodes are up, everything is fine. If one node fails, the
>>> clients can no longer read any keys at all. There's an exception like this:
>>>
>>> com.basho.riak.client.RiakRetryFailedException:
>>> java.net.ConnectException: Connection refused
>>>
>>> Now, it isn't possible that Riak can't operate when one node fails, so
>>> we're clearly missing something here.
>>>
>>> Note: we did try putting all 4 Riak nodes behind a load-balancer and
>>> pointing the clients at it, but it didn't help.
>>>
>>> Riak is a high-availability key-value store, so... why are we failing to
>>> achieve high-availability? Any suggestions greatly appreciated, and if more
>>> info is required I'll do my best to provide it.
>>>
>>> Thanks in advance,
>>> Vanessa
>>>
>>> --
>>> Vanessa Williams
>>> ThoughtWire Corporation
>>> http://www.thoughtwire.com
>>>
>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>>>
>>
>