Java Riak client can't handle a Riak node failure?

Vanessa Williams vanessa.williams at
Wed Oct 7 08:45:29 EDT 2015

Hi Dmitri, thanks for the quick reply.

It was actually our sysadmin who tried the load balancer approach and had
no success, late last evening. However I haven't discussed the gory details
with him yet. The failure he saw was at the application level (i.e. failure
to read a key), but I don't know a) how he set up the LB or b) what the
Java exception was, if any. I'll find that out in an hour or two and report back.

I did find this article just now:

So I suppose we'll give those suggestions a try this morning.

What is the drawback to having the client connect to all 4 nodes (the
cluster client, I assume you mean)? My understanding from reading articles
I've found is that one of the nodes going away causes that client to fail
as well. Is that what you mean, or are there other drawbacks as well?
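
For illustration only, client-side failover across several nodes could look roughly like the sketch below. It deliberately does not use the Riak Java client API: the node names and the fakeRead helper are hypothetical stand-ins, and a real client would open a connection to each host instead. The idea is just that a "Connection refused" from one node moves the request on to the next node rather than failing it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class FailoverSketch {

    // Tries the operation against each node in order and returns the first
    // result that does not throw. If every node fails, the last failure is
    // attached as the cause of the thrown exception.
    static <T> T firstSuccessful(List<String> nodes, Function<String, T> op) {
        RuntimeException last = null;
        for (String node : nodes) {
            try {
                return op.apply(node);
            } catch (RuntimeException e) {
                last = e; // e.g. connection refused; try the next node
            }
        }
        throw new IllegalStateException("all nodes failed", last);
    }

    // Hypothetical stand-in for a read: pretends that node "riak1" is down.
    static String fakeRead(String node) {
        if (node.equals("riak1")) {
            throw new RuntimeException("Connection refused");
        }
        return "value-from-" + node;
    }

    public static void main(String[] args) {
        // Hypothetical host names for the 4-node cluster.
        List<String> nodes = Arrays.asList("riak1", "riak2", "riak3", "riak4");
        System.out.println(firstSuccessful(nodes, FailoverSketch::fakeRead));
        // prints value-from-riak2
    }
}
```

A load balancer in front of the cluster plays the same role, except that the retry decision lives in the balancer's health checks rather than in each client.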

If there's anything else you can recommend, or links other than the one
above you can point me to, it would be much appreciated. We expect both
node failure and deliberate node removal for upgrade, repair, replacement, etc.


On Wed, Oct 7, 2015 at 8:29 AM, Dmitri Zagidulin <dzagidulin at> wrote:

> Hi Vanessa,
> Riak is definitely meant to run behind a load balancer. (Or, in the worst
> case, to be load-balanced on the client side. That is, all clients connect
> to all 4 nodes.)
> When you say "we did try putting all 4 Riak nodes behind a load-balancer
> and pointing the clients at it, but it didn't help." -- what do you mean
> exactly, by "it didn't help"? What happened when you tried using the load
> balancer?
> On Wed, Oct 7, 2015 at 1:57 PM, Vanessa Williams <
> vanessa.williams at> wrote:
>> Hi all, we are still (for a while longer) using Riak 1.4 and the matching
>> Java client. The client(s) connect to one node in the cluster (since that's
>> all it can do in this client version). The cluster itself has 4 nodes
>> (sorry, we can't use 5 in this scenario). There are 2 separate clients.
>> We've tried both n_val=3 and n_val=4. We achieve consistency-by-writes
>> by setting w=all. Therefore, we only require one successful read (r=1).
>> When all nodes are up, everything is fine. If one node fails, the clients
>> can no longer read any keys at all. There's an exception like this:
>> com.basho.riak.client.RiakRetryFailedException:
>> Connection refused
>> Now, it isn't possible that Riak can't operate when one node fails, so
>> we're clearly missing something here.
>> Note: we did try putting all 4 Riak nodes behind a load-balancer and
>> pointing the clients at it, but it didn't help.
>> Riak is a high-availability key-value store, so... why are we failing to
>> achieve high-availability? Any suggestions greatly appreciated, and if more
>> info is required I'll do my best to provide it.
>> Thanks in advance,
>> Vanessa
>> --
>> Vanessa Williams
>> ThoughtWire Corporation
>> _______________________________________________
>> riak-users mailing list
>> riak-users at
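
The reasoning behind "w=all, therefore r=1" in the quoted message is the textbook quorum-overlap condition: a read is guaranteed to touch at least one replica that saw the latest write when r + w > n_val. A minimal sketch of that arithmetic follows. The class and method names are made up for illustration, and the check says nothing about how Riak routes requests to fallback nodes or why the client saw "Connection refused".

```java
public class QuorumSketch {

    // Returns true when any set of r replicas must share at least one member
    // with any set of w replicas out of n (the strict-quorum overlap rule).
    static boolean overlaps(int n, int r, int w) {
        return r + w > n;
    }

    public static void main(String[] args) {
        System.out.println(overlaps(3, 1, 3)); // n_val=3, r=1, w=all: true
        System.out.println(overlaps(4, 1, 4)); // n_val=4, r=1, w=all: true
        System.out.println(overlaps(3, 1, 2)); // n_val=3, r=1, w=2: false
    }
}
```

Both settings tried in the thread satisfy the condition, which is consistent with reads working while all nodes are up.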