Java Riak client can't handle a Riak node failure?

Alexander Sicular siculars at
Thu Oct 8 13:04:15 EDT 2015

Greetings and salutations, Vanessa.

I am obliged to point out that running n_val of 3 (or more) is highly
detrimental in a cluster size smaller than 5 nodes. Needless to say, it is
not recommended. Let's talk about why that is the case for a moment.

The default level of abstraction in Riak is the virtual node, or vnode. The
vnode represents a segment of the ring (ring_size, configurable in steps of
power of 2, default 64, min 8, max 1024.) The "ring" is the number line 0 -
2^160 which represents the output of the SHA hash.

Riak achieves high availability through replication. Riak replicates data
to vnodes. A hypothetical replica set may be, for example, set[vnode 1,
vnode 10, vnode 20]. Note, I said vnodes. Not physical nodes. And therein
lies the concern. Considering a default ring size of 64 and a default
replica count of 3, the minimum recommended production deployment of a Riak
cluster should be 5 due to the fact that in that circumstance every replica
set combination is guaranteed to have each vnode on a distinct physical
node. Anything less than that will certainly have some combinations of
replica sets which have two of its copies on the same physical host. Note I
said some combinations. Some fraction of node replica set combinations will
have two of their copies allocated to one physical machine.

You can see where I'm going with this. Firstly, performance will be
negatively impacted when writing more than one copy of data to the same
physical hardware, aka disk. But more importantly, you are negating Riak's
high availability mechanic. If you lost any given physical node you would
lose access to two copies of the set of data which had two replicas on that

Riak is designed to withstand loss of any two physical nodes while
maintaining access to 100% of your corpus assuming the fact that you are
running the default settings and have deployed 5 nodes.

Here is the rule of thumb that I recommend (me personally, not Basho) to
folks looking to deploy clusters with less than 5 nodes:

1,2 nodes: n_val 1
3,4 nodes: n_val 2
5+ nodes: n_val 3

In summary, please consider reconfiguring your production deployment.



Sent from my iRotaryPhone

On Oct 7, 2015, at 19:56, Vanessa Williams <vanessa.williams at>

Hi Dmitri, what would be the benefit of r=2, exactly? It isn't necessary to
trigger read-repair, is it? If it's important I'd rather try it sooner than


On Wed, Oct 7, 2015 at 4:02 PM, Dmitri Zagidulin <dzagidulin at>

> Glad you sorted it out!
> (I do want to encourage you to bump your R setting to at least 2, though.
> Run some tests -- I think you'll find that the difference in speed will not
> be noticeable, but you do get a lot more data resilience with 2.)
> On Wed, Oct 7, 2015 at 6:24 PM, Vanessa Williams <
> vanessa.williams at> wrote:
>> Hi Dmitri, well...we solved our problem to our satisfaction but it turned
>> out to be something unexpected.
>> The keys were two properties mentioned in a blog post on "configuring
>> Riak’s oft-subtle behavioral characteristics":
>> notfound_ok= false
>> basic_quorum=true
>> The 2nd one just makes things a little faster, but the first one is the
>> one whose default value of true was killing us.
>> With r=1 and notfound_ok=true (default) the first node to respond, if it
>> didn't find the requested key, the authoritative answer was "this key is
>> not found". Not what we were expecting at all.
>> With the changed settings, it will wait for a quorum of responses and
>> only if *no one* finds the key will "not found" be returned. Perfect.
>> (Without this setting it would wait for all responses, not ideal.)
>> Now there is only one snag, which is that if the Riak node the client
>> connects to goes down, there will be no communication and we have a
>> problem. This is easily solvable with a load-balancer, though for
>> complicated reasons we actually don't need to do that right now. It's just
>> acceptable for us temporarily. Later, we'll get the load-balancer working
>> and even that won't be a problem.
>> I *think* we're ok now. Thanks for your help!
>> Regards,
>> Vanessa
>> On Wed, Oct 7, 2015 at 9:33 AM, Dmitri Zagidulin <dzagidulin at>
>> wrote:
>>> Yeah, definitely find out what the sysadmin's experience was, with the
>>> load balancer. It could have just been a wrong configuration or something.
>>> And yes, that's the documentation page I recommend -
>>> Just set up HAProxy, and point your Java clients to its IP.
>>> The drawbacks to load-balancing on the java client side (yes, the
>>> cluster object) instead of a standalone load balancer like HAProxy, are the
>>> following:
>>> 1) Adding node means code changes (or at very least, config file
>>> changes) rolled out to all your clients. Which turns out to be a pretty
>>> serious hassle. Instead, HAProxy allows you to add or remove nodes without
>>> changing any java code or config files.
>>> 2) Performance. We've ran many tests to compare performance, and
>>> client-side load balancing results in significantly lower throughput than
>>> you'd have using haproxy (or nginx). (Specifically, you actually want to
>>> use the 'leastconn' load balancing algorithm with HAProxy, instead of round
>>> robin).
>>> 3) The health check on the client side (so that the java load balancer
>>> can tell when a remote node is down) is much less intelligent than a
>>> dedicated load balancer would provide. With something like HAProxy, you
>>> should be able to take down nodes with no ill effects for the client code.
>>> Now, if you load balance on the client side and you take a node down,
>>> it's not supposed to stop working completely. (I'm not sure why it's
>>> failing for you, we can investigate, but it'll be easier to just use a load
>>> balancer). It should throw an error or two, but then start working again
>>> (on the retry).
>>> Dmitri
>>> On Wed, Oct 7, 2015 at 2:45 PM, Vanessa Williams <
>>> vanessa.williams at> wrote:
>>>> Hi Dmitri, thanks for the quick reply.
>>>> It was actually our sysadmin who tried the load balancer approach and
>>>> had no success, late last evening. However I haven't discussed the gory
>>>> details with him yet. The failure he saw was at the application level (i.e.
>>>> failure to read a key), but I don't know a) how he set up the LB or b) what
>>>> the Java exception was, if any. I'll find that out in an hour or two and
>>>> report back.
>>>> I did find this article just now:
>>>> So I suppose we'll give those suggestions a try this morning.
>>>> What is the drawback to having the client connect to all 4 nodes (the
>>>> cluster client, I assume you mean?) My understanding from reading articles
>>>> I've found is that one of the nodes going away causes that client to fail
>>>> as well. Is that what you mean, or are there other drawbacks as well?
>>>> If there's anything else you can recommend, or links other than the one
>>>> above you can point me to, it would be much appreciated. We expect both
>>>> node failure and deliberate node removal for upgrade, repair, replacement,
>>>> etc.
>>>> Regards,
>>>> Vanessa
>>>> On Wed, Oct 7, 2015 at 8:29 AM, Dmitri Zagidulin <dzagidulin at>
>>>> wrote:
>>>>> Hi Vanessa,
>>>>> Riak is definitely meant to run behind a load balancer. (Or, at the
>>>>> worst case, to be load-balanced on the client side. That is, all clients
>>>>> connect to all 4 nodes).
>>>>> When you say "we did try putting all 4 Riak nodes behind a
>>>>> load-balancer and pointing the clients at it, but it didn't help." -- what
>>>>> do you mean exactly, by "it didn't help"? What happened when you tried
>>>>> using the load balancer?
>>>>> On Wed, Oct 7, 2015 at 1:57 PM, Vanessa Williams <
>>>>> vanessa.williams at> wrote:
>>>>>> Hi all, we are still (for a while longer) using Riak 1.4 and the
>>>>>> matching Java client. The client(s) connect to one node in the cluster
>>>>>> (since that's all it can do in this client version). The cluster itself has
>>>>>> 4 nodes (sorry, we can't use 5 in this scenario). There are 2 separate
>>>>>> clients.
>>>>>> We've tried both n_val = 3 and n_val=4. We achieve
>>>>>> consistency-by-writes by setting w=all. Therefore, we only require one
>>>>>> successful read (r=1).
>>>>>> When all nodes are up, everything is fine. If one node fails, the
>>>>>> clients can no longer read any keys at all. There's an exception like this:
>>>>>> com.basho.riak.client.RiakRetryFailedException:
>>>>>> Connection refused
>>>>>> Now, it isn't possible that Riak can't operate when one node fails,
>>>>>> so we're clearly missing something here.
>>>>>> Note: we did try putting all 4 Riak nodes behind a load-balancer and
>>>>>> pointing the clients at it, but it didn't help.
>>>>>> Riak is a high-availability key-value store, so... why are we failing
>>>>>> to achieve high-availability? Any suggestions greatly appreciated, and if
>>>>>> more info is required I'll do my best to provide it.
>>>>>> Thanks in advance,
>>>>>> Vanessa
>>>>>> --
>>>>>> Vanessa Williams
>>>>>> ThoughtWire Corporation
>>>>>> _______________________________________________
>>>>>> riak-users mailing list
>>>>>> riak-users at
riak-users mailing list
riak-users at
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <>

More information about the riak-users mailing list