Java Riak client can't handle a Riak node failure?

Vanessa Williams vanessa.williams at thoughtwire.ca
Mon Feb 22 16:21:34 EST 2016


Thanks very much for the advice. I'll give it a good test and then write
something. Somewhere. Cheers.

On Mon, Feb 22, 2016 at 3:42 PM, Alex Moore <amoore at basho.com> wrote:

> If the contract is "Return true iff the object existed", then the second
> fetch is superfluous + so is the async example I posted.  You can use the
> code you had as-is.
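
For anyone skimming the thread, here is a minimal sketch of the "return true iff the object existed" contract. It is an assumption-laden stand-in: a plain in-memory map plays the role of the bucket, and `DeleteContractSketch` is a hypothetical name, not part of the Riak Java client. The `containsKey` check stands in for the fetch and its `isNotFound()` test, and `remove` stands in for executing the delete with the fetched vclock.

```java
import java.util.HashMap;
import java.util.Map;

final class DeleteContractSketch {
    // Hypothetical stand-in for a bucket; this is not the Riak client API.
    private final Map<String, byte[]> bucket = new HashMap<>();

    void put(String key, byte[] value) {
        bucket.put(key, value);
    }

    /** Returns true iff a value existed for the key before the delete. */
    boolean delete(String key) {
        // Stands in for the fetch and the !response.isNotFound() check.
        boolean existed = bucket.containsKey(key);
        if (existed) {
            // Stands in for executing DeleteValue with the fetched vclock.
            bucket.remove(key);
        }
        return existed;
    }
}
```

With this contract a second fetch after the delete adds nothing, because the answer is already known from the first fetch.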
>
> Thanks,
> Alex
>
> On Mon, Feb 22, 2016 at 1:23 PM, Vanessa Williams <
> vanessa.williams at thoughtwire.ca> wrote:
>
>> Hi Alex, would a second fetch just indicate that the object is *still*
>> deleted? Or that this delete operation succeeded? In other words, perhaps
>> what my contract really is is: return true if there was already a value
>> there. In which case would the second fetch be superfluous?
>>
>> Thanks for your help.
>>
>> Vanessa
>>
>> On Mon, Feb 22, 2016 at 11:15 AM, Alex Moore <amoore at basho.com> wrote:
>>
>>>> That's the correct behaviour: it should return true iff a value was
>>>> actually deleted.
>>>
>>>
>>> Ok, if that's the case you should do another FetchValue after the
>>> deletion (to update the response.hasValues() result), or use the async
>>> version of the delete function. I also noticed that we weren't passing the
>>> vclock to the Delete function, so I added that here as well:
>>>
>>> public boolean delete(String key) throws ExecutionException, InterruptedException {
>>>
>>>     // fetch in order to get the causal context
>>>     FetchValue.Response response = fetchValue(key);
>>>
>>>     if(response.isNotFound())
>>>     {
>>>         return ???; // what do we return if it doesn't exist?
>>>     }
>>>
>>>     DeleteValue deleteValue = new DeleteValue.Builder(new Location(namespace, key))
>>>                                              .withVClock(response.getVectorClock())
>>>                                              .build();
>>>
>>>     final RiakFuture<Void, Location> deleteFuture = client.executeAsync(deleteValue);
>>>
>>>     deleteFuture.await();
>>>
>>>     if(deleteFuture.isSuccess())
>>>     {
>>>         return true;
>>>     }
>>>     else
>>>     {
>>>         deleteFuture.cause(); // Cause of failure
>>>         return false;
>>>     }
>>> }
>>>
>>>
>>> Thanks,
>>> Alex
>>>
>>> On Mon, Feb 22, 2016 at 10:48 AM, Vanessa Williams <
>>> vanessa.williams at thoughtwire.ca> wrote:
>>>
>>>> See inline:
>>>>
>>>> On Mon, Feb 22, 2016 at 10:31 AM, Alex Moore <amoore at basho.com> wrote:
>>>>
>>>>> Hi Vanessa,
>>>>>
>>>>> You might have a problem with your delete function (depending on its
>>>>> return value).
>>>>> What does the return value of the delete() function indicate?  Right
>>>>> now if an object existed, and was deleted, the function will return true,
>>>>> and will only return false if the object didn't exist or only consisted of
>>>>> tombstones.
>>>>>
>>>>
>>>>
>>>> That's the correct behaviour: it should return true iff a value was
>>>> actually deleted.
>>>>
>>>>
>>>>> If you never look at the object value returned by your fetchValue(key) function, another potential optimization you could make is to only return the HEAD / metadata:
>>>>>
>>>>> FetchValue fv = new FetchValue.Builder(new Location(new Namespace("some_bucket"), key))
>>>>>                               .withOption(FetchValue.Option.HEAD, true)
>>>>>                               .build();
>>>>>
>>>>> This would be more efficient because Riak won't have to send you the
>>>>> values over the wire, if you only need the metadata.
>>>>>
>>>>>
>>>> Thanks, I'll clean that up.
>>>>
>>>>
>>>>> If you do write this up somewhere, share the link! :)
>>>>>
>>>>
>>>> Will do!
>>>>
>>>> Regards,
>>>> Vanessa
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Alex
>>>>>
>>>>>
>>>>> On Mon, Feb 22, 2016 at 6:23 AM, Vanessa Williams <
>>>>> vanessa.williams at thoughtwire.ca> wrote:
>>>>>
>>>>>> Hi Dmitri, this thread is old, but I read this part of your answer
>>>>>> carefully:
>>>>>>
>>>>>>> You can use the following strategies to prevent stale values, in
>>>>>>> increasing order of security/preference:
>>>>>>> 1) Use timestamps (and not pass in vector clocks/causal context).
>>>>>>> This is ok if you're not editing objects, or you're ok with a bit of risk
>>>>>>> of stale values.
>>>>>>> 2) Use causal context correctly (which means, read-before-you-write
>>>>>>> -- in fact, the Update operation in the java client does this for you, I
>>>>>>> think). And if Riak can't determine which version is correct, it will fall
>>>>>>> back on timestamps.
>>>>>>> 3) Turn on siblings, for that bucket or bucket type.  That way, Riak
>>>>>>> will still try to use causal context to decide the right value. But if it
>>>>>>> can't decide, it will store BOTH values, and give them back to you on the
>>>>>>> next read, so that your application can decide which is the correct one.
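
To make the difference between strategies 2 and 3 concrete, here is a toy model of a single key. Everything in it is an assumption for illustration: `CausalContextSketch` is a hypothetical class, a plain counter stands in for the vector clock, and "last write wins" stands in for the timestamp fallback. It is not how Riak implements causal context, only a sketch of the observable behaviour when two writers read the same context and then both write.

```java
import java.util.ArrayList;
import java.util.List;

final class CausalContextSketch {
    private final boolean allowSiblings;
    private final List<String> values = new ArrayList<>();
    private long version = 0; // crude stand-in for a vector clock

    CausalContextSketch(boolean allowSiblings) {
        this.allowSiblings = allowSiblings;
    }

    /** The context a client would get back from a read. */
    long fetchContext() {
        return version;
    }

    /** The value(s) a client would get back from a read. */
    List<String> fetchValues() {
        return new ArrayList<>(values);
    }

    /** Stores a value; context is null when the writer did not read first. */
    void store(String value, Long context) {
        boolean sawCurrentState = context != null && context == version;
        if (sawCurrentState || !allowSiblings) {
            // Either the write descends from the current state, or conflicts
            // are resolved by letting the later write win.
            values.clear();
        }
        // With siblings on, a conflicting write is kept next to the old value.
        values.add(value);
        version++;
    }
}
```

In this model a writer that always reads before writing replaces the value cleanly in both modes. Two writers that read the same context and both write leave one value behind without siblings, and both values with siblings.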
>>>>>>
>>>>>>
>>>>>> I decided on strategy #2. What I am hoping for is some validation
>>>>>> that the code we use to "get", "put", and "delete" is correct in that
>>>>>> context, or if it could be simplified in some cases. Note that we are
>>>>>> using delete-mode "immediate" and no duplicates.
>>>>>>
>>>>>> In their shortest possible forms, here are the three methods I'd like
>>>>>> some feedback on (note, they're being used in production and haven't caused
>>>>>> any problems yet, however we have very few writes in production so the lack
>>>>>> of problems doesn't support the conclusion that the implementation is
>>>>>> correct.) Note all argument-checking, exception-handling, and logging
>>>>>> removed for clarity. *I'm mostly concerned about correct use of
>>>>>> causal context and response.isNotFound and response.hasValues. *Is
>>>>>> there anything I could/should have left out?
>>>>>>
>>>>>>     public RiakClient(String name, com.basho.riak.client.api.RiakClient client)
>>>>>>     {
>>>>>>         this.name = name;
>>>>>>         this.namespace = new Namespace(name);
>>>>>>         this.client = client;
>>>>>>     }
>>>>>>
>>>>>>     public byte[] get(String key) throws ExecutionException, InterruptedException {
>>>>>>
>>>>>>         FetchValue.Response response = fetchValue(key);
>>>>>>         if (!response.isNotFound())
>>>>>>         {
>>>>>>             RiakObject riakObject = response.getValue(RiakObject.class);
>>>>>>             return riakObject.getValue().getValue();
>>>>>>         }
>>>>>>         return null;
>>>>>>     }
>>>>>>
>>>>>>     public void put(String key, byte[] value) throws ExecutionException, InterruptedException {
>>>>>>
>>>>>>         // fetch in order to get the causal context
>>>>>>         FetchValue.Response response = fetchValue(key);
>>>>>>         RiakObject storeObject = new RiakObject()
>>>>>>             .setValue(BinaryValue.create(value))
>>>>>>             .setContentType("binary/octet-stream");
>>>>>>         StoreValue.Builder builder =
>>>>>>             new StoreValue.Builder(storeObject).withLocation(new Location(namespace, key));
>>>>>>         if (response.getVectorClock() != null) {
>>>>>>             builder = builder.withVectorClock(response.getVectorClock());
>>>>>>         }
>>>>>>         StoreValue storeValue = builder.build();
>>>>>>         client.execute(storeValue);
>>>>>>     }
>>>>>>
>>>>>>     public boolean delete(String key) throws ExecutionException, InterruptedException {
>>>>>>
>>>>>>         // fetch in order to get the causal context
>>>>>>         FetchValue.Response response = fetchValue(key);
>>>>>>         if (!response.isNotFound())
>>>>>>         {
>>>>>>             DeleteValue deleteValue = new DeleteValue.Builder(new Location(namespace, key)).build();
>>>>>>             client.execute(deleteValue);
>>>>>>         }
>>>>>>         return !response.isNotFound() || !response.hasValues();
>>>>>>     }
>>>>>>
>>>>>>
>>>>>> Any comments much appreciated. I want to provide a minimally correct
>>>>>> example of simple client code somewhere (GitHub, blog post, something...)
>>>>>> so I don't want to post this without review.
>>>>>>
>>>>>> Thanks,
>>>>>> Vanessa
>>>>>>
>>>>>> ThoughtWire Corporation
>>>>>> http://www.thoughtwire.com
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Oct 8, 2015 at 8:45 AM, Dmitri Zagidulin <
>>>>>> dzagidulin at basho.com> wrote:
>>>>>>
>>>>>>> Hi Vanessa,
>>>>>>>
>>>>>>> The thing to keep in mind about read repair is -- it happens
>>>>>>> asynchronously on every GET, but /after/ the results are returned to the
>>>>>>> client.
>>>>>>>
>>>>>>> So, when you issue a GET with r=1, the coordinating node only waits
>>>>>>> for 1 of the replicas before responding to the client with a success, and
>>>>>>> only afterwards triggers read-repair.
>>>>>>>
>>>>>>> It's true that with notfound_ok=false, it'll wait for the first
>>>>>>> non-missing replica before responding. But if you edit or update your
>>>>>>> objects at all, an R=1 still gives you a risk of stale values being
>>>>>>> returned.
>>>>>>>
>>>>>>> For example, say you write an object with value A.  And let's say
>>>>>>> your 3 replicas now look like this:
>>>>>>>
>>>>>>> replica 1: A,  replica 2: A, replica 3: notfound/missing
>>>>>>>
>>>>>>> A read with an R=1 and notfound_ok=false is just fine, here.
>>>>>>> (Chances are, the notfound replica will arrive first, but the notfound_ok
>>>>>>> setting will force the coordinator to wait for the first non-empty value,
>>>>>>> A, and return it to the client. And then trigger read-repair).
>>>>>>>
>>>>>>> But what happens if you edit that same object, and give it a new
>>>>>>> value, B?  So, now, there's a chance that your replicas will look like this:
>>>>>>>
>>>>>>> replica 1: A, replica 2: B, replica 3: B.
>>>>>>>
>>>>>>> So now if you do a read with an R=1, there's a chance that replica
>>>>>>> 1, with the old value of A, will arrive first, and that's the response that
>>>>>>> will be returned to the client.
>>>>>>>
>>>>>>> Whereas, using R=2 eliminates that risk -- well, at least decreases
>>>>>>> it. You still have the issue of -- how does Riak decide whether A or B is
>>>>>>> the correct value? Are you using causal context/vclocks correctly? (That
>>>>>>> is, reading the object before you update, to get the correct causal
>>>>>>> context?) Or are you relying on timestamps? (This is an ok strategy,
>>>>>>> provided that the edits are sufficiently far apart in time, and you don't
>>>>>>> have many concurrent edits, AND you're ok with the small risk of
>>>>>>> occasionally the timestamp being wrong). You can use the following
>>>>>>> strategies to prevent stale values, in increasing order of
>>>>>>> security/preference:
>>>>>>>
>>>>>>> 1) Use timestamps (and not pass in vector clocks/causal context).
>>>>>>> This is ok if you're not editing objects, or you're ok with a bit of risk
>>>>>>> of stale values.
>>>>>>>
>>>>>>> 2) Use causal context correctly (which means, read-before-you-write
>>>>>>> -- in fact, the Update operation in the java client does this for you, I
>>>>>>> think). And if Riak can't determine which version is correct, it will fall
>>>>>>> back on timestamps.
>>>>>>>
>>>>>>> 3) Turn on siblings, for that bucket or bucket type.  That way, Riak
>>>>>>> will still try to use causal context to decide the right value. But if it
>>>>>>> can't decide, it will store BOTH values, and give them back to you on the
>>>>>>> next read, so that your application can decide which is the correct one.
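
The replica examples above (R=1 versus R=2, with and without notfound_ok) can be played through with a small simulation. This is only a sketch under stated assumptions: `ReadQuorumSketch` and `Reply` are hypothetical names, an integer version stands in for the causal context, replies are handed over in arrival order, and basic_quorum and read repair are left out entirely. It is not Riak's coordinator logic, just the counting rule described in the text.

```java
import java.util.Arrays;
import java.util.List;

final class ReadQuorumSketch {

    /** One replica's reply; a null value models "notfound". */
    static final class Reply {
        final String value;
        final int version; // stand-in for causal context: higher means newer

        Reply(String value, int version) {
            this.value = value;
            this.version = version;
        }

        static Reply notFound() {
            return new Reply(null, 0);
        }
    }

    /**
     * Walks the replies in arrival order and stops once r of them count.
     * Returns the newest value among the counted replies, or null for "notfound".
     */
    static String read(List<Reply> arrivals, int r, boolean notFoundOk) {
        Reply newest = null;
        int counted = 0;
        for (Reply reply : arrivals) {
            boolean found = reply.value != null;
            if (!found && !notFoundOk) {
                continue; // with notfound_ok=false a notfound does not satisfy R
            }
            counted++;
            if (found && (newest == null || reply.version > newest.version)) {
                newest = reply;
            }
            if (counted >= r) {
                break;
            }
        }
        return newest == null ? null : newest.value;
    }

    public static void main(String[] args) {
        // replica 3 is missing and happens to answer first
        List<Reply> partlyMissing =
            Arrays.asList(Reply.notFound(), new Reply("A", 1), new Reply("A", 1));
        // replica 1 still holds the old value and happens to answer first
        List<Reply> partlyStale =
            Arrays.asList(new Reply("A", 1), new Reply("B", 2), new Reply("B", 2));

        System.out.println(read(partlyMissing, 1, true));  // null (notfound)
        System.out.println(read(partlyMissing, 1, false)); // A
        System.out.println(read(partlyStale, 1, false));   // A (stale)
        System.out.println(read(partlyStale, 2, false));   // B
    }
}
```

The last two lines are the point of the R=2 advice: with R=1 the stale replica can win the race, while with R=2 the coordinator sees both versions and can pick the newer one.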
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Oct 8, 2015 at 1:56 AM, Vanessa Williams <
>>>>>>> vanessa.williams at thoughtwire.ca> wrote:
>>>>>>>
>>>>>>>> Hi Dmitri, what would be the benefit of r=2, exactly? It isn't
>>>>>>>> necessary to trigger read-repair, is it? If it's important I'd rather try
>>>>>>>> it sooner than later...
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Vanessa
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 7, 2015 at 4:02 PM, Dmitri Zagidulin <
>>>>>>>> dzagidulin at basho.com> wrote:
>>>>>>>>
>>>>>>>>> Glad you sorted it out!
>>>>>>>>>
>>>>>>>>> (I do want to encourage you to bump your R setting to at least 2,
>>>>>>>>> though. Run some tests -- I think you'll find that the difference in speed
>>>>>>>>> will not be noticeable, but you do get a lot more data resilience with 2.)
>>>>>>>>>
>>>>>>>>> On Wed, Oct 7, 2015 at 6:24 PM, Vanessa Williams <
>>>>>>>>> vanessa.williams at thoughtwire.ca> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Dmitri, well...we solved our problem to our satisfaction but
>>>>>>>>>> it turned out to be something unexpected.
>>>>>>>>>>
>>>>>>>>>> The keys were two properties mentioned in a blog post on
>>>>>>>>>> "configuring Riak’s oft-subtle behavioral characteristics":
>>>>>>>>>> http://basho.com/posts/technical/riaks-config-behaviors-part-4/
>>>>>>>>>>
>>>>>>>>>> notfound_ok= false
>>>>>>>>>> basic_quorum=true
>>>>>>>>>>
>>>>>>>>>> The 2nd one just makes things a little faster, but the first one
>>>>>>>>>> is the one whose default value of true was killing us.
>>>>>>>>>>
>>>>>>>>>> With r=1 and notfound_ok=true (the default), if the first node to
>>>>>>>>>> respond didn't find the requested key, the authoritative answer was
>>>>>>>>>> "this key is not found". Not what we were expecting at all.
>>>>>>>>>>
>>>>>>>>>> With the changed settings, it will wait for a quorum of responses
>>>>>>>>>> and only if *no one* finds the key will "not found" be returned. Perfect.
>>>>>>>>>> (Without this setting it would wait for all responses, not ideal.)
>>>>>>>>>>
>>>>>>>>>> Now there is only one snag, which is that if the Riak node the
>>>>>>>>>> client connects to goes down, there will be no communication and we have a
>>>>>>>>>> problem. This is easily solvable with a load-balancer, though for
>>>>>>>>>> complicated reasons we actually don't need to do that right now. It's just
>>>>>>>>>> acceptable for us temporarily. Later, we'll get the load-balancer working
>>>>>>>>>> and even that won't be a problem.
>>>>>>>>>>
>>>>>>>>>> I *think* we're ok now. Thanks for your help!
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Vanessa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 7, 2015 at 9:33 AM, Dmitri Zagidulin <
>>>>>>>>>> dzagidulin at basho.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yeah, definitely find out what the sysadmin's experience was,
>>>>>>>>>>> with the load balancer. It could have just been a wrong configuration or
>>>>>>>>>>> something.
>>>>>>>>>>>
>>>>>>>>>>> And yes, that's the documentation page I recommend -
>>>>>>>>>>> http://docs.basho.com/riak/latest/ops/advanced/configs/load-balancing-proxy/
>>>>>>>>>>> Just set up HAProxy, and point your Java clients to its IP.
>>>>>>>>>>>
>>>>>>>>>>> The drawbacks to load-balancing on the java client side (yes,
>>>>>>>>>>> the cluster object) instead of a standalone load balancer like HAProxy, are
>>>>>>>>>>> the following:
>>>>>>>>>>>
>>>>>>>>>>> 1) Adding a node means code changes (or at the very least, config
>>>>>>>>>>> file changes) rolled out to all your clients. Which turns out to be a
>>>>>>>>>>> pretty serious hassle. Instead, HAProxy allows you to add or remove nodes
>>>>>>>>>>> without changing any java code or config files.
>>>>>>>>>>>
>>>>>>>>>>> 2) Performance. We've run many tests to compare performance, and
>>>>>>>>>>> client-side load balancing results in significantly lower throughput than
>>>>>>>>>>> you'd have using HAProxy (or nginx). (Specifically, you actually want to
>>>>>>>>>>> use the 'leastconn' load balancing algorithm with HAProxy, instead of round
>>>>>>>>>>> robin).
>>>>>>>>>>>
>>>>>>>>>>> 3) The health check on the client side (so that the java load
>>>>>>>>>>> balancer can tell when a remote node is down) is much less intelligent than
>>>>>>>>>>> a dedicated load balancer would provide. With something like HAProxy, you
>>>>>>>>>>> should be able to take down nodes with no ill effects for the client code.
>>>>>>>>>>>
>>>>>>>>>>> Now, if you load balance on the client side and you take a node
>>>>>>>>>>> down, it's not supposed to stop working completely. (I'm not sure why it's
>>>>>>>>>>> failing for you, we can investigate, but it'll be easier to just use a load
>>>>>>>>>>> balancer). It should throw an error or two, but then start working again
>>>>>>>>>>> (on the retry).
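
The "throw an error or two, then start working again" behaviour can be pictured as a simple try-the-next-node loop. The sketch below is hypothetical and deliberately generic: `FailoverSketch`, the node names, and the `request` function are assumptions for illustration, and this is not the Riak Java client's cluster or health-check code.

```java
import java.util.List;
import java.util.function.Function;

final class FailoverSketch {

    /**
     * Tries the request against each node in turn and returns the first
     * successful result. Fails only if every node fails.
     */
    static <T> T firstSuccessful(List<String> nodes, Function<String, T> request) {
        RuntimeException last = null;
        for (String node : nodes) {
            try {
                return request.apply(node);
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next node
            }
        }
        throw new IllegalStateException("all nodes failed", last);
    }
}
```

A client wired to a single node has nothing to fall back to, which matches the "Connection refused" failure reported at the start of the thread.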
>>>>>>>>>>>
>>>>>>>>>>> Dmitri
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 7, 2015 at 2:45 PM, Vanessa Williams <
>>>>>>>>>>> vanessa.williams at thoughtwire.ca> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Dmitri, thanks for the quick reply.
>>>>>>>>>>>>
>>>>>>>>>>>> It was actually our sysadmin who tried the load balancer
>>>>>>>>>>>> approach and had no success, late last evening. However I haven't discussed
>>>>>>>>>>>> the gory details with him yet. The failure he saw was at the application
>>>>>>>>>>>> level (i.e. failure to read a key), but I don't know a) how he set up the
>>>>>>>>>>>> LB or b) what the Java exception was, if any. I'll find that out in an hour
>>>>>>>>>>>> or two and report back.
>>>>>>>>>>>>
>>>>>>>>>>>> I did find this article just now:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> http://docs.basho.com/riak/latest/ops/advanced/configs/load-balancing-proxy/
>>>>>>>>>>>>
>>>>>>>>>>>> So I suppose we'll give those suggestions a try this morning.
>>>>>>>>>>>>
>>>>>>>>>>>> What is the drawback to having the client connect to all 4
>>>>>>>>>>>> nodes (the cluster client, I assume you mean?) My understanding from
>>>>>>>>>>>> reading articles I've found is that one of the nodes going away causes that
>>>>>>>>>>>> client to fail as well. Is that what you mean, or are there other drawbacks
>>>>>>>>>>>> as well?
>>>>>>>>>>>>
>>>>>>>>>>>> If there's anything else you can recommend, or links other than
>>>>>>>>>>>> the one above you can point me to, it would be much appreciated. We expect
>>>>>>>>>>>> both node failure and deliberate node removal for upgrade, repair,
>>>>>>>>>>>> replacement, etc.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Vanessa
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 7, 2015 at 8:29 AM, Dmitri Zagidulin <
>>>>>>>>>>>> dzagidulin at basho.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Vanessa,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Riak is definitely meant to run behind a load balancer. (Or,
>>>>>>>>>>>>> in the worst case, to be load-balanced on the client side. That is, all
>>>>>>>>>>>>> clients connect to all 4 nodes).
>>>>>>>>>>>>>
>>>>>>>>>>>>> When you say "we did try putting all 4 Riak nodes behind a
>>>>>>>>>>>>> load-balancer and pointing the clients at it, but it didn't help." -- what
>>>>>>>>>>>>> do you mean exactly, by "it didn't help"? What happened when you tried
>>>>>>>>>>>>> using the load balancer?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 1:57 PM, Vanessa Williams <
>>>>>>>>>>>>> vanessa.williams at thoughtwire.ca> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all, we are still (for a while longer) using Riak 1.4 and
>>>>>>>>>>>>>> the matching Java client. The client(s) connect to one node in the cluster
>>>>>>>>>>>>>> (since that's all it can do in this client version). The cluster itself has
>>>>>>>>>>>>>> 4 nodes (sorry, we can't use 5 in this scenario). There are 2 separate
>>>>>>>>>>>>>> clients.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We've tried both n_val = 3 and n_val=4. We achieve
>>>>>>>>>>>>>> consistency-by-writes by setting w=all. Therefore, we only require one
>>>>>>>>>>>>>> successful read (r=1).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> When all nodes are up, everything is fine. If one node fails,
>>>>>>>>>>>>>> the clients can no longer read any keys at all. There's an exception like
>>>>>>>>>>>>>> this:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> com.basho.riak.client.RiakRetryFailedException:
>>>>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Now, it isn't possible that Riak can't operate when one node
>>>>>>>>>>>>>> fails, so we're clearly missing something here.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Note: we did try putting all 4 Riak nodes behind a
>>>>>>>>>>>>>> load-balancer and pointing the clients at it, but it didn't help.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Riak is a high-availability key-value store, so... why are we
>>>>>>>>>>>>>> failing to achieve high-availability? Any suggestions greatly appreciated,
>>>>>>>>>>>>>> and if more info is required I'll do my best to provide it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>> Vanessa
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Vanessa Williams
>>>>>>>>>>>>>> ThoughtWire Corporation
>>>>>>>>>>>>>> http://www.thoughtwire.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> riak-users mailing list
>>>>>>>>>>>>>> riak-users at lists.basho.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>