Quickly deleting + recreating item in Riak deletes new item

Matthew Dawson matthew at mjdsystems.ca
Sun Jul 21 02:28:48 EDT 2013

On July 19, 2013 02:11:10 PM Kelly McLaughlin wrote:
> Matthew,
> I did some testing with your code and I was able to reproduce what you were
> seeing. I would occasionally see an error similar to the following:
>     Failed to fetch item 460 err Object not found
> This behavior is a result of the trade-offs of using an eventually
> consistent database like Riak. It is not the case that your inserts are
> failing to write or the data is being lost, but what is actually happening
> is that the quick read after writing with the default request options does
> not provide any guarantee that you will read your writes. So basically when
> you make the read, the replicas that are responding to your request have
> not seen the latest value yet and so you end up with "Not Found" as the
> response. If you did another read attempt for one of those objects reported
> missing, it would succeed because Riak's read-repair would have kicked in
> to make sure each replica has the value. To increase the likelihood of
> reading your writes you should set the optional request parameters pr and
> pw to ensure that all of the primary replicas are available prior to
> performing a read or write request. I altered your code to use those
> options and put the updates in a gist [1] (it's my first stab at Go so my
> changes may not be very idiomatic). Additionally, I changed the riak driver
> so that NotFoundOk was false instead of true. With these changes I was able
> to run the test 50 times in a row with no errors where previously I would
> see at least one error every 10 iterations or so. Hope that helps.

Thanks for looking into this.  I looked over the documentation and your 
suggestions, and I think I have found a race causing my data loss.  

I read what you wrote above, but I don't think this applies in this situation.  
If I stop the program after it finds a missing element, but before it deletes 
all the keys, I've found that any key it finds missing is also missing if I 
look at it using curl.  Also, I've looked at the returned headers: both
"X-Riak-Deleted: true" and a vclock are returned, which suggests to me that
Riak has deliberately deleted the data.  I've included sample output at [2]
(produced with a newer version of my program, in the gist at [1]).  The
updated program has an extra flag, --quit_early=false, that exits before the
deletion if a problem was found during the fetch stage.

Since your version did fix my issue as well, I played around with the various
settings.  I discovered that, to fix the issue, I only have to use an r value
of 3 when first fetching the item to get a vclock for the insert operation.
As long as that value is 3, everything seems to work perfectly.  Adjusting
the other values produced no change, including setting w/rw/dw all to 3 or
setting r to 3 elsewhere.  (I didn't test pr, but I left all the cluster
nodes running, so I don't think it affects anything.)  You can try this with
the updated program [1] by passing "-r=3" to it.
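The read-before-write pattern above (fetch with r=3 to obtain a vclock, then send that vclock with the insert) could look roughly like this over Riak's HTTP interface. This is a sketch, not the gist's code. It assumes the HTTP API rather than a Go driver, n_val=3, JSON values, and the hypothetical helper names fetchVClock and newStoreRequest.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// objectURL builds the Riak HTTP-interface URL for bucket/key.
func objectURL(base, bucket, key string) string {
	return fmt.Sprintf("%s/buckets/%s/keys/%s", base, url.PathEscape(bucket), url.PathEscape(key))
}

// fetchVClock reads bucket/key with r=3 (all replicas, assuming n_val=3) and
// returns the X-Riak-Vclock header. A 404 may still carry a tombstone's
// vclock; an empty string means no vclock was returned. (Hypothetical.)
func fetchVClock(c *http.Client, base, bucket, key string) (string, error) {
	resp, err := c.Get(objectURL(base, bucket, key) + "?r=3")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	_, _ = io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
	switch resp.StatusCode {
	case http.StatusOK, http.StatusMultipleChoices, http.StatusNotFound:
		return resp.Header.Get("X-Riak-Vclock"), nil
	default:
		return "", fmt.Errorf("fetch %s/%s: unexpected status %s", bucket, key, resp.Status)
	}
}

// newStoreRequest builds (but does not send) a PUT for bucket/key carrying
// the vclock from the preceding fetch, if there was one. (Hypothetical.)
func newStoreRequest(base, bucket, key string, body []byte, vclock string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPut, objectURL(base, bucket, key), bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if vclock != "" {
		req.Header.Set("X-Riak-Vclock", vclock)
	}
	return req, nil
}

func main() {
	// Build the write without contacting a server; in real use the vclock
	// would come from fetchVClock(http.DefaultClient, ...).
	req, err := newStoreRequest("http://127.0.0.1:8098", "test", "460", []byte(`{"n":460}`), "example-vclock")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL, req.Header.Get("X-Riak-Vclock"))
}
```

Separating request construction from sending keeps the vclock handling testable without a running cluster.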

Using the above knowledge, I looked at whether vclocks were being returned for
the insert operation when keys went missing during the fetch stage.  In all
cases, vclocks were missing.  I also checked whether any siblings were being
generated (I didn't expect any), and some were.  Looking at them, each
contains one sibling representing the key's deletion and one containing my
data (example in [2]).
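One way to check for siblings without a driver is to parse a fetch that returns 300 Multiple Choices, as sketched below. The helper name and placeholder vtags are hypothetical. It assumes Riak's HTTP interface and a request that does not ask for multipart/mixed, in which case Riak lists the sibling vtags in a plain-text body starting with "Siblings:".

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// siblingVtags returns the sibling vtags listed in a Riak HTTP fetch
// response, or nil if the response does not report siblings.
// Expected body form: "Siblings:\n<vtag>\n<vtag>\n...". (Hypothetical helper.)
func siblingVtags(status int, body string) []string {
	if status != http.StatusMultipleChoices {
		return nil
	}
	lines := strings.Split(strings.TrimSpace(body), "\n")
	if strings.TrimSpace(lines[0]) != "Siblings:" {
		return nil
	}
	var vtags []string
	for _, l := range lines[1:] {
		if v := strings.TrimSpace(l); v != "" {
			vtags = append(vtags, v)
		}
	}
	return vtags
}

func main() {
	body := "Siblings:\nvtagA\nvtagB\n" // placeholder vtags, not real output
	for _, v := range siblingVtags(http.StatusMultipleChoices, body) {
		// Each sibling can then be fetched individually with ?vtag=<vtag>.
		fmt.Println("sibling:", v)
	}
}
```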

From those facts, I'm guessing there is a race in Riak where either the
deletion or the creation gets committed first.  If the creation is committed
first, the deletion for some reason also deletes it.  I'm guessing this
happens because the vclocks show the creation happening before the
deletion.  For some reason, using r=3 for the insert stage's fetch causes
Riak to commit the deletion cluster-wide and the commit always appears to come

I think this is a bug, since the deletion shouldn't see the insert operation
as history.  Because Riak makes this mistake, it is actually losing data.  I
don't think the siblings should be created either (in the r=3 case, they are
not created), but that is easily handled.  If this is a bug, I'll file it on
GitHub (is the riak_kv project the correct place in this situation?); if this
is expected behaviour, I do have a workaround.

A few other data points: I retried this using delete_mode = keep, and it fixed
this issue as well.  Using delete_mode = immediate made the issue rarer.  I
also find it happens almost every time if you run the program quickly, but
giving Riak some time to synchronize state reduces the chance of seeing it.

[1] https://gist.github.com/MJDSys/6026486
[2] https://gist.github.com/MJDSys/420090505ebbe2e0241d

More information about the riak-users mailing list