Riak Client Resources, Deleting a Key Doesn't Remove it from bucket.keys

Reid Draper reiddraper at gmail.com
Thu May 24 11:27:41 EDT 2012


I have a pretty good idea what is causing this problem.

Riak uses "tombstone" values to denote that an object has been deleted.
Under normal conditions, this tombstone value (really, the key/value pair)
will be removed three (3) seconds after the delete. The delete_mode config
setting lets you change that interval, or set the mode to immediate or
keep.

Regardless of the value of delete_mode, the key will continue to show
up in list-keys and 2i $key queries as long as the tombstone is still
around. This is because those calls simply iterate through the keys,
and don't inspect the values for tombstones (doing so could potentially
be quite costly, depending on the backend).
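
To make that concrete, here is a toy Python model of a single backend. It is
only a sketch of the behavior described above, not Riak's actual code; the
ToyBackend class and the TOMBSTONE marker are names I made up for
illustration.

```python
# Toy model of one storage backend. Not Riak's actual implementation.
TOMBSTONE = object()  # hypothetical marker standing in for a deleted value


class ToyBackend:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def delete(self, key):
        # A delete writes a tombstone; the key itself stays in storage.
        self._data[key] = TOMBSTONE

    def reap(self, key):
        # Reaping is what finally removes the key/value pair.
        if self._data.get(key) is TOMBSTONE:
            del self._data[key]

    def get(self, key):
        # A read inspects the value, so a tombstone reads as "not found".
        value = self._data.get(key, TOMBSTONE)
        return None if value is TOMBSTONE else value

    def list_keys(self):
        # Listing only walks the keys and never looks at the values.
        return sorted(self._data)


backend = ToyBackend()
backend.put("a", 1)
backend.put("b", 2)
backend.delete("b")
keys_before_reap = backend.list_keys()  # "b" is still listed here
value_of_b = backend.get("b")           # but a read reports not found
backend.reap("b")
keys_after_reap = backend.list_keys()   # "b" is gone only after the reap
```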

For a more detailed explanation of deletes in Riak, I highly
suggest you read Jon Meredith's ML post [1].

Since in both of these cases you have delete_mode set to either
3s or immediate, we are seeing a case where the first attempt to reap
the tombstone fails. Reaping isn't attempted again until you do a GET
on the object, at which point a read-repair-like mechanism runs and sees
that the tombstone needs to be reaped. That is why the key no longer
appears in 2i and list-keys queries afterward.
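
If you need a stopgap on the client side, one option is to GET each key
returned by the index query and drop the ones that come back as not found.
The Python below is a rough sketch of that idea, not a recommendation. The
function names are mine, and it assumes the HTTP interface answers 404 for
a deleted object at /buckets/<bucket>/keys/<key>.

```python
import urllib.error
import urllib.parse
import urllib.request


def object_exists(base_url, bucket, key):
    """GET the object over HTTP; a 404 is treated as deleted."""
    url = "%s/buckets/%s/keys/%s" % (
        base_url,
        urllib.parse.quote(bucket, safe=""),
        urllib.parse.quote(key, safe=""),
    )
    try:
        with urllib.request.urlopen(url):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # anything else is a real error, so let the caller see it


def live_keys(keys, exists):
    """Keep only the keys for which exists(key) is true."""
    return [key for key in keys if exists(key)]
```

You would call it along the lines of
live_keys(index_keys, lambda k: object_exists("http://localhost:8098", "testbucket", k)).
Keep in mind this costs one round trip per key, so it gets expensive for
large result sets.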

So why is the tombstone reap failing the first time? There could be several
reasons. It's important to know, first, that the reaping process requires all
N _primary_ replicas to be up and responding. Here are some potential
reasons:

1. One of the primaries is temporarily unreachable.
2. The original tombstone writes didn't reach all N primaries, for any reason.
3. The async GET that follows the delete and starts the tombstone reaping times out, for any reason.
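
As a simplified model of that precondition (my own sketch, not the actual
Riak logic), the reap only goes ahead when every one of the N primaries
has answered and each of them holds the tombstone. The can_reap function
and its arguments are hypothetical.

```python
def can_reap(primary_replies, n_val):
    """Decide whether a tombstone may be reaped.

    primary_replies has one entry per primary that answered in time;
    an entry is True if that primary returned the tombstone.
    """
    # Fewer than N replies: a primary was unreachable or the GET timed out.
    if len(primary_replies) < n_val:
        return False
    # Every primary must already hold the tombstone.
    return all(primary_replies)


all_primaries_agree = can_reap([True, True, True, True], n_val=4)
one_primary_missing = can_reap([True, True, True], n_val=4)
one_primary_stale = can_reap([True, True, True, False], n_val=4)
```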

I don't have a silver-bullet recommendation for this problem at the moment.
If you'd like to favor having deleted keys _not_ show up in 2i/list-keys
requests over delete-availability, you can make your delete requests
with PR = PW = R = W = all (note 'all' is equivalent to whatever your bucket's
N value is).
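
Over HTTP, that could look roughly like the Python sketch below. It assumes
the HTTP delete accepts r, w, pr and pw as query parameters, so please
check the API docs for your version before relying on it. The helper names
are mine.

```python
import urllib.parse
import urllib.request


def delete_url(base_url, bucket, key, quorum="all"):
    """Build a delete URL that asks for the same quorum on R, W, PR and PW."""
    params = urllib.parse.urlencode(
        {"r": quorum, "w": quorum, "pr": quorum, "pw": quorum}
    )
    return "%s/buckets/%s/keys/%s?%s" % (
        base_url,
        urllib.parse.quote(bucket, safe=""),
        urllib.parse.quote(key, safe=""),
        params,
    )


def delete_object(base_url, bucket, key):
    """Send the DELETE and return the HTTP status code."""
    request = urllib.request.Request(
        delete_url(base_url, bucket, key), method="DELETE"
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

With these settings the delete will fail whenever any primary is
unavailable, which is the availability trade-off mentioned above.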

We'll also be exploring how we can fix or mitigate this situation for
an upcoming release.

[1]: http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-October/006048.html

Thanks,
Reid Draper
Software Engineer
Basho

On May 23, 2012, at 4:42 PM, Steve Warren wrote:

> I have a 5 node cluster and given a successful delete call, I expect to get the latest data back given the bucket properties (as shown below)...
> 
> Bucket properties:
> 
> {"props":{"name":"mybucket","allow_mult":false,"basic_quorum":false,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"dw":"quorum","last_write_wins":false,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":4,"notfound_ok":true,"old_vclock":86400,"postcommit":[],"pr":"quorum","precommit":[],"pw":"quorum","r":"quorum","rw":"quorum","small_vclock":50,"w":"quorum","young_vclock":20}}
> 
> Is my understanding not correct in this? (The important properties to me are the pw/pr settings to ensure a good distribution and consistency.)
> 
> Regards
> Steve
> 
> On Wed, May 23, 2012 at 1:31 PM, Shuhao Wu <admin at thekks.net> wrote:
> Riak is eventually consistent. Deleting it doesn't show up immediately. There is an option like delete_immediate
> 
> Shuhao
> 
> On May 23, 2012 4:08 PM, "Steve Warren" <swarren at myvest.com> wrote:
> I'm seeing this pretty consistently and have no explanation for it. I delete a large number of keys (20k to 100k), but when I then search on the keys ($key/0/g) anywhere from 0-200 or so of the deleted keys show up in the results. It doesn't matter how long I wait after completing the deletion step, the keys stay in the list until I try to access the object and then it goes away. I'm using 1.1.2 and the riak-java client, and getting no errors on the deletion step.
> 
> On Tue, May 22, 2012 at 9:34 AM, Steve Warren <swarren at myvest.com> wrote:
> Thank you for the reply. My observation does not quite match up with this though so I'm still a bit confused. The deleted keys appeared to stay long past the 3 seconds described in the post you referenced. In fact, I don't know if they ever "went away". I'll run some more tests to see if I can narrow down the exact behavior, for example not all key deletions exhibited this behavior (the test I ran resulted in 118 residual keys out of around 20K deletes). If I directly queried any of the keys it would respond with "not found" and immediately stop showing up in the key list or $key index query.
> 
> I'm still running a bunch of tests just to learn the behavior of the system so I'll keep plugging away at it. For example, I'm observing that $key index queries halt inserts into the same bucket while the query is running, I don't know yet if this halts all server activity or just the inserts for that bucket though.
> 
> Regards
> Steve
> 
> On Tue, May 22, 2012 at 8:58 AM, Kelly McLaughlin <kelly at basho.com> wrote:
> Hi Steve. There is no caching of key lists in riak. What you are seeing is likely the fact that listing of keys or index queries can pick up deleted keys due to the fact that riak keeps tombstone markers around for deleted objects for some period. For a really good explanation of riak's delete behavior check out this writeup by Jon Meredith: http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-October/006048.html. You can set delete_mode to immediate as described in that post and you will most likely not see any deleted keys when you do an index query or key list. The tradeoff is that you may get the unexpected behavior when doing concurrent updates to the same set of keys that the delete_mode changes were designed to address as Jon also indicates in that post.  We are considering different options on this front, but at this time no actual changes have been made to address this. 
> 
> Kelly
> 
> On May 20, 2012, at 10:13 AM, Steve Warren wrote:
> 
>> The last message I saw on this (from a year ago) says the caching of key lists will be removed. I just ran into it while running a $key index range search. I then ran a ?key=stream search on the bucket and the same stale key list appeared (I had created a bunch of data and then deleted it as a test). Did the caching removal not happen? I'm running 1.1.2
>> 
>> The query:
>> 
>> curl 'localhost:8098/buckets/testbucket/index/$key/0/g'
>> 
>> As others have noted, this behavior is quite disconcerting and I don't want to pepper the application with otherwise unnecessary checks for stale keys even on 2i range queries. Or is that unavoidable?
>> 
>> Regards
>> Steve
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
> 
> 
> 
