Riak Client Resources, Deleting a Key Doesn't Remove it from bucket.keys

Steve Warren swarren at myvest.com
Thu May 24 11:43:17 EDT 2012

Thanks Reid, that's a very clear explanation. Are there any logs created
under the cited circumstances? I'm not seeing any errors or logs that
indicate any of the below conditions are in fact happening and would like
to confirm the exact condition. I'm also not clear why doing this "PR = PW
= R = W = all" would eliminate the issue if it is about the reap failing.
Isn't reaping always independent of the success of the operation? In other
words, isn't creating a successful tombstone the definition of a successful
delete? Or does using "all" bypass the tombstone process altogether?
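
For reference, here is a minimal sketch of the workaround under discussion, issuing a delete with PR = PW = R = W = all over HTTP. It is an illustration, not a tested recipe: it assumes the HTTP interface accepts `pr`, `pw`, `r` and `w` as query parameters on a DELETE, the base URL is taken from the curl example later in this thread, and `delete_url` and `delete_key` are hypothetical helper names.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Assumed base URL; the thread only shows localhost:8098 in a curl example.
RIAK_URL = "http://localhost:8098"


def delete_url(bucket, key, quorum="all"):
    """Build a delete URL that sets pr, pw, r and w to the same value."""
    params = urlencode({"pr": quorum, "pw": quorum, "r": quorum, "w": quorum})
    return "%s/buckets/%s/keys/%s?%s" % (
        RIAK_URL, quote(bucket, safe=""), quote(key, safe=""), params)


def delete_key(bucket, key):
    """Issue the DELETE; urlopen raises HTTPError on a 4xx/5xx response."""
    request = Request(delete_url(bucket, key), method="DELETE")
    with urlopen(request) as response:
        return response.status
```

With these settings a delete should fail rather than succeed against a partial set of primaries, which is the availability trade-off Reid describes.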

I will test out this workaround. It is probably worth doing to ensure the
indices are always correct rather than peppering the code with checks for
not-found conditions, although it's still early in my evaluation so I may
find a reason to pepper the code anyway. :)
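
The alternative, handling not-found conditions in the application, could be sketched as below. This assumes a plain HTTP GET returns 404 for a deleted key, as observed earlier in this thread; `key_exists` and `live_keys` are hypothetical names, and the base URL is again an assumption.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL, matching the curl example in this thread.
RIAK_URL = "http://localhost:8098"


def key_exists(bucket, key):
    """GET the object and treat a 404 response as 'deleted'."""
    url = "%s/buckets/%s/keys/%s" % (
        RIAK_URL, quote(bucket, safe=""), quote(key, safe=""))
    try:
        with urlopen(url) as response:
            return response.status == 200
    except HTTPError as err:
        if err.code == 404:
            return False
        raise  # any other error is not evidence of a delete


def live_keys(bucket, keys, exists=key_exists):
    """Drop keys from an index or list-keys result that no longer resolve."""
    return [key for key in keys if exists(bucket, key)]
```

Per Reid's explanation, the GET itself should also trigger the tombstone reap, so a stale key filtered out this way would be expected to disappear from later queries. The cost is one extra read per key returned.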

Finally, is there a process to create an issue that will track the eventual
resolution? If I do the workaround I'd like to take it out when it is
finally resolved (if there is a final resolution).


On Thu, May 24, 2012 at 8:27 AM, Reid Draper <reiddraper at gmail.com> wrote:

> I have a pretty good idea what is causing this problem.
> Riak uses "tombstone" values to denote that an object has been deleted.
> Under normal conditions, this tombstone value (really, the key/value pair)
> will be removed three (3) seconds after the delete. The delete_mode config
> lets you change that delay from three seconds, or set it to immediate or
> keep.
> Regardless of the value of delete_mode, the key will continue to show
> up in list-keys and 2i $key queries as long as the tombstone is still
> around. This is because those calls simply iterate through the keys,
> and don't inspect the values for tombstones (this could potentially
> be quite costly, depending on the backend).
> For a more detailed explanation of deletes in Riak, I highly
> suggest you read Jon Meredith's ML post [1].
> Since in both of these cases you have delete_mode set to either
> 3s or immediate, we are seeing a case where the first attempt to reap
> the tombstone fails. Reaping isn't attempted again until you do a GET
> on the object, which is why it no longer appears in 2i and list-keys
> queries afterward: a read-repair-like mechanism runs and sees that the
> tombstone needs to be reaped.
> So why is the tombstone reap failing the first time? There could be several
> reasons. It's important to know, first, that the reaping process requires
> all N _primary_ replicas to be up and responding. Here are some potential
> reasons:
> 1. One of the primaries is temporarily unreachable.
> 2. The original tombstone writes didn't go to all N primaries, for any
>    reason.
> 3. The async GET after that starts the tombstone reaping times out, for
>    any reason.
> I don't have a silver-bullet recommendation for this problem at the moment.
> If you'd like to favor having deleted keys _not_ show up in 2i/list-keys
> requests over delete-availability, you can make your delete requests
> with PR = PW = R = W = all (note 'all' is equivalent to whatever your
> bucket's N value is).
> We'll also be exploring how we can fix or mitigate this situation for
> an upcoming release.
> [1]:
> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-October/006048.html
> Thanks,
> Reid Draper
> Software Engineer
> Basho
> On May 23, 2012, at 4:42 PM, Steve Warren wrote:
> I have a 5 node cluster and given a successful delete call, I expect to
> get the latest data back given the bucket properties (as shown below)...
> Bucket properties:
> {"props":{"name":"mybucket","allow_mult":false,"basic_quorum":false,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"dw":"quorum","last_write_wins":false,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":4,"notfound_ok":true,"old_vclock":86400,"postcommit":[],"pr":"quorum","precommit":[],"pw":"quorum","r":"quorum","rw":"quorum","small_vclock":50,"w":"quorum","young_vclock":20}}
> Is my understanding not correct in this? (The important properties to me
> are the pw/pr settings to ensure a good distribution and consistency.)
> Regards
> Steve
> On Wed, May 23, 2012 at 1:31 PM, Shuhao Wu <admin at thekks.net> wrote:
>> Riak is eventually consistent. Deleting it doesn't show up immediately.
>> There is an option like delete_immediate
>> Shuhao
>> On May 23, 2012 4:08 PM, "Steve Warren" <swarren at myvest.com> wrote:
>>> I'm seeing this pretty consistently and have no explanation for it. I
>>> delete a large number of keys (20k to 100k), but when I then search on the
>>> keys ($key/0/g) anywhere from 0-200 or so of the deleted keys show up in
>>> the results. It doesn't matter how long I wait after completing the
>>> deletion step, the keys stay in the list until I try to access the object
>>> and then it goes away. I'm using 1.1.2 and the riak-java client, and
>>> getting no errors on the deletion step.
>>> On Tue, May 22, 2012 at 9:34 AM, Steve Warren <swarren at myvest.com> wrote:
>>>> Thank you for the reply. My observation does not quite match up with
>>>> this though so I'm still a bit confused. The deleted keys appeared to stay
>>>> long past the 3 seconds described in the post you referenced. In fact, I
>>>> don't know if they ever "went away". I'll run some more tests to see if I
>>>> can narrow down the exact behavior, for example not all key deletions
>>>> exhibited this behavior (the test I ran resulted in 118 residual keys out
>>>> of around 20K deletes). If I directly queried any of the keys it would
>>>> respond with "not found" and immediately stop showing up in the key list or
>>>> $key index query.
>>>> I'm still running a bunch of tests just to learn the behavior of the
>>>> system so I'll keep plugging away at it. For example, I'm observing that
>>>> $key index queries halt inserts into the same bucket while the query is
>>>> running. I don't know yet if this halts all server activity or just the
>>>> inserts for that bucket, though.
>>>> Regards
>>>> Steve
>>>> On Tue, May 22, 2012 at 8:58 AM, Kelly McLaughlin <kelly at basho.com> wrote:
>>>>> Hi Steve. There is no caching of key lists in riak. What you are
>>>>> seeing is likely the fact that listing of keys or index queries can pick up
>>>>> deleted keys due to the fact that riak keeps tombstone markers around for
>>>>> deleted objects for some period. For a really good explanation of riak's
>>>>> delete behavior check out this writeup by Jon Meredith:
>>>>> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-October/006048.html.
>>>>> You can set delete_mode to immediate as described in that post and you will
>>>>> most likely not see any deleted keys when you do an index query or key
>>>>> list. The tradeoff is that you may get the unexpected behavior when doing
>>>>> concurrent updates to the same set of keys that the delete_mode changes
>>>>> were designed to address as Jon also indicates in that post.  We are
>>>>> considering different options on this front, but at this time no actual
>>>>> changes have been made to address this.
>>>>> Kelly
>>>>> On May 20, 2012, at 10:13 AM, Steve Warren wrote:
>>>>> The last message I saw on this (from a year ago) says the caching of
>>>>> key lists will be removed. I just ran into it while running a $key index
>>>>> range search. I then ran a ?key=stream search on the bucket and the same
>>>>> stale key list appeared (I had created a bunch of data and then deleted it
>>>>> as a test). Did the caching removal not happen? I'm running 1.1.2.
>>>>> The query:
>>>>> curl 'localhost:8098/buckets/testbucket/index/$key/0/g'
>>>>> As others have noted, this behavior is quite disconcerting and I don't
>>>>> want to pepper the application with otherwise unnecessary checks for stale
>>>>> keys even on 2i range queries. Or is that unavoidable?
>>>>> Regards
>>>>> Steve
>>>>> _______________________________________________
>>>>> riak-users mailing list
>>>>> riak-users at lists.basho.com
>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com