Deleted keys come back

Dmitri Zagidulin dzagidulin at
Wed Oct 7 09:19:43 EDT 2015


There are two things going on here: the W quorum value of the write and
delete operations, and possibly the delete_mode setting.

Let's walk through the scenario.
You're writing to a 2 node cluster, two copies of each object (n_val=2),
with your write quorum of 1 (W=1).

So that's possibility #1 -- there's no guarantee that your writes succeeded
on both replicas. (A write could have landed on just one replica, with the
copy to the second replica dropped along the way.)
Then you're doing a List Keys (to delete the objects), which runs with an
implicit quorum of R=1. (Meaning, it only contacts one of the two replicas,
and lists its keys.) So, if possibility #1 happened above, the first List
Keys might not have returned some keys (because it may have contacted the
partitions that had missing replicas). Then you deleted, ran another List
Keys, and that one could have returned the keys that the first one missed.
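That interaction can be sketched with a toy model (plain Python for illustration -- this is not the Riak client API, and the "dropped write" is an assumption standing in for whatever kept the second replica from being written):

```python
# Toy model of n_val=2 with a dropped write: W=1 means the client gets
# "success" as soon as one replica acks, so the second copy can be lost.
replicas = [set(), set()]  # two replicas of one bucket's key set

def write(key):
    replicas[0].add(key)   # first replica acks -> W=1 is satisfied
    # ...the write to replicas[1] is dropped (overload, partition, etc.)

def list_keys(replica_index):
    # R=1 list-keys: consult a single replica and trust its answer
    return sorted(replicas[replica_index])

write("key-42")
print(list_keys(0))  # ['key-42']  -- this replica got the write
print(list_keys(1))  # []          -- this one never did; the key "vanishes"
```

Which replica a given List Keys consults is up to the cluster, which is why the key can be missing from one listing and present in the next.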

Possibility #2 -- your deletes are using W=1, meaning, they're only waiting
for the delete operation from 1 replica to respond, before returning
success. So, it's possible that a delete operation removed just one
replica, but the second one still exists. And the second List Keys can now
pick up the not-deleted replica.
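Continuing the same toy model (again, plain Python for illustration, not the client API): a W=1 delete that only reaches one replica leaves the other copy behind for a later List Keys to find.

```python
# Toy model: both replicas hold the key, but the W=1 delete only
# reaches one of them before the client is told "deleted".
replicas = [{"key-42"}, {"key-42"}]

def delete(key):
    replicas[0].discard(key)  # one replica acks -> W=1 delete "succeeds"
    # ...the delete never reaches replicas[1]

def list_keys(replica_index):
    return sorted(replicas[replica_index])

delete("key-42")
print(list_keys(0))  # []          -- deleted here
print(list_keys(1))  # ['key-42']  -- still here; a later listing sees it again
```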

Possibility #3 -- by default, the delete_mode is set to keep deleted
objects for 3 seconds.  So, if you ran your deletes, and then re-ran a List
Keys before the 3 seconds expired, you could pick up some keys.
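The tombstone window can be sketched the same way (a toy model with an injected clock so it's deterministic; the 3-second reap interval mirrors the default delete_mode, the rest is illustration):

```python
# Toy tombstone model: a delete writes a tombstone rather than removing
# the key, and the key keeps showing up until the tombstone is reaped.
REAP_AFTER = 3.0  # seconds, mirroring the default delete_mode

store = {"key-42": {"tombstone_at": None}}

def delete(key, now):
    store[key]["tombstone_at"] = now  # mark deleted; don't remove yet

def list_keys(now):
    live = []
    for key, meta in store.items():
        t = meta["tombstone_at"]
        if t is None or now - t < REAP_AFTER:
            live.append(key)  # never deleted, or tombstone not reaped yet
    return sorted(live)

delete("key-42", now=0.0)
print(list_keys(now=1.0))  # ['key-42'] -- inside the 3s window, still listed
print(list_keys(now=5.0))  # []         -- tombstone reaped, key gone
```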

The upshot of all this is:
- Use W=2 when writing and deleting. (That is, your W value should be the
same as your N value).

- If that doesn't work, set delete_mode to 'immediate' in the config.
Specifically, delete_mode is set in the advanced.config file, inside the
riak_kv section. So, my advanced.config file looks like this:

      [{riak_kv, [
          {delete_mode, immediate}
      ]}].

Also, if you're deleting things for unit tests, there's an easier way.
Instead of deleting the bucket object-by-object, you can just stop the
node and clear the bitcask (or leveldb) data directory. (That gets rid of
all the data in the cluster, which is what you want for unit tests anyway.)
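A sketch of that cleanup, run against a throwaway directory (the real data path varies by install -- "/var/lib/riak/bitcask" on package installs is an assumption, and you'd wrap it in riak stop / riak start):

```shell
# Toy stand-in for wiping Riak's backend data between unit-test runs.
DATA_DIR=$(mktemp -d)              # stands in for the bitcask data directory
touch "$DATA_DIR/1.bitcask.data"   # pretend there is stored data
# The cleanup step itself (on a real node: riak stop first, riak start after):
rm -rf "$DATA_DIR"/*
ls -A "$DATA_DIR"                  # no output: the directory is empty
```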

You can learn more about these topics on the following pages:
* (mailing list post introducing delete_mode)
* (the Tombstones section)


On Wed, Oct 7, 2015 at 1:35 PM, mtakahashi-ivi <mtakahashi at> wrote:

> Hello,
> I'm using Riak KV in a 2-node cluster.
> I inserted hundreds of key/value pairs and then deleted all keys in a bucket.
> After that, I can still get some keys when I list the keys in the bucket.
> Why do those keys remain? How do I delete keys reliably?
> If I increase the number of nodes to 5, I can delete all keys in the bucket
> the same way.
> My bucket property is the following.
> ----------------------
> {
>   "props": {
>     "name": "BUCKET_A",
>     "active": true,
>     "allow_mult": false,
>     "basic_quorum": false,
>     "big_vclock": 50,
>     "chash_keyfun": {
>       "mod": "riak_core_util",
>       "fun": "chash_std_keyfun"
>     },
>     "claimant": "riak at",
>     "dvv_enabled": true,
>     "dw": "quorum",
>     "last_write_wins": false,
>     "linkfun": {
>       "mod": "riak_kv_wm_link_walker",
>       "fun": "mapreduce_linkfun"
>     },
>     "n_val": 2,
>     "notfound_ok": false,
>     "old_vclock": 86400,
>     "postcommit": [],
>     "pr": 0,
>     "precommit": [],
>     "pw": 0,
>     "r": 1,
>     "rw": "quorum",
>     "search_index": "BUCKET_A_INDEX",
>     "small_vclock": 50,
>     "w": 1,
>     "young_vclock": 20
>   }
> }
> Masanori Takahashi
> --
> _______________________________________________
> riak-users mailing list
> riak-users at
