The best (fastest) way to delete/clear a bucket [python]

Paweł Królikowski rabbbit at
Mon May 19 16:33:08 EDT 2014

The problem is that the tombstones never disappear - they keep coming back
through bucket.get_keys() hours after deletion, even after a restart.
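
For reference, the cleanup being discussed is option #1 from the quoted reply below: list the keys and delete each one directly, without fetching it first. A minimal sketch, assuming `bucket` offers the same `get_keys()` and `delete(key)` calls used in this thread; `clear_bucket` is a hypothetical helper name, not part of the client:

```python
def clear_bucket(bucket):
    """Delete every key returned by a key listing.

    Assumption: `bucket` behaves like the Riak Python client's bucket
    object, i.e. get_keys() lists keys and delete(key) removes one key.
    Returns the number of delete calls issued.
    """
    deleted = 0
    for key in bucket.get_keys():
        # Delete by key directly; no get()/exists round trip per key.
        bucket.delete(key)
        deleted += 1
    return deleted
```

Because a listing can still return tombstoned keys, the returned count may be larger than the number of live objects in the bucket.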

I said I'm using the delete_mode default configuration, because I didn't
change it. I now tried, and apparently it's not supported any more in Riak

17:16:56.318 [error] You've tried to set delete_mode, but there is no
setting with that name.
17:16:56.318 [error]   Did you mean one of these?
17:16:56.335 [error]     dtrace
17:16:56.335 [error]     nodename
17:16:56.335 [error]     ssl.keyfile
17:16:56.335 [error] Error generating configuration in phase
17:16:56.335 [error] Conf file attempted to set unknown variable:
Error generating config with cuttlefish

I'm using Riak 2.0.0pre20, on strongly consistent buckets, on a single node
cluster. Can this be the reason? I guess what I need is a confirmation that
something is broken/that I'm doing something stupid.
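
One way to get that confirmation is to count how many of the listed keys fetch as non-existent. A minimal sketch, assuming `bucket` behaves like the Python client's bucket object used in this thread (`get_keys()` to list, `get(key)` returning an object with an `exists` attribute); `count_ghost_keys` is a hypothetical helper name:

```python
def count_ghost_keys(bucket):
    """Return (listed, missing) for one key listing.

    listed  - number of keys returned by get_keys()
    missing - how many of those keys fetch as non-existent
    Assumption: bucket.get(key) returns an object with an `exists` flag.
    """
    listed = 0
    missing = 0
    for key in bucket.get_keys():
        listed += 1
        if not bucket.get(key).exists:
            missing += 1
    return listed, missing
```

If `missing` stays close to `listed` long after the deletes were issued, the listing is still returning keys for deleted objects.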

I've tried looking for similar issues (,
didn't find any -> I guess that suggests I'm doing something stupid, I just
don't know what yet.

Thanks again :)


On 19 May 2014 18:00, Dmitri Zagidulin <dzagidulin at> wrote:

> Ah, yes, you bring up a good point. (And, that's another subtlety to keep
> in mind, with Option #1).
> Tombstones are definitely something to keep in mind, when deleting unit
> test data.
> As you mentioned in your earlier question, if you're using default
> delete_mode configuration ( 3 seconds ), it means that if you issue a
> delete, a tombstone object is going to be written (and stick around for at
> least 3 seconds), and unfortunately, it is going to show up as a false
> positive on a List Keys call.
> The easiest thing to try, in your case, is to set 'delete_mode' to
> 'immediate', restart the test cluster, and retest. With an immediate
> delete, your second test with 10 keys should not take as long as the
> previous delete with 10000 keys.
> On Mon, May 19, 2014 at 11:46 AM, Paweł Królikowski <rabbbit at> wrote:
>> Hi Dmitri,
>> Thanks a lot for the answer. Option #1 seems the best, but I have a
>> follow up question:
>> - when do the deleted keys disappear from Riak: a part of my problem
>> (have not explained it correctly the first time), is that get_keys()
>> returns keys that no longer exist. So, I run a test with 10 000 keys, I
>> remove them, it takes N seconds. I then follow with a test with 10 keys, but
>> removing them takes just as much time - I imagine it's because I'm going
>> over those 10 000 keys again.
>> This article seems relevant:
>> - it seems like the
>> tombstones simply remain in my system indefinitely.
>> --
>> Paweł
>> On 19 May 2014 15:32, Dmitri Zagidulin <dzagidulin at> wrote:
>>> Hi Pawel,
>>> There are basically three ways to clear data from Riak (for the purposes
>>> of automated testing):
>>> 1. Iterate through the keys via get_keys(), and delete each one. This is
>>> what you're currently doing, except you don't need the get() and the
>>> "if v.exists" check. That check makes an additional API call to Riak, so it
>>> takes twice as long as just calling delete() (and trapping a potential 404
>>> "doesn't exist" error).
>>> Advantages: Easy to understand, can be done entirely in code (without
>>> invoking OS/shell commands).
>>> Disadvantages: It can get slow, for large data sets. Another subtle
>>> disadvantage is that, as your app grows, it can get difficult to keep track
>>> of which buckets you've created and need to be cleared.
>>> 2. Stop the Riak cluster, delete the riak data directory, and re-start.
>>> Advantages: Very fast, and you can be sure that you're deleting all
>>> buckets.
>>> Disadvantages: Involves invoking OS/shell commands. This is fairly easy
>>> if your Riak node is running on the same machine as your tests (and if it's
>>> a single node). To delete the data directories of a multi-node cluster, now
>>> you need to involve either a bash script that uses SSH to log in and
>>> restart, or a coordination framework like Ansible.
>>> 3. Use an in-memory back end. (And to drop all data, just restart the
>>> node(s)).
>>> Advantages: Same as #2 - fast, thorough.
>>> Disadvantages: Same as #2 (involves shell commands, potentially SSH
>>> etc). In addition, since you're likely not going to be running your
>>> production code on an in-memory back end, this method introduces a
>>> potential environmental/functional difference between your testing and
>>> production clusters.
>>> I generally use method #1 in my unit tests, and manually delete each
>>> key.
>>> Dmitri
>>> On Mon, May 19, 2014 at 8:53 AM, Paweł Królikowski <rabbbit at> wrote:
>>>> Hi,
>>>> For testing, I'd like to be able to throw a large amount of data at
>>>> Riak (100k+ entries), check how it performed, change something in the
>>>> application, run the test again. I'd like to use the same data every time,
>>>> so, I'd like to clear the bucket between every test.
>>>> The documentation (
>>>> says:
>>>> *Delete Buckets*
>>>> There is no straightforward way to delete an entire Bucket. To delete
>>>> all the keys in a bucket, you’ll need to delete them all individually.
>>>> So, I'm currently using something like:
>>>> for k in r_bk.get_keys():
>>>>     v = r_bk.get(k)
>>>>     if v.exists:
>>>>         r_bk.delete(v)
>>>> The problem is that r_bk.get_keys() returns a lot of elements that
>>>> don't exist (tombstones?) and iterating over all of them takes time.
>>>> Is that the way it's supposed to work? Or am I missing something?
>>>> - I'm using default delete_mode configuration ( 3 seconds )
>>>> - I'm using Riak 2.0 alpha 19 with Python. ( there's a bug with strong
>>>> consistency in Beta1, cannot use it)
>>>> - changing the bucket name for every run seems .. impractical?
>>>> Any advice welcome,
>>>> --
>>>> Thanks,
>>>> Paweł
>>>> _______________________________________________
>>>> riak-users mailing list
>>>> riak-users at