The best (fastest) way to delete/clear a bucket [python]

Paweł Królikowski rabbbit at gmail.com
Mon May 19 16:33:08 EDT 2014


The problem is that the tombstones never disappear - they keep coming back
through bucket.get_keys() hours after deletion, even after a restart.

I said I was using the default delete_mode configuration because I had
never changed it. I have now tried to set it explicitly, and apparently the
setting is no longer supported in Riak 2.0.

17:16:56.318 [error] You've tried to set delete_mode, but there is no setting with that name.
17:16:56.318 [error]   Did you mean one of these?
17:16:56.335 [error]     dtrace
17:16:56.335 [error]     nodename
17:16:56.335 [error]     ssl.keyfile
17:16:56.335 [error] Error generating configuration in phase transform_datatypes
17:16:56.335 [error] Conf file attempted to set unknown variable: delete_mode
Error generating config with cuttlefish

I'm using Riak 2.0.0pre20, on strongly consistent buckets, on a single-node
cluster. Can this be the reason? I guess what I need is confirmation that
either something is broken or that I'm doing something stupid.

I've looked for similar issues (github.com/basho/riak/issues) and didn't
find any, which suggests I'm doing something stupid; I just don't know
what yet.


Thanks again :)

--
Paweł


On 19 May 2014 18:00, Dmitri Zagidulin <dzagidulin at basho.com> wrote:

> Ah, yes, you bring up a good point. (And, that's another subtlety to keep
> in mind, with Option #1).
>
> Tombstones are definitely something to keep in mind, when deleting unit
> test data.
> As you mentioned in your earlier question, if you're using default
> delete_mode configuration ( 3 seconds ), it means that if you issue a
> delete, a tombstone object is going to be written (and stick around for at
> least 3 seconds), and unfortunately, it is going to show up as a false
> positive on a List Keys call.
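
The tombstone window described above can be handled in a test by polling the key listing after a bulk delete. This is a minimal sketch, assuming only that the bucket object exposes get_keys() as used in this thread; `wait_until_empty` and its parameters are hypothetical names, not part of the Riak client. If tombstones are never reaped, as reported at the top of this thread, it simply returns False on timeout.

```python
import time


def wait_until_empty(bucket, timeout=10.0, interval=0.5,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll bucket.get_keys() until it returns no keys or `timeout` seconds pass.

    Returns True if the key listing became empty, False on timeout.
    `sleep` and `clock` are injectable so the helper can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        # An empty listing means no live keys and no lingering tombstones.
        if not bucket.get_keys():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Listing keys is itself expensive, so a helper like this probably only makes sense in test setup or teardown, never in production code.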
>
> The easiest thing to try, in your case, is to set 'delete_mode' to
> 'immediate', restart the test cluster, and retest. With an immediate
> delete, your second test with 10 keys should not take as long as the
> previous delete with 10000 keys.
>
>
>
>
> On Mon, May 19, 2014 at 11:46 AM, Paweł Królikowski <rabbbit at gmail.com> wrote:
>
>> Hi Dmitri,
>>
>> Thanks a lot for the answer. Option #1 seems the best, but I have a
>> follow up question:
>>
>> - When do the deleted keys disappear from Riak? Part of my problem
>> (which I did not explain properly the first time) is that get_keys()
>> returns keys that no longer exist. So, I run a test with 10 000 keys, I
>> remove them, and it takes N seconds. I then follow with a test with 10
>> keys, but removing them takes just as long - I imagine it's because I'm
>> going over those 10 000 keys again.
>>
>> This article seems relevant:
>> http://basho.com/riaks-config-behaviors-part-3/ - it seems like the
>> tombstones simply remain in my system indefinitely.
>>
>> --
>> Paweł
>>
>>
>> On 19 May 2014 15:32, Dmitri Zagidulin <dzagidulin at basho.com> wrote:
>>
>>> Hi Pawel,
>>>
>>> There are basically three ways to clear data from Riak (for the
>>> purposes of automated testing):
>>>
>>> 1. Iterate through the keys via get_keys(), and delete each one. This is
>>> what you're currently doing, except you don't need the v.exists check.
>>> Fetching each object to check v.exists makes an additional API call to
>>> Riak, so it takes twice as long as just calling delete() (and trapping a
>>> potential 404 "not found" error).
>>>
>>> Advantages: Easy to understand, can be done entirely in code (without
>>> invoking OS/shell commands).
>>>
>>> Disadvantages: It can get slow, for large data sets. Another subtle
>>> disadvantage is that, as your app grows, it can get difficult to keep track
>>> of which buckets you've created and need to be cleared.
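
Option 1 without the extra existence check can be sketched as below. This is a minimal sketch, not a definitive implementation: `clear_bucket` and the `FakeBucket` stand-in are hypothetical names introduced here, and the only calls assumed on the bucket object are get_keys() and delete(key). The stand-in exists only so the sketch runs without a Riak node.

```python
def clear_bucket(bucket):
    """Delete every key that get_keys() reports; return the number of deletes issued.

    `bucket` is assumed to expose get_keys() and delete(key). No get() or
    exists check is made first, which saves one round trip per key.
    """
    deleted = 0
    # Copy the key listing first so deletes cannot disturb the iteration.
    for key in list(bucket.get_keys()):
        bucket.delete(key)
        deleted += 1
    return deleted


class FakeBucket:
    """Hypothetical in-memory stand-in for a Riak bucket, for illustration only."""

    def __init__(self, data):
        self._data = dict(data)

    def get_keys(self):
        return list(self._data)

    def delete(self, key):
        # Deleting a missing key is treated as a no-op in this stand-in.
        self._data.pop(key, None)


if __name__ == "__main__":
    demo = FakeBucket({"k%d" % i: i for i in range(10)})
    print(clear_bucket(demo), "deletes issued;", len(demo.get_keys()), "keys left")
```

Against a real node the call would presumably be clear_bucket(r_bk), with r_bk being the bucket object from the original question. Keys listed only because of tombstones still cost one delete call each.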
>>>
>>> 2. Stop the Riak cluster, delete the riak data directory, and re-start.
>>>
>>> Advantages: Very fast, and you can be sure that you're deleting all
>>> buckets.
>>>
>>> Disadvantages: Involves invoking OS/shell commands. This is fairly easy
>>> if your Riak node is running on the same machine as your tests (and if it's
>>> a single node). To delete the data directories of a multi-node cluster, now
>>> you need to involve either a bash script that uses SSH to log in and
>>> restart, or a coordination framework like Ansible.
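
For a single local node, option 2 could be driven from the test suite roughly as follows. This is a sketch under assumptions: `wipe_riak_data` is a hypothetical helper, the data directory path must be supplied for your install (it is not guessed here), and the `riak stop` / `riak start` commands are assumed to be on the PATH of the user running the tests. The `run` parameter is injectable so the sequence can be exercised without a Riak node.

```python
import shutil
import subprocess
from pathlib import Path


def wipe_riak_data(data_dir, run=subprocess.check_call):
    """Stop the local node, empty its data directory, and start it again.

    `data_dir` is install-specific and must be passed in by the caller.
    `run` receives each command as an argument list.
    """
    run(["riak", "stop"])
    # Remove the contents but keep the directory itself (and its ownership).
    for child in Path(data_dir).iterdir():
        if child.is_dir() and not child.is_symlink():
            shutil.rmtree(child)
        else:
            child.unlink()
    run(["riak", "start"])
```

Emptying the directory rather than removing it avoids having to recreate it with the right owner and permissions.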
>>>
>>> 3. Use an in-memory back end. (And to drop all data, just restart the
>>> node(s)).
>>>
>>> Advantages: Same as #2 - fast, thorough.
>>>
>>> Disadvantages: Same as #2 (involves shell commands, potentially SSH
>>> etc). In addition, since you're likely not going to be running your
>>> production code on an in-memory back end, this method introduces a
>>> potential environmental/functional difference between your testing and
>>> production clusters.
>>>
>>> I generally use method #1 in my unit tests, and manually delete each
>>> key.
>>>
>>> Dmitri
>>>
>>>
>>>
>>> On Mon, May 19, 2014 at 8:53 AM, Paweł Królikowski <rabbbit at gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> For testing, I'd like to be able to throw a large amount of data at
>>>> Riak (100k+ entries), check how it performed, change something in the
>>>> application, and run the test again. I'd like to use the same data every
>>>> time, so I'd like to clear the bucket between tests.
>>>>
>>>> The documentation (
>>>> http://docs.basho.com/riak/2.0.0beta1/dev/references/http/) says:
>>>>
>>>> *Delete Buckets*
>>>> There is no straightforward way to delete an entire Bucket. To delete
>>>> all the keys in a bucket, you’ll need to delete them all individually.
>>>>
>>>>
>>>> So, I'm currently using something like:
>>>>
>>>> for k in r_bk.get_keys():
>>>>     v = r_bk.get(k)
>>>>     if v.exists:
>>>>         r_bk.delete(v)
>>>>
>>>> The problem is that r_bk.get_keys() returns a lot of elements that
>>>> don't exist (tombstones?) and iterating over all of them takes time.
>>>>
>>>> Is that the way it's supposed to work? Or am I missing something?
>>>>
>>>> - I'm using default delete_mode configuration ( 3 seconds )
>>>> - I'm using Riak 2.0 alpha 19 with Python. ( there's a bug with strong
>>>> consistency in Beta1, cannot use it)
>>>> - changing the bucket name for every run seems .. impractical?
>>>>
>>>> Any advice is welcome,
>>>>
>>>> --
>>>> Thanks,
>>>> Paweł
>>>>
>>>> _______________________________________________
>>>> riak-users mailing list
>>>> riak-users at lists.basho.com
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>
>>>>
>>>
>>
>