The best (fastest) way to delete/clear a bucket [python]

Sean Cribbs sean at basho.com
Tue May 20 10:50:45 EDT 2014


For what it's worth, in the integration tests of our client libraries we
have moved to generating random bucket and key names for each test/example.
This reduces setup/teardown time and is less susceptible to the types of
unexpected behaviors you are seeing from list-keys. If possible, I highly
recommend this approach in your suite.
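
As a rough illustration of that approach, here is a minimal sketch. The
random_name helper and the ExampleTest class are hypothetical names made up
for this example, not part of any Riak client library; only the standard
library is used.

```python
import unittest
import uuid


def random_name(prefix):
    """Return a name that is very unlikely to repeat across test runs."""
    return "{0}-{1}".format(prefix, uuid.uuid4().hex)


class ExampleTest(unittest.TestCase):
    def setUp(self):
        # Fresh bucket and key names for every test, so there is nothing
        # to clear afterwards and no list-keys call is involved.
        self.bucket_name = random_name("bucket")
        self.key = random_name("key")

    def test_names_differ_between_tests(self):
        self.assertNotEqual(self.bucket_name, random_name("bucket"))
```

Each test would then open its bucket under self.bucket_name instead of a
fixed name, so data left over from earlier runs is simply never read.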


On Tue, May 20, 2014 at 9:25 AM, Dmitri Zagidulin <dzagidulin at basho.com> wrote:

> Ok, so, from what I understand, this is expected behavior for strongly
> consistent buckets. (I'm in the process of confirming this, and we'll see
> if we can add it to the documentation.) The delete_mode: immediate setting
> is ignored and the tombstone is kept around, to ensure that "not found"
> results stay consistent (in the context of further overwrites of that
> key).
>
> So, unfortunately, that may be bad news in terms of deleting a
> strongly_consistent bucket via keylist for unit testing. :)
>
> You may want to switch to method #2 for your test suite: write a shell
> script that stops the node, deletes the bitcask & aae dirs, and restarts
> it, then invoke that script from your test suite. Or just call those
> commands directly.
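
A minimal sketch of method #2 driven from Python rather than a shell script.
The function name reset_riak_node and its parameters are hypothetical, the
data directory has to be supplied by the caller (it depends on the install),
and the sketch assumes the test process may run "riak stop" and "riak start"
on a local single node.

```python
import os
import shutil
import subprocess


def reset_riak_node(data_dir, run=subprocess.check_call):
    """Stop the node, empty the bitcask and anti_entropy dirs, restart."""
    run(["riak", "stop"])
    for name in ("bitcask", "anti_entropy"):
        path = os.path.join(data_dir, name)
        if not os.path.isdir(path):
            continue
        # Remove the contents but keep the directory itself, so its
        # ownership and permissions are left untouched.
        for entry in os.listdir(path):
            full = os.path.join(path, entry)
            if os.path.isdir(full) and not os.path.islink(full):
                shutil.rmtree(full)
            else:
                os.remove(full)
    run(["riak", "start"])
```

The run parameter exists only so the commands can be replaced in a dry run;
by default each command raises if it exits with a non-zero status.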
>
>
>
> On Tue, May 20, 2014 at 5:44 AM, Paweł Królikowski <rabbbit at gmail.com> wrote:
>
>> Ok then,
>>
>> I've stopped riak, wiped bitcask and anti_entropy directories, updated
>> config, started riak.
>>
>> I've tried to verify it with:
>>
>> riak config generate -l debug
>>
>> Got output:
>>
>> [...]
>>
>> 10:25:46.260 [info] /etc/riak/advanced.config detected, overlaying
>> proplists
>>  -config /var/lib/riak/generated.configs/app.2014.05.20.10.25.46.config
>> -args_file /var/lib/riak/generated.configs/vm.2014.05.20.10.25.46.args
>> -vm_args /var/lib/riak/generated.configs/vm.2014.05.20.10.25.46.args
>>
>>
>> And at the very end of the config file there's:
>>
>>  {k_kv,[{delete_mode,immediate}]}].
>>
>> So, it worked.
>>
>>
>>  Then did this:
>>
>> >>> import riak
>> >>> c = riak.RiakClient(pb_port=8087, protocol='pbc', host='db-13')
>> >>> b = c.bucket(name='locate', bucket_type='strongly_consistent')
>> >>> o = b.get('foo')
>> >>> o.data = 3
>> >>> o.store()
>> <riak.riak_object.RiakObject object at 0x2b2ce90>
>> >>> o.delete()
>> <riak.riak_object.RiakObject object at 0x2b2ce90>
>> >>> b.delete('foo')
>> <riak.riak_object.RiakObject object at 0x2b55d90>
>> >>> o.exists
>> False
>> >>> b.get_keys()
>> ['foo']
>>
>>
>> So, it didn't work.
>>
>> It's not just the python client, because if I do this, I get the key back:
>>
>> http://db-13:8098/types/strongly_consistent/buckets/locate/keys?keys=true
>> {"keys":["foo"]}
>>
>>
>>
>> I've tried deleting the key via http request (curl -v -X DELETE
>> http://db-13:8098/types/strongly_consistent/buckets/locate/keys/bar),
>> but it still remains.
>>
>> http://db-13:8098/types/strongly_consistent/buckets/locate/keys/foo
>>
>> returns
>>
>> not found
>>
>> but
>>
>> http://db-13:8098/types/strongly_consistent/buckets/locate/keys?keys=true
>>
>> gives
>>
>> {"keys":["foo","bar"]}
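
The HTTP check quoted above could be scripted along these lines, so a test
can assert on the key list directly. This is a sketch for Python 3; the
function names are hypothetical, and the URL layout and response shape are
taken from the requests shown in the thread.

```python
import json
from urllib.request import urlopen


def list_keys_url(host, bucket_type, bucket, port=8098):
    """Build the list-keys URL in the form used in the thread."""
    return "http://{0}:{1}/types/{2}/buckets/{3}/keys?keys=true".format(
        host, port, bucket_type, bucket)


def parse_keys(body):
    """Extract the key list from a list-keys JSON response body."""
    return json.loads(body)["keys"]


def fetch_keys(host, bucket_type, bucket, port=8098):
    """Fetch and parse the key list (needs a reachable node)."""
    with urlopen(list_keys_url(host, bucket_type, bucket, port)) as resp:
        return parse_keys(resp.read().decode("utf-8"))
```

For the session above, fetch_keys("db-13", "strongly_consistent", "locate")
would be the call that still reports the deleted keys.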
>>
>>
>> I've tried looking for detailed logs, but console.log, even on debug,
>> doesn't print anything useful.
>> I've also tried looking inside bitcask directory, and there's definitely
>> 'some' binary data there, even after deletion.
>>
>>
>> On 19 May 2014 23:23, Dmitri Zagidulin <dzagidulin at basho.com> wrote:
>>
>>> Ah, that's interesting, let's see if we can test this.
>>>
>>> The 'delete_mode' configuration is not supported in the regular
>>> riak.conf file, from what I understand.
>>> However, you can still set it in the 'advanced.config' file, as
>>> described here:
>>>
>>> https://github.com/basho/basho_docs/blob/features/lp/advanced-conf/source/languages/en/riak/ops/advanced/configs/configuration-files.md#the-advancedconfig-file
>>> (those docs are a current work-in-progress, mind you)
>>>
>>> So, create an advanced.config file in your riak etc/ directory (this
>>> will be in addition to your existing riak.conf), with the following
>>> contents:
>>> [
>>>  {riak_kv, [
>>>    {delete_mode, immediate}
>>>  ]}
>>> ].
>>>
>>> Restart the node, and try your tests again. The tombstones should
>>> disappear now on every delete request. (You should probably also wipe all
>>> of the old data, by deleting the contents of the bitcask and anti_entropy
>>> directories in your riak data dir, just to make sure the old ones are gone.
>>> This should be done while the node is down, of course.)
>>>
>>>
>>>
>>> On Mon, May 19, 2014 at 4:33 PM, Paweł Królikowski <rabbbit at gmail.com> wrote:
>>>
>>>> The problem is that the tombstones never disappear - they keep coming
>>>> back through bucket.get_keys() hours after deletion, even after a restart.
>>>>
>>>> I said I'm using the delete_mode default configuration, because I
>>>> didn't change it. I now tried, and apparently it's not supported any more
>>>> in Riak 2.0.
>>>>
>>>> 17:16:56.318 [error] You've tried to set delete_mode, but there is no
>>>> setting with that name.
>>>> 17:16:56.318 [error]   Did you mean one of these?
>>>> 17:16:56.335 [error]     dtrace
>>>> 17:16:56.335 [error]     nodename
>>>> 17:16:56.335 [error]     ssl.keyfile
>>>> 17:16:56.335 [error] Error generating configuration in phase
>>>> transform_datatypes
>>>> 17:16:56.335 [error] Conf file attempted to set unknown variable:
>>>> delete_mode
>>>> Error generating config with cuttlefish
>>>>
>>>> I'm using Riak 2.0.0pre20, on strongly consistent buckets, on a single
>>>> node cluster. Can this be the reason? I guess what I need is a confirmation
>>>> that something is broken/that I'm doing something stupid.
>>>>
>>>> I've tried looking for similar issues (github.com/basho/riak/issues),
>>>> didn't find any -> I guess that suggests I'm doing something stupid, I just
>>>> don't know what yet.
>>>>
>>>>
>>>> Thanks again :)
>>>>
>>>> --
>>>> Paweł
>>>>
>>>>
>>>> On 19 May 2014 18:00, Dmitri Zagidulin <dzagidulin at basho.com> wrote:
>>>>
>>>>> Ah, yes, you bring up a good point. (And, that's another subtlety to
>>>>> keep in mind, with Option #1).
>>>>>
>>>>> Tombstones are definitely something to keep in mind, when deleting
>>>>> unit test data.
>>>>> As you mentioned in your earlier question, if you're using default
>>>>> delete_mode configuration ( 3 seconds ), it means that if you issue a
>>>>> delete, a tombstone object is going to be written (and stick around for at
>>>>> least 3 seconds), and unfortunately, it is going to show up as a false
>>>>> positive on a List Keys call.
>>>>>
>>>>> The easiest thing to try, in your case, is to set 'delete_mode' to
>>>>> 'immediate', restart the test cluster, and retest. With an immediate
>>>>> delete, your second test with 10 keys should not take as long as the
>>>>> previous delete with 10000 keys.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, May 19, 2014 at 11:46 AM, Paweł Królikowski <rabbbit at gmail.com> wrote:
>>>>>
>>>>>> Hi Dmitri,
>>>>>>
>>>>>> Thanks a lot for the answer. Option #1 seems the best, but I have a
>>>>>> follow up question:
>>>>>>
>>>>>> - When do the deleted keys disappear from Riak? Part of my problem
>>>>>> (which I did not explain correctly the first time) is that get_keys()
>>>>>> returns keys that no longer exist. So, I run a test with 10 000 keys, I
>>>>>> remove them, and it takes N seconds. I then follow with a test with 10
>>>>>> keys, but removing them takes just as much time - I imagine it's because
>>>>>> I'm going over those 10 000 keys again.
>>>>>>
>>>>>> This article seems relevant:
>>>>>> http://basho.com/riaks-config-behaviors-part-3/ - it seems like the
>>>>>> tombstones simply remain in my system indefinitely.
>>>>>>
>>>>>> --
>>>>>> Paweł
>>>>>>
>>>>>>
>>>>>> On 19 May 2014 15:32, Dmitri Zagidulin <dzagidulin at basho.com> wrote:
>>>>>>
>>>>>>> Hi Pawel,
>>>>>>>
>>>>>>> There are basically three ways to clear data from Riak (for the
>>>>>>> purposes of automated testing):
>>>>>>>
>>>>>>> 1. Iterate through the keys via get_keys(), and delete each one.
>>>>>>> This is what you're currently doing, except you don't need to check
>>>>>>> v.exists. Fetching each object to check v.exists makes an additional
>>>>>>> API call to Riak, so it takes twice as long as just calling delete()
>>>>>>> (and trapping a potential 404 "doesn't exist" error).
>>>>>>>
>>>>>>> Advantages: Easy to understand, can be done entirely in code
>>>>>>> (without invoking OS/shell commands).
>>>>>>>
>>>>>>> Disadvantages: It can get slow, for large data sets. Another subtle
>>>>>>> disadvantage is that, as your app grows, it can get difficult to keep track
>>>>>>> of which buckets you've created and need to be cleared.
>>>>>>>
>>>>>>> 2. Stop the Riak cluster, delete the riak data directory, and
>>>>>>> re-start.
>>>>>>>
>>>>>>> Advantages: Very fast, and you can be sure that you're deleting all
>>>>>>> buckets.
>>>>>>>
>>>>>>> Disadvantages: Involves invoking OS/shell commands. This is fairly
>>>>>>> easy if your Riak node is running on the same machine as your tests (and if
>>>>>>> it's a single node). To delete the data directories of a multi-node
>>>>>>> cluster, now you need to involve either a bash script that uses SSH to log
>>>>>>> in and restart, or a coordination framework like Ansible.
>>>>>>>
>>>>>>> 3. Use an in-memory back end. (And to drop all data, just restart
>>>>>>> the node(s)).
>>>>>>>
>>>>>>> Advantages: Same as #2 - fast, thorough.
>>>>>>>
>>>>>>> Disadvantages: Same as #2 (involves shell commands, potentially SSH
>>>>>>> etc). In addition, since you're likely not going to be running your
>>>>>>> production code on an in-memory back end, this method introduces a
>>>>>>> potential environmental/functional difference between your testing and
>>>>>>> production clusters.
>>>>>>>
>>>>>>> I generally use method #1 in my unit tests, and manually delete each
>>>>>>> key.
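
Method #1 without the extra fetch could be sketched as follows. The
clear_bucket name is hypothetical; it only relies on the get_keys() and
delete(key) bucket calls that appear elsewhere in this thread, and it leaves
out error handling for missing keys, since how the client reports those is
not shown here.

```python
def clear_bucket(bucket):
    """Delete every key that list-keys reports; return the delete count."""
    deleted = 0
    for key in bucket.get_keys():
        # No get()/exists check first: that would add a second round trip
        # to Riak for every key.
        bucket.delete(key)
        deleted += 1
    return deleted
```

Note that keys which list-keys still reports as tombstones are included in
the loop, so this does not avoid the slowdown described earlier in the thread.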
>>>>>>>
>>>>>>> Dmitri
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, May 19, 2014 at 8:53 AM, Paweł Królikowski <rabbbit at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> For testing, I'd like to be able to throw a large amount of data at
>>>>>>>> Riak (100k+ entries), check how it performed, change something in the
>>>>>>>> application, and run the test again. I'd like to use the same data
>>>>>>>> every time, so I'd like to clear the bucket between tests.
>>>>>>>>
>>>>>>>> The documentation (
>>>>>>>> http://docs.basho.com/riak/2.0.0beta1/dev/references/http/) says:
>>>>>>>>
>>>>>>>> *Delete Buckets*
>>>>>>>> There is no straightforward way to delete an entire Bucket. To
>>>>>>>> delete all the keys in a bucket, you’ll need to delete them all
>>>>>>>> individually.
>>>>>>>>
>>>>>>>>
>>>>>>>> So, I'm currently using something like:
>>>>>>>>
>>>>>>>> for k in r_bk.get_keys():
>>>>>>>>     v = r_bk.get(k)
>>>>>>>>     if v.exists:
>>>>>>>>         r_bk.delete(v)
>>>>>>>>
>>>>>>>> The problem is that r_bk.get_keys() returns a lot of elements that
>>>>>>>> don't exist (tombstones?) and iterating over all of them takes time.
>>>>>>>>
>>>>>>>> Is that the way it's supposed to work? Or am I missing something?
>>>>>>>>
>>>>>>>> - I'm using default delete_mode configuration ( 3 seconds )
>>>>>>>> - I'm using Riak 2.0 alpha 19 with Python. ( there's a bug with
>>>>>>>> strong consistency in Beta1, cannot use it)
>>>>>>>> - changing the bucket name for every run seems .. impractical?
>>>>>>>>
>>>>>>>> Any advice welcome,
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks,
>>>>>>>> Paweł
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> riak-users mailing list
>>>>>>>> riak-users at lists.basho.com
>>>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
>


-- 
Sean Cribbs <sean at basho.com>
Software Engineer
Basho Technologies, Inc.
http://basho.com/