The best (fastest) way to delete/clear a bucket [python]

Paweł Królikowski rabbbit at gmail.com
Wed May 21 05:12:29 EDT 2014


@Dmitri - cool, thanks. Now that I know it's an expected behaviour, even if
I think it's strange, I can find a way of working around it :)

@Sean - tbh, I don't know. I was trying to test a whole application,
involving http requests + multiple consumers over rabbitmq with semi-real
data, so random bucket/key names sound .. wrong (& complicated?). On the
other hand, restarting riak & nuking the data directory, possibly on a multi-node
cluster, doesn't seem that much better.
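
For reference, a minimal sketch of what per-run random bucket names (Sean's suggestion, quoted below) might look like when the application itself uses fixed bucket names. The helpers `unique_bucket_name` and `bucket_names_for_run` are hypothetical, not part of any Riak client; the idea is only that one mapping is generated per test run and handed to every component.

```python
import uuid


def unique_bucket_name(prefix="test"):
    """Return a bucket name that is very unlikely to collide between runs."""
    return "{}-{}".format(prefix, uuid.uuid4().hex)


def bucket_names_for_run(logical_names, prefix="test"):
    """Map the fixed names the application knows to fresh per-run bucket names."""
    return {
        name: unique_bucket_name("{}-{}".format(prefix, name))
        for name in logical_names
    }


if __name__ == "__main__":
    # 'locate' is the bucket name used later in this thread.
    print(bucket_names_for_run(["locate"]))
```

The generated mapping would have to reach the http workers and the rabbitmq consumers (through config or environment), which is probably where the "complicated" part comes in.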

I'll play with tests a little longer, I'll come up with something that
works.

Anyway, thanks for the help :)


On 20 May 2014 15:50, Sean Cribbs <sean at basho.com> wrote:

> For what it's worth, in the integration tests of our client libraries we
> have moved to generating random bucket and key names for each test/example.
> This reduces setup/teardown time and is less susceptible to the types of
> unexpected behaviors you are seeing from list-keys. If possible, I highly
> recommend this approach in your suite.
>
>
> On Tue, May 20, 2014 at 9:25 AM, Dmitri Zagidulin <dzagidulin at basho.com> wrote:
>
>> Ok, so, from what I understand, this is going to be expected behavior
>> from strongly consistent buckets. (I'm in the process of confirming this,
>> and we'll see if we can add it to the documentation). The delete_mode:
>> immediate is ignored, and the tombstone is kept around, to ensure the
>> consistency of not found, etc. (In the context of further over-writes of
>> that key).
>>
>> So, unfortunately that may be bad news in terms of deleting a
>> strongly_consistent bucket via keylist for unit testing. :)
>>
>> You may want to switch to method #2, for your test suite. (Write a shell
>> script to stop the node, delete the bitcask & aae dirs, and restart. And
>> invoke it as a shell script command from your test suite. Or just call
>> those commands directly.).
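
A rough sketch of method #2 for a single local node, driven from Python instead of a separate shell script. Assumptions: the `riak` command is on the PATH, the test process has permission to stop the node and remove files, and the data directory (for example /var/lib/riak) contains the `bitcask` and `anti_entropy` subdirectories named in this thread. The function name `wipe_riak_data` and its parameters are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def wipe_riak_data(data_dir, subdirs=("bitcask", "anti_entropy"),
                   run=subprocess.check_call):
    """Stop the local node, remove the given data subdirectories, restart."""
    run(["riak", "stop"])
    for name in subdirs:
        target = Path(data_dir) / name
        if target.exists():
            # Remove the whole directory tree; other directories are left alone.
            shutil.rmtree(str(target))
    run(["riak", "start"])


# Example call (data directory path is an assumption):
# wipe_riak_data("/var/lib/riak")
```

The `run` parameter exists only so the command invocation can be swapped out (for instance with an SSH wrapper for remote nodes, or a recorder in a dry run). Whether other directories also need wiping on a given setup is not covered here.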
>>
>>
>>
>> On Tue, May 20, 2014 at 5:44 AM, Paweł Królikowski <rabbbit at gmail.com> wrote:
>>
>>> Ok then,
>>>
>>> I've stopped riak, wiped bitcask and anti_entropy directories, updated
>>> config, started riak.
>>>
>>> I've tried to verify it with:
>>>
>>> riak config generate -l debug
>>>
>>> Got output:
>>>
>>> [...]
>>>
>>> 10:25:46.260 [info] /etc/riak/advanced.config detected, overlaying
>>> proplists
>>>  -config /var/lib/riak/generated.configs/app.2014.05.20.10.25.46.config
>>> -args_file /var/lib/riak/generated.configs/vm.2014.05.20.10.25.46.args
>>> -vm_args /var/lib/riak/generated.configs/vm.2014.05.20.10.25.46.args
>>>
>>>
>>> And at the very end of the config file there's:
>>>
>>>  {k_kv,[{delete_mode,immediate}]}].
>>>
>>> So, it worked.
>>>
>>>
>>>  Then did this:
>>>
>>> >>> import riak
>>> >>> c = riak.RiakClient(pb_port=8087, protocol='pbc', host='db-13')
>>> >>> b = c.bucket(name='locate', bucket_type='strongly_consistent')
>>> >>> o = b.get('foo')
>>> >>> o.data = 3
>>> >>> o.store()
>>> <riak.riak_object.RiakObject object at 0x2b2ce90>
>>> >>> o.delete()
>>> <riak.riak_object.RiakObject object at 0x2b2ce90>
>>> >>> b.delete('foo')
>>> <riak.riak_object.RiakObject object at 0x2b55d90>
>>> >>> o.exists
>>> False
>>> >>> b.get_keys()
>>> ['foo']
>>>
>>>
>>> So, it didn't work.
>>>
>>> It's not just the python client, because if I do this, I get the key
>>> back:
>>>
>>> http://db-13:8098/types/strongly_consistent/buckets/locate/keys?keys=true
>>> {"keys":["foo"]}
>>>
>>>
>>>
>>> I've tried deleting the key via http request (curl -v -X DELETE
>>> http://db-13:8098/types/strongly_consistent/buckets/locate/keys/bar),
>>> but it still remains.
>>>
>>> http://db-13:8098/types/strongly_consistent/buckets/locate/keys/foo
>>>
>>> returns
>>>
>>> not found
>>>
>>> but
>>>
>>> http://db-13:8098/types/strongly_consistent/buckets/locate/keys?keys=true
>>>
>>> gives
>>>
>>> {"keys":["foo","bar"]}
>>>
>>>
>>> I've tried looking for detailed logs, but console.log, even on debug,
>>> doesn't print anything useful.
>>> I've also tried looking inside bitcask directory, and there's definitely
>>> 'some' binary data there, even after deletion.
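
To put a number on the mismatch described above (keys that list-keys returns but that read back as not found), something like the following could be run against the bucket. It only uses calls that appear in this thread (`get_keys()`, `get()`, `.exists`); the function name `ghost_keys` is hypothetical, and it costs one extra read per listed key, so it is a diagnostic sketch, not something for a teardown path.

```python
def ghost_keys(bucket):
    """Return keys reported by list-keys whose object does not exist on read."""
    return [key for key in bucket.get_keys() if not bucket.get(key).exists]


# Example, with `b` being the bucket object from the session above:
# print(ghost_keys(b))
```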
>>>
>>>
>>> On 19 May 2014 23:23, Dmitri Zagidulin <dzagidulin at basho.com> wrote:
>>>
>>>> Ah, that's interesting, let's see if we can test this.
>>>>
>>>> The 'delete_mode' configuration is not supported in the regular
>>>> riak.conf file, from what I understand.
>>>> However, you can still set it in the 'advanced.config' file, as
>>>> described here:
>>>>
>>>> https://github.com/basho/basho_docs/blob/features/lp/advanced-conf/source/languages/en/riak/ops/advanced/configs/configuration-files.md#the-advancedconfig-file
>>>> (those docs are a current work-in-progress, mind you)
>>>>
>>>> So, create an advanced.config file in your riak etc/ directory (this
>>>> will be in addition to your existing riak.conf), with the following
>>>> contents:
>>>> [
>>>>  {riak_kv, [
>>>>    {delete_mode, immediate}
>>>>  ]}
>>>> ].
>>>>
>>>> Restart the node, and try your tests again. The tombstones should
>>>> disappear now on every delete request. (You should probably also wipe all
>>>> of the old data, by deleting the contents of the bitcask and anti_entropy
>>>> directories in your riak data dir, just to make sure the old ones are gone.
>>>> This should be done while the node is down, of course.)
>>>>
>>>>
>>>>
>>>> On Mon, May 19, 2014 at 4:33 PM, Paweł Królikowski <rabbbit at gmail.com> wrote:
>>>>
>>>>> The problem is that the tombstones never disappear - they keep coming
>>>>> back through bucket.get_keys() hours after deletion, even after a restart.
>>>>>
>>>>> I said I'm using the delete_mode default configuration, because I
>>>>> didn't change it. I now tried, and apparently it's not supported any more
>>>>> in Riak 2.0.
>>>>>
>>>>> 17:16:56.318 [error] You've tried to set delete_mode, but there is no
>>>>> setting with that name.
>>>>> 17:16:56.318 [error]   Did you mean one of these?
>>>>> 17:16:56.335 [error]     dtrace
>>>>> 17:16:56.335 [error]     nodename
>>>>> 17:16:56.335 [error]     ssl.keyfile
>>>>> 17:16:56.335 [error] Error generating configuration in phase
>>>>> transform_datatypes
>>>>> 17:16:56.335 [error] Conf file attempted to set unknown variable:
>>>>> delete_mode
>>>>> Error generating config with cuttlefish
>>>>>
>>>>> I'm using Riak 2.0.0pre20, on strongly consistent buckets, on a single
>>>>> node cluster. Can this be the reason? I guess what I need is a confirmation
>>>>> that something is broken/that I'm doing something stupid.
>>>>>
>>>>> I've tried looking for similar issues (github.com/basho/riak/issues),
>>>>> didn't find any -> I guess that suggests I'm doing something stupid, I just
>>>>> don't know what yet.
>>>>>
>>>>>
>>>>> Thanks again :)
>>>>>
>>>>> --
>>>>> Paweł
>>>>>
>>>>>
>>>>> On 19 May 2014 18:00, Dmitri Zagidulin <dzagidulin at basho.com> wrote:
>>>>>
>>>>>> Ah, yes, you bring up a good point. (And, that's another subtlety to
>>>>>> keep in mind, with Option #1).
>>>>>>
>>>>>> Tombstones are definitely something to keep in mind, when deleting
>>>>>> unit test data.
>>>>>> As you mentioned in your earlier question, if you're using default
>>>>>> delete_mode configuration ( 3 seconds ), it means that if you issue a
>>>>>> delete, a tombstone object is going to be written (and stick around for at
>>>>>> least 3 seconds), and unfortunately, it is going to show up as a false
>>>>>> positive on a List Keys call.
>>>>>>
>>>>>> The easiest thing to try, in your case, is to set 'delete_mode' to
>>>>>> 'immediate', restart the test cluster, and retest. With an immediate
>>>>>> delete, your second test with 10 keys should not take as long as the
>>>>>> previous delete with 10000 keys.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, May 19, 2014 at 11:46 AM, Paweł Królikowski <
>>>>>> rabbbit at gmail.com> wrote:
>>>>>>
>>>>>>> Hi Dmitri,
>>>>>>>
>>>>>>> Thanks a lot for the answer. Option #1 seems the best, but I have a
>>>>>>> follow up question:
>>>>>>>
>>>>>>> - when do the deleted keys disappear from Riak: a part of my problem
>>>>>>> (have not explained it correctly the first time), is that get_keys()
>>>>>>> returns keys that no longer exist. So, I run a test with 10 000 keys, I
>>>>>>> remove them, it takes N seconds. I then follow with a test with 10 keys, but
>>>>>>> removing them takes just as much time - I imagine it's because I'm going
>>>>>>> over that 10 000 keys again.
>>>>>>>
>>>>>>> This article seems relevant:
>>>>>>> http://basho.com/riaks-config-behaviors-part-3/ - it seems like the
>>>>>>> tombstones simply remain in my system indefinitely.
>>>>>>>
>>>>>>> --
>>>>>>> Paweł
>>>>>>>
>>>>>>>
>>>>>>> On 19 May 2014 15:32, Dmitri Zagidulin <dzagidulin at basho.com> wrote:
>>>>>>>
>>>>>>>> Hi Pawel,
>>>>>>>>
>>>>>>>> There are basically three ways to clear data from Riak (for the
>>>>>>>> purposes of automated testing):
>>>>>>>>
>>>>>>>> 1. Iterate through the keys via get_keys(), and delete each one.
>>>>>>>> This is what you're currently doing, except you don't need the
>>>>>>>> "if v.exists" check.
>>>>>>>> Fetching the object to check .exists makes an additional API call to
>>>>>>>> Riak, so it takes twice as long as just calling delete() (and trapping
>>>>>>>> a potential 404 doesn't exist error).
>>>>>>>>
>>>>>>>> Advantages: Easy to understand, can be done entirely in code
>>>>>>>> (without invoking OS/shell commands).
>>>>>>>>
>>>>>>>> Disadvantages: It can get slow, for large data sets. Another subtle
>>>>>>>> disadvantage is that, as your app grows, it can get difficult to keep track
>>>>>>>> of which buckets you've created and need to be cleared.
>>>>>>>>
>>>>>>>> 2. Stop the Riak cluster, delete the riak data directory, and
>>>>>>>> re-start.
>>>>>>>>
>>>>>>>> Advantages: Very fast, and you can be sure that you're deleting all
>>>>>>>> buckets.
>>>>>>>>
>>>>>>>> Disadvantages: Involves invoking OS/shell commands. This is fairly
>>>>>>>> easy if your Riak node is running on the same machine as your tests (and if
>>>>>>>> it's a single node). To delete the data directories of a multi-node
>>>>>>>> cluster, now you need to involve either a bash script that uses SSH to log
>>>>>>>> in and restart, or a coordination framework like Ansible.
>>>>>>>>
>>>>>>>> 3. Use an in-memory back end. (And to drop all data, just restart
>>>>>>>> the node(s)).
>>>>>>>>
>>>>>>>> Advantages: Same as #2 - fast, thorough.
>>>>>>>>
>>>>>>>> Disadvantages: Same as #2 (involves shell commands, potentially SSH
>>>>>>>> etc). In addition, since you're likely not going to be running your
>>>>>>>> production code on an in-memory back end, this method introduces a
>>>>>>>> potential environmental/functional difference between your testing and
>>>>>>>> production clusters.
>>>>>>>>
>>>>>>>> I generally use method #1 in my unit tests, and manually delete
>>>>>>>> each key.
>>>>>>>>
>>>>>>>> Dmitri
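
A minimal sketch of method #1 without the extra fetch per key. It assumes `bucket` is a bucket object from the Python client, with the `get_keys()` and `delete(key)` calls used elsewhere in this thread; the function name `clear_bucket` is hypothetical, and error handling for failed deletes is left out because the exact exception types are not shown here.

```python
def clear_bucket(bucket):
    """Issue a delete for every key that list-keys reports; return the count."""
    deleted = 0
    for key in bucket.get_keys():
        # No get()/exists round trip first: one request per key, not two.
        bucket.delete(key)
        deleted += 1
    return deleted


# Example, using the bucket variable from the original question:
# clear_bucket(r_bk)
```

Note that this still walks every key list-keys returns, so any lingering tombstones are visited again on each run.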
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, May 19, 2014 at 8:53 AM, Paweł Królikowski <
>>>>>>>> rabbbit at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> For testing, I'd like to be able to throw a large amount of data
>>>>>>>>> at Riak (100k+ entries), check how it performed, change something in the
>>>>>>>>> application, run the test again. I'd like to use the same data every time,
>>>>>>>>> so, I'd like to clear the bucket between every test.
>>>>>>>>>
>>>>>>>>> The documentation (
>>>>>>>>> http://docs.basho.com/riak/2.0.0beta1/dev/references/http/) says:
>>>>>>>>>
>>>>>>>>> *Delete Buckets*
>>>>>>>>> There is no straightforward way to delete an entire Bucket. To
>>>>>>>>> delete all the keys in a bucket, you’ll need to delete them all
>>>>>>>>> individually.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> So, I'm currently using something like:
>>>>>>>>>
>>>>>>>>> for k in r_bk.get_keys():
>>>>>>>>>     v = r_bk.get(k)
>>>>>>>>>     if v.exists:
>>>>>>>>>         r_bk.delete(v)
>>>>>>>>>
>>>>>>>>> The problem is that r_bk.get_keys() returns a lot of elements that
>>>>>>>>> don't exist (tombstones?) and iterating over all of them takes time.
>>>>>>>>>
>>>>>>>>> Is that the way it's supposed to work? Or am I missing something?
>>>>>>>>>
>>>>>>>>> - I'm using default delete_mode configuration ( 3 seconds )
>>>>>>>>> - I'm using Riak 2.0 alpha 19 with Python. ( there's a bug with
>>>>>>>>> strong consistency in Beta1, cannot use it)
>>>>>>>>> - changing the bucket name for every run seems .. impractical?
>>>>>>>>>
>>>>>>>>> Any advice welcome,
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Thanks,
>>>>>>>>> Paweł
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> riak-users mailing list
>>>>>>>>> riak-users at lists.basho.com
>>>>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>
> --
> Sean Cribbs <sean at basho.com>
> Software Engineer
> Basho Technologies, Inc.
> http://basho.com/
>