The best (fastest) way to delete/clear a bucket [python]

Dmitri Zagidulin dzagidulin at basho.com
Mon May 19 10:32:13 EDT 2014


Hi Pawel,

There are basically three ways to clear data from Riak (for the purposes of
automated testing):

1. Iterate through the keys via get_keys(), and delete each one. This is
what you're currently doing, except you don't need the get() call and the
v.exists check. Each get() is an additional API call to Riak, so the loop
takes about twice as long as just calling delete() on every key (and
trapping a potential 404 "not found" error).

Advantages: Easy to understand, can be done entirely in code (without
invoking OS/shell commands).

Disadvantages: It can get slow for large data sets. Another subtle
disadvantage is that, as your app grows, it can get difficult to keep track
of which buckets you've created and need to be cleared.
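
A minimal sketch of method #1, assuming only that the bucket object exposes
get_keys() and delete(key) as in the Python client. The helper name
clear_bucket and the ignored_errors parameter are hypothetical; which
exception (if any) your client version raises for an already-deleted key is
something to check, so it is left to the caller:

```python
def clear_bucket(bucket, ignored_errors=()):
    """Delete every key listed by bucket.get_keys(), without fetching it first.

    `bucket` is anything with get_keys() and delete(key).
    `ignored_errors` is a tuple of exception types to swallow, e.g. whatever
    your client raises for a key that no longer exists (an assumption; by
    default nothing is swallowed).
    Returns the number of delete calls that completed without error.
    """
    deleted = 0
    for key in bucket.get_keys():
        try:
            # One API call per key: no get(), no .exists check.
            bucket.delete(key)
        except ignored_errors:
            # Key was already gone (e.g. a tombstone still listed); skip it.
            continue
        deleted += 1
    return deleted
```

With the real client, the argument would be the object you already have as
r_bk, so the whole loop becomes clear_bucket(r_bk).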

2. Stop the Riak cluster, delete the riak data directory, and re-start.

Advantages: Very fast, and you can be sure that you're deleting all buckets.

Disadvantages: Involves invoking OS/shell commands. This is fairly easy if
your Riak node is running on the same machine as your tests (and if it's a
single node). To delete the data directories of a multi-node cluster, you
need either a bash script that uses SSH to log in to each node, wipe the
data and restart it, or a coordination framework like Ansible.
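
For the single-node, same-machine case, method #2 could be scripted from the
test suite roughly like this. It is only a sketch: the function name
reset_local_node is hypothetical, the data directory location depends on how
Riak was installed and must be supplied by you, and the `run` parameter
exists only so the command runner can be swapped out:

```python
import shutil
import subprocess
from pathlib import Path


def reset_local_node(data_dir, run=subprocess.check_call):
    """Stop the local Riak node, empty its data directory, start it again.

    `data_dir` is the node's data directory (install-specific; an assumption
    you must fill in). `run` executes a command given as a list of strings;
    the default raises if a command exits with a non-zero status.
    """
    run(["riak", "stop"])

    # Remove everything inside the data directory, but keep the directory
    # itself so its ownership and permissions stay as they were.
    for child in Path(data_dir).iterdir():
        if child.is_dir() and not child.is_symlink():
            shutil.rmtree(child)
        else:
            child.unlink()

    run(["riak", "start"])
```

The test process needs permission to run the riak command and to write to
the data directory, which is part of why this approach gets awkward beyond
one local node.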

3. Use an in-memory back end. (And to drop all data, just restart the
node(s)).

Advantages: Same as #2 - fast, thorough.

Disadvantages: Same as #2 (involves shell commands, potentially SSH etc).
In addition, since you're likely not going to be running your production
code on an in-memory back end, this method introduces a potential
environmental/functional difference between your testing and production
clusters.

I generally use method #1 in my unit tests, and manually delete each key.

Dmitri



On Mon, May 19, 2014 at 8:53 AM, Paweł Królikowski <rabbbit at gmail.com> wrote:

> Hi,
>
> For testing, I'd like to be able to throw a large amount of data at Riak
> (100k+ entries), check how it performed, change something in the
> application, run the test again. I'd like to use the same data every time,
> so, I'd like to clear the bucket between every test.
>
> The documentation (
> http://docs.basho.com/riak/2.0.0beta1/dev/references/http/) says:
>
> *Delete Buckets*
> There is no straightforward way to delete an entire Bucket. To delete all
> the keys in a bucket, you’ll need to delete them all individually.
>
>
> So, I'm currently using something like:
>
> for k in r_bk.get_keys():
>     v = r_bk.get(k)
>     if v.exists:
>         r_bk.delete(v)
>
> The problem is that r_bk.get_keys() returns a lot of elements that don't
> exist (tombstones?) and iterating over all of them takes time.
>
> Is that the way it's supposed to work? Or am I missing something?
>
> - I'm using default delete_mode configuration ( 3 seconds )
> - I'm using Riak 2.0 alpha 19 with Python. ( there's a bug with strong
> consistency in Beta1, cannot use it)
> - changing the bucket name for every run seems .. impractical?
>
> Any advice welcome,
>
> --
> Thanks,
> Paweł
>
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>