Is storing billions of small files a good Riak-CS/KV use case?

Dmitri Zagidulin dzagidulin at
Wed Oct 7 10:11:19 EDT 2015

Hi David,

1) Storing billions of small files is definitely a good use case for Riak
KV. (Since they're small, there's no reason to use CS (now re-branded as
Riak S2), which is aimed at larger objects.)

2) As for deleting an entire bucket, that part is tougher.

(Incidentally, if you were thinking of using Riak CS because it has a
'delete bucket' command: that won't work. The delete bucket command requires
all objects to be deleted first, meaning you can only perform it on an
empty bucket. Which doesn't help you :) ).

Your best bet is to use the Bitcask backend (instead of LevelDB) and its
Automatic Expiration setting (see the 'Automatic Expiration' section of the
Bitcask documentation).

So, you can say:

bitcask.expiry = 30d

And all of the objects (in all buckets using that backend) will expire
30 days after their last-modified timestamp, which effectively deletes the
bucket they were in.

Now, this setting is per-backend. So, if you need other buckets without
expirations, you'd want to set up a Multi backend: the default backend could
be a plain non-expiry Bitcask (or LevelDB), and a second backend would have
the expiry setting. A sketch follows below; the Multi backend section of the
Riak docs has the details.
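
To make that concrete, here's a minimal riak.conf sketch, assuming Riak 2.x
config syntax; the backend names (bitcask_default, bitcask_expiring) and the
data paths are made-up examples:

storage_backend = multi
multi_backend.default = bitcask_default

# Plain Bitcask, no expiry, for everything else
multi_backend.bitcask_default.storage_backend = bitcask
multi_backend.bitcask_default.bitcask.data_root = $(platform_data_dir)/bitcask_default

# Bitcask with 30-day automatic expiration, for the daily buckets
multi_backend.bitcask_expiring.storage_backend = bitcask
multi_backend.bitcask_expiring.bitcask.data_root = $(platform_data_dir)/bitcask_expiring
multi_backend.bitcask_expiring.bitcask.expiry = 30d

You'd then point the expiring buckets at that backend, for example via a
bucket type:

riak-admin bucket-type create expiring '{"props":{"backend":"bitcask_expiring"}}'
riak-admin bucket-type activate expiring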

What if you want to delete buckets but also use LevelDB?

That depends on why you want LevelDB. If you're using it for its Secondary
Index (2i) capability, you can use Search instead, which works with Bitcask.
Or, on the flip side, you can keep LevelDB and use 2i queries to delete the
bucket: run a 2i query to get all the keys in an expiring bucket, then issue
a Delete for each key. (Don't forget to delete with a W value equal to your
N value; otherwise you may have to run the query-plus-delete pass a few
times to account for stray missing replicas.) A sketch of that pass follows
below.
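
As a rough sketch of that pass, assuming the official Riak Python client,
a local node, the default n_val of 3, and an example bucket name:

import riak

client = riak.RiakClient(pb_port=8087)
bucket = client.bucket('2015-10-07')

# The special $bucket index returns every key in the bucket;
# 2i queries like this require the LevelDB backend.
for keylist in bucket.stream_index('$bucket', bucket.name):
    for key in keylist:
        # Delete with W equal to N, so every replica acknowledges.
        # n_val is 3 by default; use your cluster's actual value.
        bucket.delete(key, w=3)

If some replicas were down during the pass, a repeat run (as noted above)
will catch any keys that reappear.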

Does that help explain the situation?


On Wed, Oct 7, 2015 at 3:43 PM, David Heidt <david.heidt at> wrote:

> Hi List,
> would you say that storing billions of very small (JSON) files is a good
> use case for Riak KV or CS?
> here's what I would do:
> * create daily buckets (e.g. 2015-10-07)
> * up to 130 million inserts per day
> * about 150,000 read-only accesses per day
> * no updates on existing keys/files
> * delete buckets (including keys/files) older than x days
> I already have a working Riak KV/LevelDB cluster (inserts and lookups are
> going smoothly), but when it comes to mass deletion of keys, I found no way
> to do this.
> Best,
> David
