Is storing billions of small files a good Riak-CS/KV usecase?

Dmitri Zagidulin dzagidulin at basho.com
Wed Oct 7 10:11:19 EDT 2015


Hi David,

1) Storing billions of small files is definitely a good use case for Riak
KV. Since the files are small, there's no reason to use CS (now re-branded
as S2).

2) As far as deleting an entire bucket, that part is tougher.

(Incidentally, if you were thinking of using Riak CS because it has a
'delete bucket' command (see
http://docs.basho.com/riakcs/latest/references/apis/storage/s3/RiakCS-DELETE-Bucket/
) -- that won't work: the delete bucket command requires all objects to be
deleted first. Meaning, you can only perform it on an empty bucket, which
doesn't help you :) ).

Your best bet is to use the Bitcask backend (instead of LevelDB) and its
Automatic Expiration setting (see the end of the
http://docs.basho.com/riak/latest/ops/advanced/backends/bitcask/#Configuring-Bitcask
section, under 'Automatic Expiration').

So, you can say:

bitcask.expiry = 30d

And all of the objects (in all buckets using that backend) will be expired
30 days after their last-modified timestamp, which also effectively
deletes the bucket they were in.

Now, this setting is per-backend. So, if you need other buckets without
expirations, you'd want to set up a Multi backend. So, the default backend
could be a plain non-expiry Bitcask (or leveldb), and then a second backend
would have the expiry setting. You can learn more here:

http://docs.basho.com/riak/latest/ops/advanced/backends/multi/
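A minimal riak.conf sketch of that layout (the backend names and data
paths here are illustrative, not canonical -- check the Multi backend docs
above for your Riak version):

```
storage_backend = multi
multi_backend.default = bitcask_noexpiry

multi_backend.bitcask_noexpiry.storage_backend = bitcask
multi_backend.bitcask_noexpiry.bitcask.data_root = $(platform_data_dir)/bitcask_noexpiry

multi_backend.bitcask_expiring.storage_backend = bitcask
multi_backend.bitcask_expiring.bitcask.data_root = $(platform_data_dir)/bitcask_expiring
multi_backend.bitcask_expiring.bitcask.expiry = 30d
```

You'd then point your daily buckets at the expiring backend by setting the
'backend' property on the bucket (or bucket type), leaving everything else
on the non-expiring default.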

What if you want to delete buckets but also use LevelDB?

That depends on why you want LevelDB. If you're using it for its Secondary
Index (2i) capability, you can use Riak Search instead, which works with
Bitcask. Or, on the flip side, you can keep LevelDB and use 2i queries to
delete the bucket: run a 2i query to get all the keys in an expiring
bucket, then issue a Delete for each key. (Don't forget to delete with a W
value equal to your N value; otherwise you may have to run the query +
delete pass a few times, to account for stray missing replicas.)
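A rough sketch of that query-then-delete pass, in Python. The date
arithmetic (which daily buckets are past retention) runs as-is; the actual
Riak calls are shown in comments because they assume the official Python
client, a local node, and n_val=3 (hence w=3) -- adjust for your setup:

```python
from datetime import date, timedelta

def expired_buckets(today, retention_days, history_days=365):
    """Daily bucket names (YYYY-MM-DD) older than the retention window."""
    return [str(today - timedelta(days=d))
            for d in range(retention_days + 1, history_days + 1)]

# For each expired bucket, a 2i query on the special $bucket index lists
# every key, and each key is deleted with W equal to N. Hypothetical
# sketch against a local node with the official Python client:
#
#   import riak
#   client = riak.RiakClient(pb_port=8087)
#   for name in expired_buckets(date.today(), 30):
#       bucket = client.bucket(name)
#       for page in bucket.stream_index('$bucket', name):
#           for key in page:
#               bucket.delete(key, w=3)   # W == N, per the advice above

print(expired_buckets(date(2015, 10, 7), 30, history_days=33))
# -> ['2015-09-06', '2015-09-05', '2015-09-04']
```

With daily buckets, running this once a day keeps the window rolling;
re-running it is harmless, since deleting an already-absent key is a no-op.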

Does that help explain the situation?

Dmitri



On Wed, Oct 7, 2015 at 3:43 PM, David Heidt <david.heidt at msales.com> wrote:

>
> Hi List,
>
> would you say that storing billions of very small (json) files is a good
> usecase for riak kv or cs?
>
> here's what I would do:
>
> * create daily buckets ( i.e. 2015-10-07)
> * up to 130 Million inserts per day
> about 150,000 read-only accesses/day
> * no updates on existing keys/files
> * delete buckets (including keys/files) older than x days
>
>
> I already have a working riak-kv/leveldb cluster (inserts and lookups are
>  going smoothly), but when it comes to mass deletion of keys I found no way
> to do this.
>
>
> Best,
>
> David
>
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>

