Is storing billions of small files a good Riak-CS/KV use case?

David Heidt david.heidt at msales.com
Wed Oct 7 10:36:54 EDT 2015


Thank you so much Daniel and Dmitri!

I will benchmark 2i against bitcask expiry and get back to you here in a
couple of days.

Best,

David


2015-10-07 16:23 GMT+02:00 Dmitri Zagidulin <dzagidulin at basho.com>:

> On second thought, ignore the Search recommendation. Search + expiry
> doesn't work very well together: when objects expire from Riak, their
> search index entries are not removed, so they linger as orphans.
>
> On Wed, Oct 7, 2015 at 4:11 PM, Dmitri Zagidulin <dzagidulin at basho.com>
> wrote:
>
>> Hi David,
>>
>> 1) Storing billions of small files is definitely a good use case for Riak
>> KV.  (Since they're small, there's no reason to use CS (now re-branded as
>> S2)).
>>
>> 2) As far as deleting an entire bucket, that part is tougher.
>>
>> (Incidentally, if you were thinking of using Riak CS because it has a
>> 'delete bucket' command (see
>> http://docs.basho.com/riakcs/latest/references/apis/storage/s3/RiakCS-DELETE-Bucket/
>> ) -- that won't work; the delete-bucket command requires all objects to
>> be deleted first, meaning you can only perform it on an empty bucket,
>> which doesn't help you :) ).
>>
>> Your best bet is to use the Bitcask back end (instead of leveldb), and
>> use its Automatic Expiration setting (see the end of the
>> http://docs.basho.com/riak/latest/ops/advanced/backends/bitcask/#Configuring-Bitcask
>> section, under 'Automatic Expiration').
>>
>> So, you can say:
>>
>> bitcask.expiry = 30d
>>
>> And all objects (in every bucket using that backend) will expire 30
>> days after their last-modified timestamp, which effectively deletes
>> the bucket they were in.
>>
>> Now, this setting is per-backend. So, if you need other buckets without
>> expirations, you'd want to set up a Multi backend. So, the default backend
>> could be a plain non-expiry Bitcask (or leveldb), and then a second backend
>> would have the expiry setting. You can learn more here:
>>
>> http://docs.basho.com/riak/latest/ops/advanced/backends/multi/
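>> A multi-backend riak.conf might look roughly like this (the backend
>> names here are made up for the example; verify the exact syntax for
>> your Riak version against the docs linked above):

```
## Route storage through the multi backend.
storage_backend = multi

## Plain Bitcask with no expiry -- used unless a bucket says otherwise.
multi_backend.default = default_bitcask
multi_backend.default_bitcask.storage_backend = bitcask

## A second Bitcask instance with 30-day automatic expiration.
multi_backend.expiring_bitcask.storage_backend = bitcask
multi_backend.expiring_bitcask.bitcask.expiry = 30d
```

>> You'd then point your daily buckets at the expiring instance by
>> setting the bucket's backend property to expiring_bitcask (via
>> bucket properties or a bucket type).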
>>
>> What if you want to delete buckets but also use LevelDB?
>>
>> That depends on why you want LevelDB. If you're using it for its
>> Secondary Index (2i) capability, you can use Search instead, which
>> works with Bitcask. Or, on the flip side, you can use 2i queries to
>> delete the bucket: run a 2i query to get all the keys in an expiring
>> bucket, then issue a delete for each key. (Don't forget to delete with
>> a W value equal to your N value; otherwise you may have to run the
>> query-plus-delete pass a few times to account for stray missing
>> replicas.)
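>> As a rough sketch of that query-then-delete loop against Riak's HTTP
>> API (host, bucket, and key names are illustrative; the script only
>> prints the requests it would issue -- swap the echo for the commented
>> curl command to run it for real):

```shell
#!/bin/sh
# Drain a daily bucket: list its keys via 2i, then delete each one.
RIAK="http://127.0.0.1:8098"
BUCKET="2015-10-07"

# Step 1 (needs a live cluster on the leveldb backend): list all keys
# in the bucket via the special $bucket secondary index, e.g.:
#   curl -s "$RIAK/buckets/$BUCKET/index/\$bucket/_"

# Step 2: delete each key with w=3 (match your n_val so every replica
# acknowledges the delete). Here we print the request instead of
# sending it; in production use:  curl -s -X DELETE "<url>"
delete_key() {
  echo "DELETE $RIAK/buckets/$BUCKET/keys/$1?w=3"
}

for key in key-0001 key-0002; do
  delete_key "$key"
done
```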
>>
>> Does that help explain the situation?
>>
>> Dmitri
>>
>>
>>
>> On Wed, Oct 7, 2015 at 3:43 PM, David Heidt <david.heidt at msales.com>
>> wrote:
>>
>>>
>>> Hi List,
>>>
>>> would you say that storing billions of very small (JSON) files is a
>>> good use case for Riak KV or CS?
>>>
>>> here's what I would do:
>>>
>>> * create daily buckets (e.g. 2015-10-07)
>>> * up to 130 million inserts per day
>>> * about 150,000 read-only accesses per day
>>> * no updates to existing keys/files
>>> * delete buckets (including their keys/files) older than x days
>>>
>>>
>>> I already have a working Riak KV/LevelDB cluster (inserts and lookups
>>> are going smoothly), but I have found no way to do mass deletion of
>>> keys.
>>>
>>>
>>> Best,
>>>
>>> David
>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>>>
>>
>