Is storing billions of small files a good Riak-CS/KV usecase?

Dmitri Zagidulin dzagidulin at basho.com
Wed Oct 7 10:23:41 EDT 2015


On second thought, ignore the Search recommendation. Search + Expiry
doesn't work well together: when objects expire from Riak, their search
index entries are not removed, so they linger as orphans.

On Wed, Oct 7, 2015 at 4:11 PM, Dmitri Zagidulin <dzagidulin at basho.com>
wrote:

> Hi David,
>
> 1) Storing billions of small files is definitely a good use case for Riak
> KV.  (Since they're small, there's no reason to use CS (now re-branded as
> S2)).
>
> 2) As far as deleting an entire bucket, that part is tougher.
>
> (Incidentally, if you were thinking of using Riak CS because it has a
> 'delete bucket' command (see
> http://docs.basho.com/riakcs/latest/references/apis/storage/s3/RiakCS-DELETE-Bucket/
> ) -- that won't work, the delete bucket command requires all objects to be
> deleted first. Meaning, you can only perform it on an empty bucket. Which
> doesn't help you :) ).
>
> Your best bet is to use the Bitcask back end (instead of leveldb), and use
> its Automatic Expiration setting (see the end of the
> http://docs.basho.com/riak/latest/ops/advanced/backends/bitcask/#Configuring-Bitcask
> section, under 'Automatic Expiration').
>
> So, you can say:
>
> bitcask.expiry = 30d
>
> And all of the objects (in all buckets using that backend) will be expired
> 30 days after their last-modified timestamp, which also effectively
> deletes the buckets they were in.
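>
> For context, here's a minimal riak.conf sketch with that expiry on the
> default Bitcask backend (Riak 2.x config syntax; the data_root path is
> illustrative):

```
storage_backend = bitcask
bitcask.data_root = $(platform_data_dir)/bitcask
bitcask.expiry = 30d
```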
>
> Now, this setting is per-backend. So, if you need other buckets without
> expirations, you'd want to set up a Multi backend. So, the default backend
> could be a plain non-expiry Bitcask (or leveldb), and then a second backend
> would have the expiry setting. You can learn more here:
>
> http://docs.basho.com/riak/latest/ops/advanced/backends/multi/
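>
> For example, a riak.conf sketch of such a Multi setup (Riak 2.x syntax;
> the backend names and data_root paths here are illustrative, not
> defaults):

```
storage_backend = multi
multi_backend.default = bitcask_keep

## plain, non-expiring Bitcask for everything else
multi_backend.bitcask_keep.storage_backend = bitcask
multi_backend.bitcask_keep.bitcask.data_root = $(platform_data_dir)/bitcask_keep

## expiring Bitcask for the daily buckets
multi_backend.bitcask_expire.storage_backend = bitcask
multi_backend.bitcask_expire.bitcask.data_root = $(platform_data_dir)/bitcask_expire
multi_backend.bitcask_expire.bitcask.expiry = 30d
```

> You would then point the expiring buckets at that backend via the
> 'backend' bucket property (e.g. on a bucket type).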
>
> What if you want to delete buckets but also use LevelDB?
>
> That depends on why you want LevelDB. If you're using it for Secondary
> Index capability -- you can use Search instead, that works with Bitcask.
> Or, on the flip side, you can use 2i (secondary index) queries to delete
> the bucket. You'd use a 2i query to get all the keys in an expiring bucket,
> and then issue Deletes to each key. (Don't forget to delete with a W value
> equal to your N value; otherwise you may have to run the query-plus-delete
> pass a few times, to account for stray replicas that missed a delete.)
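>
> That query-then-delete pass can be sketched like this (a minimal sketch:
> `list_keys` and `delete_key` stand in for your client library's 2i query
> and delete-with-W=N calls, and `max_passes` is an assumed retry bound,
> not anything Riak-specific):

```python
def drain_bucket(list_keys, delete_key, max_passes=3):
    """Repeatedly run the 2i query and delete every returned key.

    Re-running the pass covers keys whose delete missed a replica
    (e.g. a node that was down during the previous pass).
    Returns True once the query comes back empty.
    """
    for _ in range(max_passes):
        keys = list_keys()        # e.g. a 2i query over the bucket
        if not keys:
            return True
        for key in keys:
            delete_key(key)       # issue the delete with W == N
    return not list_keys()
```

> With the HTTP API, `list_keys` would map to a 2i query on the special
> $bucket index and `delete_key` to a DELETE on each key, but check the
> exact endpoints for your Riak version.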
>
> Does that help explain the situation?
>
> Dmitri
>
>
>
> On Wed, Oct 7, 2015 at 3:43 PM, David Heidt <david.heidt at msales.com>
> wrote:
>
>>
>> Hi List,
>>
>> would you say that storing billions of very small (json) files is a good
>> usecase for riak kv or cs?
>>
>> here's what I would do:
>>
>> * create daily buckets ( i.e. 2015-10-07)
>> * up to 130 Million inserts per day
>> * about 150,000 read-only accesses/day
>> * no updates on existing keys/files
>> * delete buckets (including keys/files) older than x days
>>
>>
>> I already have a working riak-kv/leveldb cluster (inserts and lookups are
>> going smoothly), but I have found no way to do mass deletion of keys.
>>
>>
>> Best,
>>
>> David
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>
>

