Measuring Riak disk usage

Ben McCann ben at benmccann.com
Wed Apr 10 13:36:22 EDT 2013


Sure, will do. I'm still a little confused about how Riak runs on one
machine though.  Is it running three server nodes or does it run only a
single Riak node and store three copies of the data?

Thanks,
Ben


On Wed, Apr 10, 2013 at 10:14 AM, Matthew Von-Maszewski
<matthewv at basho.com> wrote:

> Ben,
>
> The runtime recovery log ends in "XXXXXX.log" where XXXXXX is a six-digit
> number.  Its size will vary between 30 Mbytes and 60 Mbytes per vnode
> directory.
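
For a rough sense of scale, the per-vnode figures above can be multiplied out across a node. This is only a sketch: the count of 64 vnodes assumes Riak's default ring size with every vnode hosted on one machine, which is not stated in this thread, and the function name is made up for illustration.

```python
# Rough estimate of total uncompressed recovery-log space on one node.
# Assumption: 64 vnodes (default ring size, all hosted on a single node).
MB = 1024 * 1024

def recovery_log_range(vnodes, low_mb=30, high_mb=60):
    """Return (low, high) total recovery-log size in bytes across all vnodes."""
    return vnodes * low_mb * MB, vnodes * high_mb * MB

low, high = recovery_log_range(vnodes=64)
print(f"{low / MB:.0f} MB to {high / MB:.0f} MB")  # prints "1920 MB to 3840 MB"
```

With a test data set of only a few hundred thousand records, an uncompressed overhead of this size could easily dominate the measured total.
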
>
> My recommendation is that you change default_bucket_props in the app.config
> file as detailed below, completely erase the data storage area, and then run
> the test again.  That should make a big difference in total size.
>
> See if this results in more reasonable / comparable sizes.  This will give
> you a default compression comparison.  There is also a way to tune the
> database such that it would compress even more, but at the cost of random
> read performance.  We can try that next.
>
> Matthew
>
>
>
>
> On Apr 10, 2013, at 12:31 PM, Ben McCann <ben at benmccann.com> wrote:
>
> Thanks for the help. If I were saving three copies of the data in Riak
> that would certainly explain it! I installed Riak via the apt repository
> instructions<http://docs.basho.com/riak/1.1.4/tutorials/installation/Installing-on-Debian-and-Ubuntu/>.
> Not sure what that does by default. If it's saving three copies of the data
> then I assume it would also be running three server nodes or does it run
> only a single Riak node and store three copies of the data? I'm accessing
> Riak on port 8098, which seems to be the default if only one node is
> running.
>
> The first level of leveldb storage looks to be quite small to me. Is the
> runtime data recovery log likely to be very large? Can you tell me where
> that would be located or point me to some docs on it?
>
> I'm not super interested in squeezing out an extra percent or two of
> storage here or there, but just want to roughly have some idea if storing
> my data with snappy compression will yield me a 30% savings or 50% savings
> or 80% savings, etc. So any really big things like perhaps storing three
> copies of the data are interesting =)  In production, my average document
> size is probably about 2k and I have tens of millions and soon to be
> hundreds of millions of them.
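
The back-of-the-envelope estimate being asked for here could be sketched as follows. The function and its parameters are hypothetical; "2k" is taken as 2048 bytes and the document count as 100 million purely for illustration. The 30%, 50% and 80% savings are the scenarios named above, not measured results, and per-object overhead from Riak and LevelDB is ignored.

```python
def projected_bytes(doc_count, avg_doc_bytes, n_val, savings):
    """Estimate on-disk bytes: raw data times replica count, reduced by compression.

    savings is the fraction removed by compression (0.3 means 30% smaller).
    """
    return doc_count * avg_doc_bytes * n_val * (1.0 - savings)

GB = 1024 ** 3
# Compare the three compression scenarios for one replica and for three.
for savings in (0.3, 0.5, 0.8):
    for n_val in (1, 3):
        size = projected_bytes(100_000_000, 2048, n_val, savings)
        print(f"savings={savings:.0%} n_val={n_val}: {size / GB:.0f} GB")
```

The point of the sketch is that the replica count multiplies the total, so it can outweigh the difference between any two of the compression scenarios.
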
>
> Thanks!
> -Ben
>
>
> On Wed, Apr 10, 2013 at 6:22 AM, Matthew Von-Maszewski <matthewv at basho.com> wrote:
>
>> Greetings Ben,
>>
>> Also, leveldb stores data in "levels".  The very first storage level and
>> the runtime data recovery log are not compressed.
>>
>> That said, I agree with Tom that you are most likely seeing Riak store 3
>> copies of your data versus only one for mongodb.  It is possible to dumb
>> down Riak so that it is closer to mongodb:
>>
>> 1.  in app.config, look for the riak_core options, add the following line:
>>
>>           {default_bucket_props, [{n_val,1}]},
>>
>> This will default the system to only storing one copy of your data.
>>
>>
>> 2. if you are using Riak 1.3, again in app.config, look for the riak_kv
>> options:
>>
>>     change this
>>
>>        {anti_entropy, {on, []}},
>>
>>     to
>>
>>       {anti_entropy, {off, []}},
>>
>> This will disable Riak's automatic detection and correction of data loss
>> / corruption.  The feature requires an added 1 to 2% data on disk.
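
After editing app.config and restarting, one way to confirm the effective replica count is to read the bucket properties over HTTP. A minimal sketch, assuming the node listens on localhost:8098 and serves bucket properties as JSON at /buckets/<bucket>/props; the URL layout should be checked against the docs for the installed Riak version, and the helper names and sample response body are hypothetical.

```python
import json
from urllib.request import urlopen

def n_val_from_props(body):
    """Extract n_val from a bucket-properties JSON document."""
    return json.loads(body)["props"]["n_val"]

def fetch_n_val(bucket, host="localhost", port=8098):
    """Fetch bucket properties from a running node (URL layout is an assumption)."""
    url = f"http://{host}:{port}/buckets/{bucket}/props"
    with urlopen(url) as resp:
        return n_val_from_props(resp.read().decode("utf-8"))

# Offline example using a hand-written response body, not real server output:
sample = '{"props": {"n_val": 1, "allow_mult": false}}'
print(n_val_from_props(sample))  # prints 1
```

Against a live node, fetch_n_val("some_bucket") would be expected to return 1 once the default has taken effect for that bucket.
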
>>
>>
>> Matthew
>>
>>
>>
>> On Apr 10, 2013, at 9:01 AM, Tom Santero <tsantero at basho.com> wrote:
>>
>> Hi Ben,
>>
>> First, allow me to welcome you to the list! Stick around, I think you'll like
>> it here. :)
>>
>> How many nodes of Riak are you running vs how many nodes of Mongo?
>>
>> How much more disk space did Riak take?
>>
>> Riak is designed to run as a cluster of several nodes, utilizing
>> replication to provide resiliency and high-availability during partial
>> failure. By default Riak stores three replicas of every object you persist.
>> If you are only running a single node of Riak for your testing purposes, I
>> suspect this may explain the significant divergence you're seeing when
>> compared to the disk space used vs a single mongo, as each replica in Riak
>> is being stored to the same disk.
>>
>> Also, Snappy is optimized for speed over disk utility, so it will reduce
>> total disk usage less than other compression libraries such as zlib
>> would. That said, for sufficiently large JSON files I
>> know that BSON's prefixes can add significant overhead to object sizes such
>> that BSON is actually heavier than the JSON it represents. What is the
>> average size of the documents you're seeking to store?
>>
>> Could you tell us a bit more about what you're trying to achieve with
>> both Riak and Mongo, respectively?
>>
>> Tom
>>
>> On Wed, Apr 10, 2013 at 12:39 AM, Ben McCann <ben at benmccann.com> wrote:
>>
>>> Hi,
>>>
>>> I'm currently storing data in MongoDB and would like to evaluate Riak as
>>> an alternative. Riak is appealing to me because LevelDB uses Snappy, so I
>>> would expect it to take less disk space to store my data set than MongoDB
>>> which does not use compression. However, when I benchmarked it by inserting
>>> a few hundred thousand JSON records into each datastore, Riak in fact took
>>> far more disk space. I'm wondering if there's something I might be missing
>>> here as a newcomer to Riak. E.g. I checked the disk space used by running
>>> "du -ch /var/lib/riak/leveldb". Is this perhaps not a good way to check
>>> disk space usage because perhaps Riak/LevelDB preallocates files? (I know
>>> MongoDB does this and has a built-in db.collection.stats command to provide
>>> true disk usage information). Are there any other reasons why Riak might be
>>> taking more space or anything I could have screwed up?
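
One way to address the preallocation question is to compare apparent file sizes with the blocks actually allocated on disk, which is roughly the difference between `du --apparent-size` and plain `du`. A sketch, assuming a Unix-like system where os.lstat exposes st_blocks in 512-byte units; the function name is illustrative and the path is the one used above.

```python
import os

def disk_usage(root):
    """Return (apparent_bytes, allocated_bytes) summed over all files under root."""
    apparent = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            apparent += st.st_size
            allocated += st.st_blocks * 512  # st_blocks counts 512-byte units
    return apparent, allocated

if __name__ == "__main__":
    apparent, allocated = disk_usage("/var/lib/riak/leveldb")
    print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```

If the two numbers are close, preallocation is probably not what is inflating the total.
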
>>>
>>> Thanks,
>>> Ben
>>>
>>> --
>>> about.me/benmccann
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>>>
>>
>>
>>
>
>
> --
> about.me/benmccann
>
>
>


-- 
about.me/benmccann