Measuring Riak disk usage

Ben McCann ben at benmccann.com
Wed Apr 10 12:31:14 EDT 2013


Thanks for the help. If I were saving three copies of the data in Riak, that
would certainly explain it! I installed Riak via the apt repository
instructions <http://docs.basho.com/riak/1.1.4/tutorials/installation/Installing-on-Debian-and-Ubuntu/>.
I'm not sure what that does by default. If it's saving three copies of the
data, would it also be running three server nodes, or does it run only a
single Riak node and store three copies of the data there? I'm accessing
Riak on port 8098, which seems to be the default if only one node is
running.

The first level of leveldb storage looks to be quite small to me. Is the
runtime data recovery log likely to be very large? Can you tell me where
that would be located or point me to some docs on it?
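
One way to see where the space actually goes is to break the directory down by
file type and compare apparent size with allocated size (a large gap between
the two would point to preallocation or sparse files). The following is only a
minimal sketch, assuming a Linux/POSIX host: the function name
usage_by_suffix is made up for illustration, the default path is simply the
one from my du command, and grouping by file extension is a guess at a useful
breakdown rather than anything taken from the Riak docs.

```python
import os
import sys
from collections import defaultdict


def usage_by_suffix(root):
    """Return {suffix: (apparent_bytes, allocated_bytes)} for files under root.

    Hypothetical helper, not part of Riak or LevelDB.
    """
    totals = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                # The file may have been removed between listing and stat.
                continue
            suffix = os.path.splitext(name)[1] or "(none)"
            totals[suffix][0] += st.st_size
            # On POSIX, st_blocks counts 512-byte units actually allocated,
            # which is the quantity du reports by default.
            totals[suffix][1] += st.st_blocks * 512
    return {suffix: tuple(pair) for suffix, pair in totals.items()}


if __name__ == "__main__":
    # Assumed default: the directory from the du command in this thread.
    root = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/riak/leveldb"
    rows = sorted(usage_by_suffix(root).items(), key=lambda kv: -kv[1][1])
    for suffix, (apparent, allocated) in rows:
        print(f"{suffix:>8}  apparent={apparent:>14,}  allocated={allocated:>14,}")
```

If the log files turn out to be a small share of the total, the uncompressed
log is unlikely to be what makes the difference.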

I'm not super interested in squeezing out an extra percent or two of
storage here or there, but just want to roughly have some idea if storing
my data with snappy compression will yield me a 30% savings or 50% savings
or 80% savings, etc. So any really big things like perhaps storing three
copies of the data are interesting =)  In production, my average document
size is probably about 2k and I have tens of millions and soon to be
hundreds of millions of them.
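
To make the rough arithmetic concrete, here is a back-of-envelope sketch. The
function name and the compression ratio are assumptions for illustration only
(I have not measured what Snappy achieves on my documents); the replica count
of 3 and the 1 to 2% anti-entropy overhead come from the replies quoted below.

```python
def estimated_disk_bytes(doc_count, avg_doc_bytes, n_val=3,
                         compression_ratio=1.0, overhead=0.0):
    """Rough on-disk estimate in bytes.

    doc_count         -- number of stored documents
    avg_doc_bytes     -- average uncompressed document size
    n_val             -- number of replicas Riak keeps (3 by default)
    compression_ratio -- compressed size / raw size (assumed, not measured)
    overhead          -- extra fraction on disk, e.g. 0.02 for anti-entropy
    """
    raw = doc_count * avg_doc_bytes
    return raw * n_val * compression_ratio * (1.0 + overhead)


if __name__ == "__main__":
    docs = 100_000_000      # "hundreds of millions" order of magnitude
    size = 2000             # about 2k per document
    for n_val in (1, 3):
        for ratio in (1.0, 0.5):   # 0.5 is a placeholder, not a benchmark
            gb = estimated_disk_bytes(docs, size, n_val, ratio, 0.02) / 1e9
            print(f"n_val={n_val} ratio={ratio}: about {gb:.0f} GB")
```

The point is only that the replica count multiplies everything else, so on a
single node with three replicas even a 50% compression ratio would still leave
more on disk than one uncompressed copy.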

Thanks!
-Ben


On Wed, Apr 10, 2013 at 6:22 AM, Matthew Von-Maszewski
<matthewv at basho.com> wrote:

> Greetings Ben,
>
> Also, leveldb stores data in "levels".  The very first storage level and
> the runtime data recovery log are not compressed.
>
> That said, I agree with Tom that you are most likely seeing Riak store 3
> copies of your data versus only one for mongodb.  It is possible to dumb
> down Riak so that it is closer to mongodb:
>
> 1.  in app.config, look for the riak_core options, add the following line:
>
>           {default_bucket_props, [{n_val,1}]},
>
> This will default the system to only storing one copy of your data.
>
>
> 2. if you are using Riak 1.3, again in app.config, look for the riak_kv
> options:
>
>     change this
>
>        {anti_entropy, {on, []}},
>
>     to
>
>       {anti_entropy, {off, []}},
>
> This will disable Riak's automatic detection and correction of data loss /
> corruption.  The feature requires an added 1 to 2% data on disk.
>
>
> Matthew
>
>
>
> On Apr 10, 2013, at 9:01 AM, Tom Santero <tsantero at basho.com> wrote:
>
> Hi Ben,
>
> First, allow me to welcome you to the list! Stick around, I think you'll like
> it here. :)
>
> How many nodes of Riak are you running vs how many nodes of Mongo?
>
> How much more disk space did Riak take?
>
> Riak is designed to run as a cluster of several nodes, utilizing
> replication to provide resiliency and high-availability during partial
> failure. By default Riak stores three replicas of every object you persist.
> If you are only running a single node of Riak for your testing purposes, I
> suspect this may explain the significant divergence you're seeing when
> comparing its disk usage against a single Mongo node, as each replica in
> Riak is being stored on the same disk.
>
> Also, Snappy is optimized for speed over disk utilization, so it will have
> a negligible impact on total disk usage when compared to other compression
> libraries such as zlib. That said, for sufficiently large JSON files I
> know that BSON's prefixes can add significant overhead to object sizes such
> that BSON is actually heavier than the JSON it represents. What is the
> average size of the documents you're seeking to store?
>
> Could you tell us a bit more about what you're trying to achieve with both
> Riak and Mongo, respectively?
>
> Tom
>
> On Wed, Apr 10, 2013 at 12:39 AM, Ben McCann <ben at benmccann.com> wrote:
>
>> Hi,
>>
>> I'm currently storing data in MongoDB and would like to evaluate Riak as
>> an alternative. Riak is appealing to me because LevelDB uses Snappy, so I
>> would expect it to take less disk space to store my data set than MongoDB
>> which does not use compression. However, when I benchmarked it by inserting
>> a few hundred thousand JSON records into each datastore, Riak in fact took
>> far more disk space. I'm wondering if there's something I might be missing
>> here as a newcomer to Riak. E.g. I checked the disk space used by running
>> "du -ch /var/lib/riak/leveldb". Is this perhaps not a good way to check
>> disk space usage because perhaps Riak/LevelDB preallocates files? (I know
>> MongoDB does this and has a built-in db.collection.stats command to provide
>> true disk usage information). Are there any other reasons why Riak might be
>> taking more space or anything I could have screwed up?
>>
>> Thanks,
>> Ben
>>
>> --
>> about.me/benmccann
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>


-- 
about.me/benmccann