Measuring Riak disk usage

Jeremiah Peschka jeremiah.peschka at gmail.com
Wed Apr 10 13:45:09 EDT 2013


If you've installed from the apt/yum repository you've installed a single
Riak node on your machine. Riak, though, is configured by default to write
data to three servers. If some of those servers aren't available, Riak is
going to write to a different server via hinted handoff[1]. Since you are
only running one node, that single node receives all copies of your data in
the hopes that some day the other Riak servers in the cluster will come
back for their data.

Matthew's recommendation (change app.config, delete your data, start over)
is going to be your best bet for a single system approach.
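
For reference, here is a minimal sketch of how the two settings Matthew
describes below might sit together in app.config. Only the
default_bucket_props and anti_entropy lines come from this thread; the
surrounding layout is an assumption about a stock 1.x app.config, and your
file will contain many other settings that should be left in place.

```erlang
%% app.config (fragment): only the two settings from this thread are shown.
[
 {riak_core, [
     %% Store one copy of each object instead of the default three.
     {default_bucket_props, [{n_val, 1}]}
     %% (other existing riak_core settings stay as they are, comma-separated)
 ]},
 {riak_kv, [
     %% Riak 1.3 only: turn off anti-entropy, which costs an added 1 to 2% on disk.
     {anti_entropy, {off, []}}
     %% (other existing riak_kv settings stay as they are, comma-separated)
 ]}
].
```

After editing, stop the node, erase the data storage area, and start again so
that existing data is not still stored three times.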


[1]:
http://docs.basho.com/riak/1.2.0/references/appendices/concepts/Riak-Glossary/#Hinted-Handoff

On Wednesday, April 10, 2013, Ben McCann wrote:

> Sure, will do. I'm still a little confused about how Riak runs on one
> machine though.  Is it running three server nodes or does it run only a
> single Riak node and store three copies of the data?
>
> Thanks,
> Ben
>
>
> On Wed, Apr 10, 2013 at 10:14 AM, Matthew Von-Maszewski <
> matthewv at basho.com> wrote:
>
> Ben,
>
> The runtime recovery log's file name ends in "XXXXXX.log" where XXXXXX is a
> six-digit number.  Its size will vary between 30 Mbytes and 60 Mbytes per
> vnode directory.
>
> My recommendation is that you change the app.config file's
> default_bucket_props detailed below.  Completely erase the data storage
> area.  Then run again.  Should make a big total size difference.
>
> See if this results in more reasonable / comparable sizes.  This will give
> you a default compression comparison.  There is also a way to tune the
> database such that it would compress even more, but at the cost of random
> read performance.  We can try that next.
>
> Matthew
>
>
>
>
> On Apr 10, 2013, at 12:31 PM, Ben McCann <ben at benmccann.com> wrote:
>
> Thanks for the help. If I were saving three copies of the data in Riak
> that would certainly explain it! I installed Riak via the apt repository
> instructions<http://docs.basho.com/riak/1.1.4/tutorials/installation/Installing-on-Debian-and-Ubuntu/>.
> Not sure what that does by default. If it's saving three copies of the data
> then I assume it would also be running three server nodes or does it run
> only a single Riak node and store three copies of the data? I'm accessing
> Riak on port 8098, which seems to be the default if only one node is
> running.
>
> The first level of leveldb storage looks to be quite small to me. Is the
> runtime data recovery log likely to be very large? Can you tell me where
> that would be located or point me to some docs on it?
>
> I'm not super interested in squeezing out an extra percent or two of
> storage here or there, but just want to roughly have some idea if storing
> my data with snappy compression will yield me a 30% savings or 50% savings
> or 80% savings, etc. So any really big things like perhaps storing three
> copies of the data are interesting =)  In production, my average document
> size is probably about 2k and I have tens of millions and soon to be
> hundreds of millions of them.
>
> Thanks!
> -Ben
>
>
> On Wed, Apr 10, 2013 at 6:22 AM, Matthew Von-Maszewski <matthewv at basho.com
> > wrote:
>
> Greetings Ben,
>
> Also, leveldb stores data in "levels".  The very first storage level and
> the runtime data recovery log are not compressed.
>
> That said, I agree with Tom that you are most likely seeing Riak store 3
> copies of your data versus only one for mongodb.  It is possible to dumb
> down Riak so that it is closer to mongodb:
>
> 1.  in app.config, look for the riak_core options, add the following line:
>
>           {default_bucket_props, [{n_val,1}]},
>
> This will default the system to only storing one copy of your data.
>
>
> 2. if you are using Riak 1.3, again in app.config, look for the riak_kv
> options:
>
>     change this
>
>        {anti_entropy, {on, []}},
>
>     to
>
>       {anti_entropy, {off, []}},
>
> This will disable Riak's automatic detection and correction of data loss /
> corruption.  The feature requires an added 1 to 2% data on disk.
>
>
> Matthew
>
>
>
> On Apr 10, 2013, at 9:01 AM, Tom Santero <tsantero at basho.com> wrote:
>
> Hi Ben,
>
> First, allow me to welcome you to the list! Stick around, I think you'll
> like it here. :)
>
> How many nodes of Riak are you running vs how many nodes of Mongo?
>
> How much more disk space did Riak take?
>
> Riak is designed to run as a cluster of several nodes, utilizing
> replication to provide resilien
>
> --
> about.me/benmccann
>


-- 
---
Jeremiah Peschka - Founder, Brent Ozar Unlimited
MCITP: SQL Server 2008, MVP
Cloudera Certified Developer for Apache Hadoop