Issues with capacity planning pages on wiki

Anthony Molinaro anthonym at
Mon May 23 23:39:01 EDT 2011


  As I'm about to dramatically increase our riak investment by putting
lots more data into it, I figured I would run through the capacity
planning on the wiki.

Since my current setup is fairly small and manageable I decided to try
to see how accurately the capacity planning matches what I see.

So first, reality:

  A: Number of Machines   : 8
  B: Memory per Machine   : 24 GB
  C: Length of Bucket Name: 10 bytes
  D: Length of Keys       : 36 bytes
  E: Length of Values     : 36 bytes
  F: Replication Factor   : 3
  G: Number of Keys       : 183915891
  H: Disk Space used      : 341898018816 bytes (341 GB)
  I: RAM                  : 70536691712 bytes (70 GB)

  G was calculated using riak_kv_bitcask_backend:key_counts/0 for
    each bitcask on a node, summing, then dividing by 3
  H was calculated with 'du -sk /var/lib/riak/bitcask/ | cut -f1', summing
    and multiplying by 1024
  I was calculated with 'ps -U riak -o vsz h', summing and multiplying
    by 1024

Now from entering A-G on the Bitcask-Capacity-Planning page I get

  Total Key Space: 34.9 GB
  Node Count : 3 (7 GB Storage per Node)

in the first section and

  Key Overhead: 73 Bytes (22 Byte Overhead)
  Total Documents: 1,010,580,541
  Total Disk Used: 102 GB of Disk Space

Also when using the Cluster Capacity Planning page I get

  (static bitcask per key overhead
   + estimated average bucket+key length in bytes)
   * estimate total number of keys
   * n_val
   = Approximate RAM Needed for Bitcask

So plugging in values

  ( 22 + 10 + 36 ) * 183915891 * 3 = 37518841764 = 34.9 GB


  Disk = Estimated Total Objects * Average Object Size * n_val
  Disk = 183915891 * 36 * 3 = 19862916228 = 18.49 GB
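
The two Cluster Capacity Planning formulas are easy to re-run as a check on the arithmetic. This is only a sketch of the formulas as quoted above, with values A-G plugged in; the function and constant names are made up for illustration. Note that the "GB" figures here come from dividing by 1024^3, which is what reproduces the 34.9 figure.

```python
# Inputs C-G from the list above.
BITCASK_OVERHEAD = 22     # static per-key overhead in bytes, as quoted by the wiki
BUCKET_LEN = 10           # C
KEY_LEN = 36              # D
VALUE_LEN = 36            # E
N_VAL = 3                 # F
NUM_KEYS = 183915891      # G

GIB = 1024 ** 3           # divisor used to turn bytes into the "GB" figures


def ram_bytes(overhead, bucket_len, key_len, num_keys, n_val):
    """(static overhead + bucket+key length) * number of keys * n_val."""
    return (overhead + bucket_len + key_len) * num_keys * n_val


def disk_bytes(num_keys, value_len, n_val):
    """Estimated total objects * average object size * n_val."""
    return num_keys * value_len * n_val


ram = ram_bytes(BITCASK_OVERHEAD, BUCKET_LEN, KEY_LEN, NUM_KEYS, N_VAL)
disk = disk_bytes(NUM_KEYS, VALUE_LEN, N_VAL)

print("RAM : %d bytes = %.2f GB" % (ram, ram / GIB))
print("Disk: %d bytes = %.2f GB" % (disk, disk / GIB))
```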

So either the equations are drastically wrong or my calculations are.  I find
it very suspect that the equation for the amount of disk includes zero
overhead, when from reading the bitcask paper it seems like each entry
consists of:

  CRC, timestamp, keysz, valsz, key, value
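
To get a feel for how much that per-entry header could matter, here is a rough sketch. The field widths below (4, 4, 2 and 4 bytes) are an assumption for illustration and should be checked against the bitcask source; it is also an assumption that the stored key is simply bucket plus key, and the value Riak actually writes may well be larger than the raw 36 bytes.

```python
# Assumed fixed-width header fields, in bytes. These widths are an assumption
# for illustration only; verify them against the bitcask source.
HEADER_FIELDS = {"crc": 4, "timestamp": 4, "keysz": 2, "valsz": 4}

GIB = 1024 ** 3


def entry_bytes(key_len, value_len, header_fields=HEADER_FIELDS):
    """Size of one on-disk entry: header fields, then key, then value."""
    return sum(header_fields.values()) + key_len + value_len


def disk_bytes(num_keys, key_len, value_len, n_val):
    """Disk needed for num_keys entries, each stored n_val times."""
    return entry_bytes(key_len, value_len) * num_keys * n_val


# Assumption: the stored key is bucket name (10 bytes) + key (36 bytes).
stored_key_len = 10 + 36
per_entry = entry_bytes(stored_key_len, 36)
total = disk_bytes(183915891, stored_key_len, 36, 3)

print("bytes per entry:", per_entry)
print("total: %d bytes = %.2f GB" % (total, total / GIB))
```

Under these assumptions the header raises the estimate well above the zero-overhead formula, but it still lands far short of the 341 GB measured, so the header alone would not account for the gap.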

Well anyway, there's obviously something off, as I end up with the following:

        Bitcask-Capacity-Planning   Cluster-Capacity-Planning   Reality
RAM     34.9 GB                     34.9 GB                     70 GB
Disk    102 GB                      18.49 GB                    341 GB

So it looks to me like the RAM estimates are about half of actual, and
the Disk estimates are completely off: they differ depending on which
page of the wiki you look at, and both vastly underestimate reality.
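
One way to quantify the gap is to work backwards from the measured totals (G, H and I) to the bytes actually consumed per stored replica. This is a sketch using only the numbers given earlier; the variable names are made up. Keep in mind that I came from vsz, which is virtual size, so the RAM figure may overstate what the key index itself uses.

```python
NUM_KEYS = 183915891        # G
N_VAL = 3                   # F
DISK_USED = 341898018816    # H, bytes
RAM_USED = 70536691712      # I, bytes (summed vsz)

# Estimates from the Cluster Capacity Planning formulas, for comparison.
RAM_ESTIMATE = (22 + 10 + 36) * NUM_KEYS * N_VAL
DISK_ESTIMATE = NUM_KEYS * 36 * N_VAL

# Every key is stored N_VAL times across the cluster.
replicas = NUM_KEYS * N_VAL

disk_per_entry = DISK_USED / replicas
ram_per_entry = RAM_USED / replicas

print("disk bytes per stored entry: %.1f" % disk_per_entry)
print("ram bytes per stored entry : %.1f" % ram_per_entry)
print("disk actual / estimate     : %.1fx" % (DISK_USED / DISK_ESTIMATE))
print("ram actual / estimate      : %.1fx" % (RAM_USED / RAM_ESTIMATE))
```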

I'm hoping someone from basho can clarify so I can really determine



Anthony Molinaro                           <anthonym at>
