Issues with capacity planning pages on wiki

Anthony Molinaro anthonym at alumni.caltech.edu
Tue May 24 02:06:31 EDT 2011


On Mon, May 23, 2011 at 10:53:29PM -0700, Anthony Molinaro wrote:
> 
> On Mon, May 23, 2011 at 09:57:25PM -0600, David Smith wrote:
> > On Mon, May 23, 2011 at 9:39 PM, Anthony Molinaro
> > Thus, depending on
> > your merge triggers, more space can be used than is strictly necessary
> > to store the data.
> 
> So the lack of any overhead in the calculation is expected?  I mean
> according to http://wiki.basho.com/Cluster-Capacity-Planning.html
> 
> Disk = Estimated Total Objects * Average Object Size * n_val
> 
> Which just seems wrong, doesn't it?  I don't quite understand the
> bitcask code well enough yet to see what the actual data it stores is,
> but the whitepaper suggested several things were involved in the on
> disk representation.

Okay, I finally found the code for this part. I kept looking in the NIF,
but that only covers the keydir, not the data files.  It looks like

   %% Setup io_list for writing -- avoid merging binaries if we can help it
   Bytes0 = [<<Tstamp:?TSTAMPFIELD>>, <<KeySz:?KEYSIZEFIELD>>,
             <<ValueSz:?VALSIZEFIELD>>, Key, Value],
   Bytes  = [<<(erlang:crc32(Bytes0)):?CRCSIZEFIELD>> | Bytes0],

And looking at the header, it seems that there are 14 bytes of overhead
per entry (4 for CRC, 4 for timestamp, 2 for keysize, 4 for valsize).
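
To sanity-check that count, here is a rough Python sketch of the same
layout. It assumes the field widths listed above, big-endian unsigned
fields (the Erlang default for binaries), and that zlib's CRC-32 matches
erlang:crc32. The helper name encode_entry is made up for illustration
and is not part of bitcask.

```python
import struct
import zlib

# Assumed field widths from the header description:
# 4-byte CRC, 4-byte timestamp, 2-byte key size, 4-byte value size.
HEADER_BYTES = 4 + 4 + 2 + 4


def encode_entry(tstamp: int, key: bytes, value: bytes) -> bytes:
    """Hypothetical helper: build one on-disk entry the way the snippet does."""
    # Everything except the CRC; ">" means big-endian with no padding.
    body = struct.pack(">IHI", tstamp, len(key), len(value)) + key + value
    # The CRC covers the rest of the entry and is prepended to it.
    crc = zlib.crc32(body) & 0xFFFFFFFF
    return struct.pack(">I", crc) + body


# A 36-byte key and a 36-byte value, as in the numbers used in this thread.
entry = encode_entry(1306216800, b"k" * 36, b"v" * 36)
print(HEADER_BYTES)  # 14
print(len(entry))    # 86
```

So an entry with a 36-byte key and a 36-byte value should take 86 bytes
on disk, if the assumptions above hold.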

So the disk calculation should be

( 14 + Key Size + Value Size ) * Num Entries * N_Val

So using my numbers from before that gives

( 14 + 36 + 36 ) * 183915891 * 3 = 47450299878 bytes = ~44.2 GB

which actually isn't much closer to 341 GB than the previous calculation :(
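
For anyone who wants to plug in their own numbers, the estimate above
can be written as a small Python function. This is only a sketch of the
formula in this email (live data only, 14 bytes of per-entry overhead
assumed); the function name is made up, and it ignores anything bitcask
keeps on disk beyond the current entries, such as data waiting on a
merge.

```python
def bitcask_disk_bytes(key_bytes: int, value_bytes: int,
                       num_entries: int, n_val: int,
                       header_bytes: int = 14) -> int:
    """Hypothetical helper: (header + key + value) * entries * n_val."""
    return (header_bytes + key_bytes + value_bytes) * num_entries * n_val


# Numbers from this thread: 36-byte keys and values, 183915891 entries, n_val 3.
total = bitcask_disk_bytes(36, 36, 183915891, 3)
print(total)                    # 47450299878
print(round(total / 2**30, 2))  # 44.19 (in units of 2**30 bytes)
```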

So all my questions from the previous email still apply.

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <anthonym at alumni.caltech.edu>



