Issues with capacity planning pages on wiki

Anthony Molinaro anthonym at alumni.caltech.edu
Tue May 24 17:48:30 EDT 2011


Just curious if anyone has any ideas.  For the moment, I'm just taking
the RAM calculation and multiplying it by 2, and taking the Disk
calculation and multiplying it by 8, based on my findings with my current
cluster.  But I would like to know why my values are so much higher than
the ones the formulas predict.
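
A minimal Erlang sketch of that stopgap.  It assumes the base disk figure
comes from the wiki's formula (Disk = Estimated Total Objects * Average
Object Size * n_val) and that the RAM figure is whatever the wiki's RAM
calculation gives.  The module and function names are made up for
illustration, and the 2 and 8 multipliers are only what one cluster
suggested, not a general rule:

```erlang
-module(padded_estimate_sketch).
-export([wiki_disk_bytes/3, padded/2]).

%% Wiki formula: Disk = Estimated Total Objects * Average Object Size * n_val
wiki_disk_bytes(TotalObjects, AvgObjectSize, NVal) ->
    TotalObjects * AvgObjectSize * NVal.

%% Empirical fudge factors observed on one cluster (RAM x2, disk x8).
%% These are observations, not part of the wiki's method.
padded(RamBytes, DiskBytes) ->
    {RamBytes * 2, DiskBytes * 8}.
```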

Also, I'd still like to know how the forms on the wiki page calculate
things, as the disk calculation there matches neither reality nor the
formula.

Also, I'm waiting to hear whether there is any way to force a merge to
run, so I can more accurately gauge whether multiple copies are affecting
disk usage.

Thanks,

-Anthony

On Mon, May 23, 2011 at 11:06:31PM -0700, Anthony Molinaro wrote:
> 
> On Mon, May 23, 2011 at 10:53:29PM -0700, Anthony Molinaro wrote:
> > 
> > On Mon, May 23, 2011 at 09:57:25PM -0600, David Smith wrote:
> > > On Mon, May 23, 2011 at 9:39 PM, Anthony Molinaro
> > > Thus, depending on
> > > your merge triggers, more space can be used than is strictly necessary
> > > to store the data.
> > 
> > So the lack of any overhead in the calculation is expected?  I mean,
> > according to http://wiki.basho.com/Cluster-Capacity-Planning.html
> > 
> > Disk = Estimated Total Objects * Average Object Size * n_val
> > 
> > Which just seems wrong, doesn't it?  I don't quite understand the
> > bitcask code well enough yet to see what data it actually stores,
> > but the whitepaper suggested several things were involved in the
> > on-disk representation.
> 
> Okay, finally found the code for this part.  I kept looking in the NIF,
> but that only handles the keydir, not the data files.  It looks like this:
> 
>    %% Setup io_list for writing -- avoid merging binaries if we can help it
>    Bytes0 = [<<Tstamp:?TSTAMPFIELD>>, <<KeySz:?KEYSIZEFIELD>>,
>              <<ValueSz:?VALSIZEFIELD>>, Key, Value],
>    Bytes  = [<<(erlang:crc32(Bytes0)):?CRCSIZEFIELD>> | Bytes0],
> 
> And looking at the header, it seems that there are 14 bytes of overhead
> (4 for CRC, 4 for timestamp, 2 for key size, 4 for value size).
> 
> So the disk calculation should be
> 
> ( 14 + Key + Value ) * Num Entries * N_Val
> 
> So using my numbers from before, that gives
> 
> ( 14 + 36 + 36 ) * 183915891 * 3 = 47450299878 bytes = ~44.2 GB
> 
> which actually isn't much closer to 341 GB than the previous calculation :(
> 
> So all my questions from the previous email still apply.
> 
> -Anthony
> 
> -- 
> ------------------------------------------------------------------------
> Anthony Molinaro                           <anthonym at alumni.caltech.edu>
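
A rough Erlang sketch of the per-entry layout from the bitcask snippet
quoted above, useful for sanity-checking the 14-byte header and the revised
formula.  The field widths (32, 32, 16 and 32 bits) are hard-coded from the
byte counts above rather than taken from bitcask's own ?CRCSIZEFIELD-style
macros, so treat them as assumptions; the module name is hypothetical.  It
only counts the key and value bytes passed in, so any wrapping Riak does
around keys or values before they reach bitcask, and any stale entries
waiting on a merge, are not included:

```erlang
-module(bitcask_size_sketch).
-export([entry_bytes/2, entry_size/2, disk_bytes/4]).

%% Assumed field widths in bits, from the byte counts above
%% (4 CRC, 4 timestamp, 2 key size, 4 value size).
-define(CRC_BITS, 32).
-define(TSTAMP_BITS, 32).
-define(KEYSZ_BITS, 16).
-define(VALSZ_BITS, 32).

%% Build an entry iolist mirroring the quoted bitcask code:
%% CRC32 of (timestamp, key size, value size, key, value), then that data.
entry_bytes(Key, Value) when is_binary(Key), is_binary(Value) ->
    Tstamp = erlang:system_time(second),
    KeySz = byte_size(Key),
    ValueSz = byte_size(Value),
    Bytes0 = [<<Tstamp:?TSTAMP_BITS>>, <<KeySz:?KEYSZ_BITS>>,
              <<ValueSz:?VALSZ_BITS>>, Key, Value],
    [<<(erlang:crc32(Bytes0)):?CRC_BITS>> | Bytes0].

%% On-disk size of one entry: 14-byte header plus key and value bytes.
entry_size(KeySz, ValueSz) ->
    iolist_size(entry_bytes(binary:copy(<<0>>, KeySz),
                            binary:copy(<<0>>, ValueSz))).

%% Revised estimate: (14 + Key + Value) * Num Entries * N_Val.
disk_bytes(KeySz, ValueSz, NumEntries, NVal) ->
    entry_size(KeySz, ValueSz) * NumEntries * NVal.
```

With the numbers above, bitcask_size_sketch:disk_bytes(36, 36, 183915891, 3)
returns 47450299878, matching the hand calculation.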

More information about the riak-users mailing list