Issues with capacity planning pages on wiki

Anthony Molinaro anthonym at alumni.caltech.edu
Tue May 24 01:53:29 EDT 2011


On Mon, May 23, 2011 at 09:57:25PM -0600, David Smith wrote:
> On Mon, May 23, 2011 at 9:39 PM, Anthony Molinaro
> <anthonym at alumni.caltech.edu> wrote:
> >
> >        Bitcask-Capacity-Planning  Cluster-Capacity-Planning  Reality
> > RAM    34.9 GB                    34.9 GB                    70 GB
> > Disk   102 GB                     18.49 GB                   341 GB
> >
> > So it looks to me like the numbers for RAM are about half of actual, and
> > the numbers for disk are completely off; they differ depending on which
> > page of the wiki you look at, and both vastly underestimate reality.
> 
> So RAM would require a little digging to figure out;

Anything I can do there to help?  I'd really like to get to the bottom
of the discrepancy in these numbers.  I assume everything is stored
as binaries, and I'm not seeing some sort of 64-bit doubling (I know
I convert my keys and values to binaries before sending them to riak).
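
For reference, this is roughly the shape of the writes -- a minimal sketch
using the riakc protobuf client; the bucket, key, and value here are made
up for illustration, not my real data:

%% Sketch of a write with a binary key and value, assuming the riakc PB client.
{ok, Pid} = riakc_pb_socket:start_link("10.1.1.31", 8087),
Key   = list_to_binary(integer_to_list(12345)),   % hypothetical key
Value = list_to_binary("some opaque payload"),    % hypothetical value
Obj   = riakc_obj:new(<<"my_bucket">>, Key, Value),
ok    = riakc_pb_socket:put(Pid, Obj).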

Here's the output of memory/0 on an attached shell:

(riak at 10.1.1.31)1> memory().
[{total,7281790968},
 {processes,18543872},
 {processes_used,18132704},
 {system,7263247096},
 {atom,825105},
 {atom_used,815183},
 {binary,603512},
 {code,8306646},
 {ets,536440}]

It looks like nearly all of that is under system, which I assume is the
keydirs in the driver.  Also, does the number of partitions impact this
value at all?  I have 1024 total across 8 nodes; the ring currently looks like

ring_ownership : <<"[{'riak at 10.1.8.10',128},\n {'riak at 10.1.6.30',128},\n
{'riak at 10.1.10.20',129},\n {'riak at 10.1.1.31',129},\n {'riak at 10.1.2.32',128},\n
{'riak at 10.1.7.6',128},\n {'riak at 10.1.11.18',126},\n {'riak at 10.1.9.9',128}]">>

That also seems a bit odd; I would expect them all to be 128, but I digress.
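
In case it helps, this is how I'm counting ownership from an attached
console (a sketch, assuming riak_core_ring_manager:get_my_ring/0 and
riak_core_ring:all_owners/1 are available in this release):

{ok, Ring} = riak_core_ring_manager:get_my_ring(),
Owners = riak_core_ring:all_owners(Ring),          % [{PartitionIndex, OwnerNode}]
Counts = lists:foldl(fun({_Idx, Node}, Acc) ->
                         dict:update_counter(Node, 1, Acc)
                     end, dict:new(), Owners),
dict:to_list(Counts).                              % one {Node, PartitionCount} per node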

> disk is easier to
> explain. The disk calculations do not take into account (as best I can
> tell) the fact that bitcask is an append-only store and requires
> periodic merging/compaction of the on-disk files.

Is there any way to force a merge/compaction so I can attempt to better
understand my usage?  I know with Cassandra I had a way to run compactions
with nodetool, but riak-admin doesn't seem to have any such controls,
unless a backup causes merging to occur.
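
The closest thing I've found so far is calling bitcask directly from an
attached console.  From reading the source it looks like there is a
bitcask:merge/1 that takes a directory; assuming it can be pointed at a
single partition's directory, something like this might do it (the path
here is hypothetical -- substitute the real platform_data_dir and
partition index):

PartitionDir = "/var/lib/riak/bitcask/0",   % hypothetical partition directory
ok = bitcask:merge(PartitionDir).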

> Thus, depending on
> your merge triggers, more space can be used than is strictly necessary
> to store the data.

So the lack of any overhead in the calculation is expected?  I mean
according to http://wiki.basho.com/Cluster-Capacity-Planning.html

Disk = Estimated Total Objects * Average Object Size * n_val

Which just seems wrong, doesn't it?  I don't understand the bitcask code
well enough yet to see exactly what it stores on disk, but the whitepaper
suggested several fields are involved in the on-disk representation.
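
If I had to guess at a more realistic formula, it would need at least the
per-record framing plus some headroom for dead data awaiting merge.
Something like this sketch -- the 14-byte header and the 2x headroom are
my assumptions from reading the whitepaper and source, not numbers from
the wiki:

EstimateDisk =
    fun(NumObjects, AvgKeySize, AvgValueSize, NVal) ->
        HeaderSize    = 14,  % assumed crc + tstamp + keysz + valsz per record
        RecordSize    = HeaderSize + AvgKeySize + AvgValueSize,
        MergeHeadroom = 2,   % assumed worst-case dead data before a merge runs
        NumObjects * RecordSize * NVal * MergeHeadroom
    end.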

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <anthonym at alumni.caltech.edu>



