Issues with capacity planning pages on wiki

Anthony Molinaro anthonym at
Tue May 24 01:53:29 EDT 2011

On Mon, May 23, 2011 at 09:57:25PM -0600, David Smith wrote:
> On Mon, May 23, 2011 at 9:39 PM, Anthony Molinaro
> <anthonym at> wrote:
> >
> >          Bitcask-Capacity-Planning   Cluster-Capacity-Planning   Reality
> > RAM      34.9 GB                     34.9 GB                     70 GB
> > Disk     102 GB                      18.49 GB                    341 GB
> >
> > So it looks to me like the numbers for RAM are about 1/2 of actual, and
> > the numbers for Disk are completely off; they differ depending on which
> > page you look at on the wiki, and they vastly underestimate reality.
> So RAM would require a little digging to figure out;

Anything I can do there to help?  I'd really like to get to the bottom
of the discrepancy in these numbers.  I assume everything is stored
as binaries, and I don't think I'm seeing some sort of 64-bit doubling
(I know I convert my keys and values to binaries before sending them
to Riak).
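
If I'm reading the wiki right, the RAM formula amounts to a fixed
per-key keydir cost plus the key itself, times n_val.  A quick sketch
from an attached shell (OverheadPerKey is my assumption here, not a
number taken from the wiki):

    %% wiki-style RAM estimate; OverheadPerKey is an assumed per-entry
    %% keydir cost, replicated n_val times
    RamBytes = fun(NumKeys, AvgKeySize, NVal, OverheadPerKey) ->
                   NumKeys * (OverheadPerKey + AvgKeySize) * NVal
               end.

Whatever overhead produces the 34.9 GB figure would have to roughly
double to match the 70 GB I actually see.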

Here's the output of memory/0 from an attached shell:

(riak at> memory().
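
To see where that total goes, it can be broken down by category (these
are just the standard erlang:memory/1 keys):

    %% per-category breakdown of the same totals memory/0 reports
    [{K, erlang:memory(K)} || K <- [total, processes, system, binary, ets]].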

It all seems to be used by system, which I assume is the keydirs in
the driver.  Also, does the number of partitions impact this value at
all?  I have 1024 total on 8 nodes; the ring currently looks like

ring_ownership : <<"[{'riak at',128},\n {'riak at',128},\n
{'riak at',129},\n {'riak at',129},\n {'riak at',128},\n
{'riak at',128},\n {'riak at',126},\n {'riak at',128}]">>

Which also seems a bit odd; I would expect them all to be 128, but anyway.
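
In case it's easier to eyeball, the same counts can be pulled straight
from the ring (standard riak_core calls from an attached shell, as far
as I know):

    %% count the partitions owned by each node
    {ok, Ring} = riak_core_ring_manager:get_my_ring(),
    lists:foldl(fun({_Idx, Node}, Counts) ->
                    orddict:update_counter(Node, 1, Counts)
                end, orddict:new(), riak_core_ring:all_owners(Ring)).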

> disk is easier to
> explain. The disk calculations do not take into account (as best I can
> tell) the fact that bitcask is an append-only store and requires
> periodic merging/compaction of the on-disk files.

Is there any way to force a merge/compaction so I can attempt to better
understand my usage?  I know with Cassandra I had a way to run compactions
with their nodetool, but riak-admin doesn't seem to have any such
controls, unless a backup causes merging to occur.
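
(Digging around, bitcask does seem to export merge/1 taking a data
directory, so maybe something like the following works from an attached
console; the path is just an example, and I haven't verified this is
safe to run against a live vnode:)

    %% force a merge of one partition's bitcask directory (example path)
    bitcask:merge("/var/lib/riak/bitcask/0").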

> Thus, depending on
> your merge triggers, more space can be used than is strictly necessary
> to store the data.
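
Just so I'm looking at the right thing: I assume the merge triggers you
mean are the ones in the bitcask section of app.config, something like
this (values illustrative, not my actual settings):

    %% bitcask merge-trigger knobs in app.config (fragment)
    {bitcask, [
        {data_root, "/var/lib/riak/bitcask"},
        {frag_merge_trigger, 60},              %% % fragmentation that triggers a merge
        {dead_bytes_merge_trigger, 536870912}  %% dead bytes that trigger a merge
    ]}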

So the lack of any overhead in the calculation is expected?  I mean,
according to the wiki:

Disk = Estimated Total Objects * Average Object Size * n_val

Which just seems wrong, doesn't it?  I don't understand the bitcask
code well enough yet to see what data it actually stores, but the
whitepaper suggested several fields were involved in the on-disk
representation.
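
Going by the paper, each entry carries a fixed header (crc, tstamp,
ksz, value_sz) ahead of the key and value, and Riak wraps each value in
a riak_object with a vclock and metadata on top of that.  So I'd expect
the estimate to look more like the sketch below, where the 14-byte
header is my reading of the field widths and WrapperOverhead is a pure
guess:

    %% per-entry on-disk estimate: assumed header (4 crc + 4 tstamp +
    %% 2 ksz + 4 value_sz = 14 bytes) + key + value + riak_object
    %% wrapper (vclock/metadata; WrapperOverhead is a guess)
    DiskBytes = fun(NumObjects, AvgKey, AvgVal, NVal, WrapperOverhead) ->
                    NumObjects * (14 + AvgKey + AvgVal + WrapperOverhead) * NVal
                end.

And even that ignores the dead entries that pile up between merges, per
your point above.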


Anthony Molinaro                           <anthonym at>
