bitcask and innostore overheads

Sean Cribbs sean at basho.com
Tue Jul 6 16:18:05 EDT 2010


Jeremy,

I'm glad to see you're still looking at Riak.

Regarding your bitcask question, that does seem to be in the correct range of sizes.  Dave (@dizzyco) tells me the actual figure is 24 bytes + the hashtable overhead.
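
As a rough sketch, the sizing arithmetic from the quoted message below can be reproduced in Python. The 72-byte fixed overhead and 20-byte key are the estimates from that message, not confirmed figures, and the names used here (entry_bytes, keys_in, hashtable_overhead_bytes) are made up for illustration. The 24-byte figure can be plugged in the same way once the hashtable overhead, which is not stated here, is known.

```python
GIB = 1024 ** 3


def entry_bytes(fixed_overhead_bytes, key_bytes, hashtable_overhead_bytes=0):
    """Estimated in-memory bytes per key (all inputs are assumptions)."""
    return fixed_overhead_bytes + hashtable_overhead_bytes + key_bytes


def keys_in(ram_bytes, per_key_bytes):
    """Whole number of keys whose entries fit in ram_bytes."""
    return ram_bytes // per_key_bytes


# Estimate from the quoted message: 72 bytes fixed + ~20-byte key = 92 bytes.
per_key = entry_bytes(72, 20)

keys_per_gib = keys_in(GIB, per_key)
keys_in_32_gib = keys_in(32 * GIB, per_key)

# RAM needed for the initial 2 billion documents at the same per-key cost.
gib_for_2_billion = 2_000_000_000 * per_key / GIB

print(per_key, keys_per_gib, keys_in_32_gib, round(gib_for_2_billion, 1))
```

At 92 bytes per key, the initial 2 billion documents already need well over 32 GiB of key memory across the cluster, so the per-node key count is the number to watch.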

Inno does pad things to fixed-size pages, so yes, you could end up with wasted disk.  However, I would suspect the greater concern would be excessive invalidation of the buffer pool from the essentially random/uniform shape of your key-space, making it difficult to get good throughput.  Inno works best when keys are inserted in sequential order.
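
To illustrate why key order matters, here is a toy model in Python, not a description of how InnoDB actually lays out pages. It assumes the 64-bit key space is split evenly across a hypothetical number of leaf pages (NUM_PAGES) and counts how many distinct pages a batch of inserts would touch. Every name and constant in it is an assumption chosen for the example.

```python
import random

KEY_SPACE = 2 ** 64
NUM_PAGES = 1_000_000                 # hypothetical number of leaf pages
RANGE_PER_PAGE = KEY_SPACE // NUM_PAGES


def distinct_pages(keys):
    """Count the distinct pages a batch of keys falls into in this toy model."""
    return len({key // RANGE_PER_PAGE for key in keys})


BATCH = 10_000
rng = random.Random(42)               # fixed seed so the run is repeatable

# Sequential keys: consecutive integers starting from an arbitrary base.
sequential_keys = range(10 ** 18, 10 ** 18 + BATCH)

# Uniform keys: drawn at random from the whole 64-bit key space.
uniform_keys = [rng.randrange(KEY_SPACE) for _ in range(BATCH)]

sequential_pages = distinct_pages(sequential_keys)
uniform_pages = distinct_pages(uniform_keys)

print("sequential batch touches", sequential_pages, "page(s)")
print("uniform batch touches", uniform_pages, "page(s)")
```

In this model the sequential batch stays within one or two pages, while nearly every uniformly distributed key lands on a different page, which is the access pattern that keeps evicting pages from the buffer pool.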

Sean Cribbs <sean at basho.com>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On Jul 6, 2010, at 3:36 PM, Jeremy Hinegardner wrote:

> Hi all,
> 
> I am doing some sizing estimates for a possible transition to riak of
> our document store.  I've mentioned it on this list before and
> in #riak and this is a snippet of a conversation I had with @seancribbs:
> 
>    https://gist.github.com/c3838e5c421d6ab21c93
> 
> I have also reviewed the bitcask-intro.pdf and http://gist.github.com/438065
> 
> Quick and dirty info, I am looking to store billions of documents, starting
> with 2 billion initially, and a linear growth of around 10 million per day.
> 
> The key is a 64bit number as a string (generally about 20 bytes) and the value is
> a text/xml document of an average size of 1.5KiB.  This size long tails out to
> maybe 5 MiB.  
> 
> Our system is write once. A key/value pair should never be overwritten once it
> is initially inserted, and it is accessed fairly often for about a day, and then
> a long tail drop off.  The pair must be available for retrieval at any time.
> 
> == Bitcask ==
> 
> I went into the source of bitcask to confirm the 32 bytes per key minimum
> memory requirements mentioned in http://gist.github.com/438065 and turned
> up:
> 
>    http://github.com/basho/bitcask/blob/master/c_src/bitcask_nifs.c#L37
> 
> If my calculations are correct, the actual memory overhead, per key using 
> bitcask is 72+N bytes on a 64bit system:
> 
>    UT_hash_handle ->  50 bytes, (6 pointers and 2 chars)
>    file_id        ->   4 bytes,
>    total_sz       ->   4 bytes,
>    offset         ->   8 bytes,
>    tstamp         ->   4 bytes,
>    key_sz         ->   2 bytes,
>    key            ->   N bytes - how big is this?  is this the riak key, 
>                                  or a hash of the riak key?
> 
> This adds up to 72 bytes + the size of the key, per key/value in bitcask.
> 
> If I assume that the key is 20 bytes, then we are talking 92 bytes of memory
> overhead per document. That means I can store ~11 million documents per GiB of
> free memory (1024^3 / 92).  Or, if I have 32GiB of free ram on a machine
> to dedicate to riak w/bitcask (the rest would be used for diskcache), I
> can store ~373 million documents.
> 
> Are my calculations correct?
> 
> It also does not look like bitcask pads values on disk, so there is no wasted
> disk space.  Is this correct?
> 
> == Innostore ==
> 
> For Innostore I'm not so worried about the memory overhead as insertion overhead
> and wasted disk space.
> 
> Since InnoDB stores data in key order and our keys are essentially random 64bit
> numbers as strings, are we going to have a significant overhead in our
> insertions?  
> 
> Using innostore, will there be any key/value padding on the data which
> would cause an overhead per row of disk usage?
> 
> Also, we currently compress the data on disk, and I would be interested in 
> hearing how the compression of disk pages with innostore works.
> 
> thanks,
> 
> -jeremy
> 
> -- 
> ========================================================================
> Jeremy Hinegardner                              jeremy at hinegardner.org 
> 
> 
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




