bitcask and innostore overheads

Colin Surprenant colin.surprenant at gmail.com
Mon Oct 4 15:20:49 EDT 2010


Hi!

I just got this OMG moment after reading Sean's comment on Innostore's
buffer pool invalidation when using random patterns for the key space.
I am at the point where the relatively poor write performance of my
Riak setup has started to bite me. I am using Innostore rather than
Bitcask because of my huge and growing key volume.

Now, this is not a rigorous benchmark but, in my staging environment,
which uses a single node "large" EC2 instance (4 EC2 compute units,
7.5GB ram) I was using MD5 style hash keys and my Riak insertion rate
was about 40-60 items per second (using 5 writer threads over the REST
api) and the load average on my system was getting very high, around
10.

After seeing this comment, I changed my key format to use a simple
increasing integer and, bingo, my insertion rate increased
approximately tenfold, with a negligible impact on the system load.

I think it would be worth pointing this out in the docs somewhere. This
very simple fact does have a *huge* impact on Innostore's performance.
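
A minimal sketch of the two key styles compared above, for illustration
only. The helper names and the zero-padded width of 20 are assumptions of
this sketch, not part of the setup described above or of Innostore itself;
the padding simply keeps string order equal to numeric order in case keys
are compared as bytes.

```python
import hashlib
from itertools import count


def md5_key(item_id: int) -> str:
    """MD5-style key: consecutive items land all over the key space."""
    return hashlib.md5(str(item_id).encode("utf-8")).hexdigest()


# Hypothetical in-process counter; sharing it across several writers
# is out of scope for this sketch.
_counter = count(1)


def sequential_key(width: int = 20) -> str:
    """Increasing integer key, zero-padded (assumed width) so that
    lexicographic order matches insertion order."""
    return str(next(_counter)).zfill(width)


random_keys = [md5_key(i) for i in range(1000)]
ordered_keys = [sequential_key() for _ in range(1000)]

# Sequential keys arrive already sorted; hashed keys do not.
print("md5 keys in insertion order:       ", random_keys == sorted(random_keys))
print("sequential keys in insertion order:", ordered_keys == sorted(ordered_keys))
```

With sequential keys every insert targets the end of the key range, which
matches Sean's remark quoted below that Inno works best when keys are
inserted in sequential order.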

Colin

On Tue, Jul 6, 2010 at 4:18 PM, Sean Cribbs <sean at basho.com> wrote:
> Jeremy,
>
> I'm glad to see you're still looking at Riak.
>
> Regarding your bitcask question, that does seem to be in the correct range of sizes.  Dave (@dizzyco) tells me the actual figure is 24 bytes + the hashtable overhead.
>
> Inno does pad things to fixed-size pages, so yes, you could end up with wasted disk.  However, I would suspect the greater concern would be excessive invalidation of the buffer pool from the essentially random/uniform shape of your key-space, making it difficult to get good throughput.  Inno works best when keys are inserted in sequential order.
>
> Sean Cribbs <sean at basho.com>
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
> On Jul 6, 2010, at 3:36 PM, Jeremy Hinegardner wrote:
>
>> Hi all,
>>
>> I am doing some sizing estimates for a possible transition to riak of
>> our document store.  I've mentioned it on this list before and
>> in #riak and this is a snippet of a conversation I had with @seancribbs:
>>
>>    https://gist.github.com/c3838e5c421d6ab21c93
>>
>> I have also reviewed the bitcask-intro.pdf and http://gist.github.com/438065
>>
>> Quick and dirty info, I am looking to store billions of documents, starting
>> with 2 billion initially, and a linear growth of around 10 million per day.
>>
>> The key is a 64bit number as a string (generally about 20 bytes) and the value is
>> a text/xml document of an average size of 1.5KiB.  This size long tails out to
>> maybe 5 MiB.
>>
>> Our system is write once. A key/value pair should never be overwritten once it
>> is initially inserted, and it is accessed fairly often for about a day, and then
>> a long tail drop off.  The pair must be available for retrieval at any time.
>>
>> == Bitcask ==
>>
>> I went into the source of bitcask to confirm the 32 bytes per key minimum
>> memory requirements mentioned in http://gist.github.com/438065 and turned
>> up:
>>
>>    http://github.com/basho/bitcask/blob/master/c_src/bitcask_nifs.c#L37
>>
>> If my calculations are correct, the actual memory overhead, per key using
>> bitcask is 72+N bytes on a 64bit system:
>>
>>    UT_hash_handle ->  50 bytes, (6 pointers and 2 chars)
>>    file_id        ->   4 bytes,
>>    total_sz       ->   4 bytes,
>>    offset         ->   8 bytes,
>>    tstamp         ->   4 bytes,
>>    key_sz         ->   2 bytes,
>>    key            ->   N bytes - how big is this?  is this the riak key,
>>                                  or a hash of the riak key?
>>
>> This adds up to 72 bytes + the size of the key, per key/value in bitcask.
>>
>> If I assume that the key is 20 bytes, then we are talking 92 bytes of memory
>> overhead per document. That means I can store ~11 million documents per GiB of
>> free memory (1024^3 / 92), or, if I have 32GiB of free ram on a machine
>> to dedicate to riak w/bitcask (the rest would be used for diskcache), I
>> can store ~373 million documents.
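
The arithmetic above can be reproduced with a short sketch. The field sizes
are the ones listed in the message, not values verified against the bitcask
source; the function names are hypothetical, and `fixed_overhead` is a
parameter so a different per-key figure can be substituted.

```python
GIB = 1024 ** 3

# Per-key field sizes as listed in the message above (64-bit system).
FIELD_BYTES = {
    "UT_hash_handle": 50,
    "file_id": 4,
    "total_sz": 4,
    "offset": 8,
    "tstamp": 4,
    "key_sz": 2,
}
FIXED_OVERHEAD = sum(FIELD_BYTES.values())  # 72 bytes


def bytes_per_key(key_size: int, fixed_overhead: int = FIXED_OVERHEAD) -> int:
    """In-memory bytes per stored key: fixed fields plus the key itself."""
    return fixed_overhead + key_size


def keys_that_fit(ram_bytes: int, key_size: int,
                  fixed_overhead: int = FIXED_OVERHEAD) -> int:
    """How many keys fit in the given amount of RAM."""
    return ram_bytes // bytes_per_key(key_size, fixed_overhead)


def ram_needed_gib(n_keys: int, key_size: int,
                   fixed_overhead: int = FIXED_OVERHEAD) -> float:
    """RAM in GiB needed to hold n_keys entries."""
    return n_keys * bytes_per_key(key_size, fixed_overhead) / GIB


print(keys_that_fit(GIB, 20))                       # 11671106
print(keys_that_fit(32 * GIB, 20))                  # 373475417
print(round(ram_needed_gib(2_000_000_000, 20), 1))  # 171.4
```

Under these assumptions, the initial 2 billion documents would need roughly
171 GiB of memory across the cluster for the key directory alone, before
counting replication.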
>>
>> Are my calculations correct?
>>
>> It also does not look like bitcask pads values on disk, so there is no wasted
>> disk space.  Is this correct?
>>
>> == Innostore ==
>>
>> For Innostore I'm not as worried about the memory overhead as about
>> insertion overhead and wasted disk space.
>>
>> Since InnoDB stores data in key order and our keys are essentially random 64bit
>> numbers as strings, are we going to have a significant overhead in our
>> insertions?
>>
>> Using innostore, will there be any key/value padding on the data which
>> would cause an overhead per row of disk usage?
>>
>> Also, we currently compress the data on disk, and I would be interested in
>> hearing how the compression of disk pages with innostore works.
>>
>> thanks,
>>
>> -jeremy
>>
>> --
>> ========================================================================
>> Jeremy Hinegardner                              jeremy at hinegardner.org
>>
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com