Bitcask vs innostore, again

Dmitry Demeshchuk demeshchuk at gmail.com
Thu Apr 7 00:30:34 EDT 2011


Sorry, guys, I know this subject has been discussed a thousand times.

So, briefly:

Innostore:

1. Time-proven technology.
2. Very tunable.
3. Has been used in production by us (Mochi Media) and other
companies for a long time.
4. Some tasty hidden features that could be implemented in Riak
(first/last key of a bucket, fast removal of a bucket, etc.).
But:
5. Potentially slower than bitcask in many cases (though still comparable).
6. Not considered the mainstream storage backend, so in practice it is no
longer being improved.
7. One bucket equals one InnoDB table, which means a separate file.
A million buckets == a million files.
8. Unlike bitcask, may take time to repair after a node failure.

Bitcask:

1. Fast, sometimes ridiculously fast.
2. Doesn't generate thousands of files.
3. Now considered the main Riak storage backend.
4. Has some nice features, for instance, an LRU-like mechanism based on
removing old values during merges. Most likely, it will get more of them
in the future.
But:
5. Requires all the keys to fit in memory.
6. Can make your database grow fast if you update values frequently
(though tuning merges helps, more or less).
7. Still immature compared to innostore.
8. There were production complaints about it some time ago.
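To illustrate points 5 and 6 of the bitcask list, here is a toy model of the design described in Basho's Bitcask paper: values go into an append-only log, and an in-memory hash (the "keydir") maps every live key to where its latest value sits. This is a simplified sketch, not the real bitcask API: `ToyBitcask` and `KeydirEntry` are hypothetical names, and the real on-disk format (keys, headers, CRCs, tombstones, multiple files) is omitted.

```python
from dataclasses import dataclass


@dataclass
class KeydirEntry:
    offset: int  # where the value starts in the log
    size: int    # length of the value in bytes


class ToyBitcask:
    """Hypothetical toy model of a Bitcask-style store (not the real bitcask API)."""

    def __init__(self) -> None:
        self.log = bytearray()  # stands in for the append-only data file(s)
        self.keydir = {}        # key -> KeydirEntry; every live key lives in RAM

    def put(self, key: bytes, value: bytes) -> None:
        # Never overwrite in place: append the new value and repoint the
        # keydir entry, leaving the old value behind as dead bytes.
        offset = len(self.log)
        self.log += value
        self.keydir[key] = KeydirEntry(offset, len(value))

    def get(self, key: bytes) -> bytes:
        # One in-memory lookup, then a single read from the log.
        e = self.keydir[key]
        return bytes(self.log[e.offset:e.offset + e.size])

    def merge(self) -> None:
        # Copy only the live values into a fresh log, reclaiming the space
        # taken by superseded values.
        new_log = bytearray()
        new_keydir = {}
        for key, e in self.keydir.items():
            value = self.log[e.offset:e.offset + e.size]
            new_keydir[key] = KeydirEntry(len(new_log), len(value))
            new_log += value
        self.log, self.keydir = new_log, new_keydir


if __name__ == "__main__":
    db = ToyBitcask()
    for i in range(1000):
        db.put(b"counter", str(i).encode())  # 1000 updates of a single key
    print(len(db.log))                       # 2890: every old value is still on "disk"
    db.merge()
    print(len(db.log), db.get(b"counter"))   # 3 b'999': only the live value survives
```

The keydir is why all keys must fit in memory (it holds one entry per live key regardless of value size), and the append-only log is why frequent updates inflate disk usage until a merge runs.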

To clarify the last point, I've had some problems with bitcask myself
(running out of file descriptors, bad merges), and I've heard that some
people periodically try to migrate from innostore to bitcask, but stick
with innostore, remaining disappointed in bitcask. I mean no offense to
the Basho team here; a lot of problems have been successfully fixed
during bitcask's lifetime, and no one can create a perfect product in no
time. Still, bitcask is very impressive for something with just a
one-year history. Dave Smith and the guys have done a lot so far.

What I haven't heard yet are any production success stories for
bitcask. Which storage does Wikia use, for example? Or Vibrant Media?

So, I'm looking for stories from at least 2-3 months of experience
using bitcask, with 10-20GB of total data or more. What problems have
you faced? Have you managed to solve them? What advantages have you
gained by using bitcask instead of innostore? Any details about the data
sets you use (update/delete/put frequency, number of keys/buckets,
etc.)? Do you use any other backends along with bitcask?

Thank you.

-- 
Best regards,
Dmitry Demeshchuk
More information about the riak-users mailing list