Is Riak suitable for a small-record, write-intensive, billion-record application?

Jeremiah Peschka jeremiah.peschka at
Thu Oct 18 10:17:06 EDT 2012

TL;DR version: Riak works, storage overhead will be more than you think.

Good point, Alex, it was easy to forget about overhead.

I did some quick digging and the best info I can find is a delightful
exchange between Alex and me around April of last year [1] and this awesome
email from Nico Meyer back in May of 2011 [2]. Another option is to use the
magical Bitcask capacity sizing page in the documentation [3]. Yes, it's for
Bitcask. No, I couldn't find an equivalent for LevelDB.

Based on Alex's recollection and Nico's math, you can expect between ~120
and ~380 bytes of overhead per object when you're using Bitcask. LevelDB
should be similar, though compression may help here if you're storing
repetitive values or if integer compression kicks in.
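As a rough sanity check, here's a back-of-the-envelope sketch using the ~120-380 byte overhead range above, the 160-byte records and several-hundred-million record count from Yassen's question, and Riak's default replication factor of 3. All of these inputs are assumptions pulled from this thread, not measurements:

```python
# Back-of-the-envelope Riak storage estimate.
# Overhead range (~120-380 bytes/object) comes from the discussion above;
# record size and count come from the original question.
def estimate_storage(n_records, record_bytes, overhead_bytes, n_val=3):
    """Total on-disk bytes, including Riak's default replication (n_val=3)."""
    per_object = record_bytes + overhead_bytes
    return n_records * per_object * n_val

n = 500_000_000  # "several hundreds of millions" of records
low = estimate_storage(n, 160, 120)
high = estimate_storage(n, 160, 380)
print(f"{low / 1e12:.2f} TB to {high / 1e12:.2f} TB")  # roughly 0.42-0.81 TB
```

The point being: with 160-byte values, the per-object overhead alone can double or triple your raw data size before replication even enters the picture.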

However, if your application is likely to perform updates, you can see
additional growth in the object headers: vector clocks may grow, and
siblings are always possible if writes come in fast (object overhead goes
up again, since siblings are stored alongside the object).
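Alexander's get-before-put suggestion below can be sketched like so. A plain dict stands in for a Riak bucket here (the real client's get/store calls will differ), and note that this check-then-write is not atomic, which is exactly where Riak's eventual consistency bites:

```python
# Sketch of the get-before-put pattern suggested downthread.
# A plain dict stands in for a Riak bucket; with the real client you'd
# do a bucket get and only store on a miss. This check-then-write is
# NOT atomic: under concurrent writers and eventual consistency, two
# clients can both miss and both write, producing siblings.
bucket = {}  # hypothetical stand-in for a Riak bucket

def get_or_create(key, record):
    """Return (record, created): the stored record and whether we wrote it."""
    existing = bucket.get(key)
    if existing is not None:
        return existing, False   # key exists: return stored record, no write
    bucket[key] = record         # key absent: write the new record
    return record, True

rec, created = get_or_create("cle01_tpls01_2105328884", {"fields": "..."})
```

For Yassen's workload (write only if the key is absent, otherwise return the existing record), this is the shape of every request; the race window between get and put is what the eventual-consistency caveat below is about.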


Jeremiah Peschka
Managing Director, Brent Ozar PLF, LLC

On Thu, Oct 18, 2012 at 6:53 AM, Alexander Sicular <siculars at> wrote:

> I've read the other replies and I'm gonna be the downer at this party.
> Imho, Riak is not well suited to small value applications due to on disk
> overhead. Last time this came up on the list, I recall there being a ~450
> byte overhead per key. If that still holds, and I believe it does, you need
> to factor it into your disk space calculations.
> Otherwise, your criteria are well within the capabilities of Riak. Use the
> leveldb backend and in your client code run a get before a put and you'll
> be fine. Of course, if you're writing to a single key in rapid succession
> you need to consider Riak's eventually consistent nature and factor that
> into your design.
> Keep us posted,
> Alexander
> @siculars
> Sent from my iRotaryPhone
> On Oct 18, 2012, at 7:42, Yassen Damyanov <yassen.tis at> wrote:
> > Hi everyone,
> >
> > Absolutely new (and ignorant) to NoSQL solutions and to Riak (my
> > apologies; but extensive experience with SQL RDBMS).
> >
> > We consider a NoSQL DB deployment for a mission-critical application
> > where we need to store several hundreds of MILLIONS of data records,
> > each record consisting of about 6 string fields, record total length
> > is 160 bytes. There is a unique key in each record that seems suitable
> > for hashing (20+ bytes string, e.g. "cle01_tpls01_2105328884").
> >
> > The application should be able to write several hundreds of new
> > records per second, but first check if the unique key already exists.
> > Writing is to be done only if it is not there. If it is, the app needs
> > to retrieve the whole record and return it to the client and no
> > writing is done in this case.
> >
> > I need to know if Riak would be suitable for such an application.
> > Please advise, thanks!
> >
> > (Again, apologies for my ignorance. If we choose Riak, I promise to
> > get educated ;)
> >
> > Yassen
> >
> > _______________________________________________
> > riak-users mailing list
> > riak-users at
> >
