In-Memory Performance

Ryan Zezeski rzezeski at
Tue Aug 2 23:31:31 EDT 2011


I've said it a couple of times on the ML recently but I think it's worth
saying again.  Riak is not a cache.  Riak's core competency is being a
_highly available_ data store.  The highly available part is primarily
accomplished via replicas, consistent hashing, and fallback/hinted handoff.
 Even when using an in-memory backend you must pay the toll of using
replicas.  If you really wanted to push Riak to its limits as a cache then
perhaps a separate bucket with N=1 would be something to try, but you'll
still have overhead to pay in regards to coordination (yes, it's only 1
vnode but a coordinator will still be used) and possible vnode contention if
you have a key that is often accessed (all reads to the same key will be
performed in sequential order on the same vnode).
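To make the replica/coordination overhead concrete, here is a toy sketch of Riak-style consistent hashing. This is NOT Riak's actual implementation (Riak hashes the bucket/key pair onto a 160-bit ring and claims partitions differently); it only illustrates why N=3 touches three vnodes while N=1 touches one, and why a hot key still serializes on a single vnode:

```python
import hashlib

# Toy sketch of Riak-style consistent hashing (NOT Riak's real code):
# the ring is split into a fixed number of partitions (vnodes), a key
# hashes to one partition, and the N replicas land on the next N
# partitions clockwise around the ring.

RING_SIZE = 64  # Riak's default partition count; purely illustrative here
NODES = ["node1", "node2", "node3", "node4"]

# Assign partitions to physical nodes round-robin, roughly as Riak does.
ring = [NODES[i % len(NODES)] for i in range(RING_SIZE)]

def partition_for(key: bytes) -> int:
    """Hash a key onto the ring (Riak uses SHA-1 of the bucket/key)."""
    digest = int.from_bytes(hashlib.sha1(key).digest(), "big")
    return digest % RING_SIZE

def preference_list(key: bytes, n_val: int):
    """The first n_val partitions clockwise from the key's hash."""
    start = partition_for(key)
    return [(p % RING_SIZE, ring[p % RING_SIZE])
            for p in range(start, start + n_val)]

# With N=3 every read/write involves three vnodes; with N=1 only one,
# but all requests for a hot key still serialize on that single vnode.
print(preference_list(b"bucket/hot_key", 3))
print(preference_list(b"bucket/hot_key", 1))
```

Note that the N=1 preference list is just the head of the N=3 list: dropping replicas removes work, but it doesn't remove the coordinator or spread a hot key across vnodes.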

It is this fundamental design decision that almost guarantees you won't
match the performance of Membase without some contortions.  I'm not saying
you couldn't do it, and I'd be happy to be proved wrong :).  I just think
it's using Riak for something it was never designed for.  It sounds like
you need a cache, so why not use one?

All that said, I think building some type of cache into Riak for often
accessed items is a good idea that would potentially solve your problem
without requiring another component in your system.
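In the meantime, the same effect can be had client-side with a look-aside (cache-aside) LRU in front of Riak. A minimal sketch, where `fetch_from_riak` is a hypothetical placeholder for whatever client call you actually use (HTTP or PB), not a real API:

```python
from collections import OrderedDict

# Minimal look-aside LRU: check the cache first, fall through to Riak
# only on a miss.  `fetch_from_riak` below is a placeholder, not a real
# Riak client API.

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used

def cached_get(cache, key, fetch_from_riak):
    value = cache.get(key)
    if value is None:
        value = fetch_from_riak(key)  # miss: pay the full Riak read cost
        cache.put(key, value)
    return value
```

Since your active key set fits in RAM, a cache like this (or memcached, for that matter) should absorb the read-heavy traffic and leave Riak to do what it's good at: durability and availability.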

BTW, I don't mean to discourage your experiment.  I think it's neat to see
what Riak's limits are in different cases.  I think it's cool you got Riak
to stay within a factor of 3.


On Tue, Aug 2, 2011 at 11:22 AM, Matt Savona <matt.savona at> wrote:

> Hi all,
>
> My colleagues and I are evaluating Riak as a persistent, replicated K-V
> store.
>
> I have a fairly simple (and not so scientific) test that reads and
> writes 5000 objects that are 32K in size. I am particularly interested
> in squeezing every last bit of performance out of Riak in a very
> read-heavy environment. I want to avoid hitting disk for reads as much
> as possible; our entire content set is much larger than could ever be
> stored in RAM, but preferably hot/active objects will remain resident
> in memory until various conditions may force them to be evicted. While
> the content set is quite large, the number of active keys represent a
> very small portion of the data which could easily fit in RAM.
>
> I've been running the same test against Riak given various
> combinations of backends and access protocols (HTTP vs. PB).
> My numbers can be seen in this screenshot:
>
> It is quite evident (and perhaps obvious) that Protocol Buffer
> performance is noticeably better than HTTP in most cases.
>
> What is confusing to me is the performance of purely in-memory
> backends. Notably, GB Trees and LRU Cache (and even Innostore), at
> best took 14s to retrieve 5000 32K objects. The exact same test
> against Membase took just 6s.
>
> Perhaps I'm not comparing apples to apples (Riak in-memory versus
> Membase). Do my tests look reasonable and do the numbers look roughly
> in-line with expectations? Is there any way to squeeze more juice out
> of Riak? A purely in-memory/non-persistent backend will not suffice
> for our ultimate needs, but for testing purposes I'm just trying to
> see if I can get read performance more in line with what we're seeing
> with Membase. We love everything about it, but we haven't yet hit the
> performance we were hoping for.
>
> Thanks in advance!
>
> - Matt
> _______________________________________________
> riak-users mailing list
> riak-users at