how does riak scale?

Sean Cribbs sean at
Mon Apr 12 14:57:12 EDT 2010

A number of points:

* Total data stored in the cluster = replication factor (N) x original data size.  For example, with your 100GB of original data and N=3, you have 300GB of data stored across the cluster.  If you had 6 nodes, this would be about 50GB per node.
* Riak can be tuned for GET speed in your setup, but in general it is optimized for fault-tolerance and availability.  Even in the best of conditions, it will not perform as well as a web server reading static files off the local disk.
* The replication factor will affect both reads and writes.  If your cluster size is larger than your N, you should see increased throughput as you add nodes.
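
To make the storage arithmetic in the first point concrete, here is a minimal Python sketch.  The function name and parameters are my own for illustration (this is not a Riak API), and it assumes data is spread evenly across nodes, which is only approximately true in practice.

```python
def stored_per_node(original_gb: float, n_val: int, nodes: int) -> tuple[float, float]:
    """Return (total GB stored in the cluster, approximate GB per node).

    Hypothetical helper, not part of Riak. Assumes replicas are
    distributed evenly across all nodes.
    """
    if n_val < 1 or nodes < 1:
        raise ValueError("n_val and nodes must both be at least 1")
    total_gb = original_gb * n_val   # every object is stored N times
    return total_gb, total_gb / nodes


# The example from this thread: 100GB of original data, N=3, 6 nodes.
total_gb, per_node_gb = stored_per_node(100, 3, 6)
print(f"total stored: {total_gb}GB, about {per_node_gb}GB per node")
```

Adding nodes lowers the per-node figure, while raising N raises it.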

Sean Cribbs <sean at>
Developer Advocate
Basho Technologies, Inc.

On Apr 12, 2010, at 1:19 PM, TuX RaceR wrote:

> Hello Riak users!
> Let me first present, with a simple example, what I consider an application that 'scales':
> If I have a web server (e.g. Apache) serving 100GB of files and I get 4000 GETs per second, one way to scale the application is to copy (rsync) the 100GB to another web server and balance the load across the two nodes: that way, my cluster of 2 nodes can serve 2 x 4000 GETs per second overall.
> I can say that my architecture scales because if I multiply the number of nodes by 2, I can also multiply by 2 the number of requests per second that the system can handle.
> With Riak, I am not sure I understand how the scaling works. Are we speaking about a global 'key GET' rate (requests per second) that scales with the number of nodes added?
> My web server example above also assumed that all the data (100GB) could fit on a single node. As I understand it, Riak could be used to serve data too large to fit on one disk. So maybe the scaling is about the data itself: a web client (browser) will not see a speed difference between the response from a Riak cluster serving K keys with N nodes and that from another Riak cluster serving 2*K keys on 2*N nodes.
> Also, the role of the number of replicas in scaling is not clear to me: it seems to me that having a lot of replicas speeds up reads but slows down writes. Is there a simple scaling law for this?
> Thanks in advance,
> TuX