Riak sizing considerations

Sebastian Gerlach sebastian.gerlach at immonet.de
Tue Jan 24 05:21:12 EST 2012

Dear Riak-Users,

We are considering storing a large amount of binary data (50,000,000
images) in a Riak cluster. Each image is 648 KB in size, and we want to
store 3 copies of each image.

In this case I need to store 50,000,000 * 648 KB * 3 ≈ 90.5 TiB of data
(counting 1 KB as 1024 bytes; about 99.5 TB in decimal units). This
calculation does not include any overhead for reorganisation or anything else.
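A quick way to double-check the raw storage figure, as a minimal sketch: it assumes 1 KB = 1024 bytes and counts only the image bytes, with no Riak, filesystem or reorganisation overhead. The helper name is hypothetical.

```python
def raw_storage_bytes(images: int, image_kb: int, copies: int) -> int:
    """Raw bytes for all copies, taking 1 KB = 1024 bytes; no overhead included."""
    return images * image_kb * 1024 * copies


total = raw_storage_bytes(50_000_000, 648, 3)
print(f"{total / 1024**4:.1f} TiB")  # binary units: 90.5 TiB
print(f"{total / 1000**4:.1f} TB")   # decimal units: 99.5 TB
```

The 90.5 figure is therefore in binary terabytes (TiB). Disks are usually sold in decimal TB, so roughly 99.5 TB of raw capacity is needed before any overhead.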

The other concern is the network. I ran some benchmarks on a 4-node
cluster, each node with a 1 Gbps interface, and made some calculations
in addition to the benchmarks.

Some information about the benchmark:
- I use the same interface for cluster communication and benchmarking.
- I use the Riak HTTP API.
- The benchmark command (curl expands [10001-20000] into 10,000 requests,
  fetched one after another):
  time curl -s "http://interface:8098/buckets/test-01/keys/[10001-20000].jpg" > /dev/null

In theory, a 1 Gbps interface provides 125 MB per second. In my
calculations I assume that only 50 percent of this theoretical bandwidth
(62.5 MB per second) is usable. This fits my benchmark results very well.
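To compare a `time` measurement with the 50 percent assumption, the client-facing bytes of the curl run can be converted into a fraction of the line rate. This is a minimal sketch, assuming 1 Gbps = 125,000,000 bytes per second and 1 KB = 1024 bytes. It ignores HTTP headers, and because the same interface also carries cluster traffic, it understates the real load on the link. The function names are hypothetical.

```python
LINK_BYTES_PER_S = 1e9 / 8  # 1 Gbps line rate = 125,000,000 bytes/s
KB = 1024                   # image size counted in binary kilobytes


def expected_seconds(requests: int, image_kb: float, usable: float) -> float:
    """Time to download `requests` images if only `usable` of the line rate is available."""
    return requests * image_kb * KB / (usable * LINK_BYTES_PER_S)


def link_utilisation(requests: int, image_kb: float, elapsed_s: float) -> float:
    """Fraction of the 1 Gbps line rate used by the client-facing bytes of a timed run."""
    return requests * image_kb * KB / elapsed_s / LINK_BYTES_PER_S


# The curl run fetches keys 10001..20000, i.e. 10,000 images of 648 KB each.
print(f"{expected_seconds(10_000, 648, 0.5):.0f} s at 50% of line rate")  # 106 s
```

Passing the real time reported by `time` as `elapsed_s` to `link_utilisation` gives the measured fraction directly.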

I experimented for a while with the read quorum, set as the bucket property '{"props":{"r":X}}'.
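The bucket property can be set over the same HTTP interface. This is a minimal sketch using only the Python standard library. It assumes the `PUT /buckets/<bucket>/props` endpoint of the Riak HTTP API (verify against the documentation for your Riak version) and only builds the request, leaving the actual send commented out. `bucket_props_request` is a hypothetical helper name.

```python
import json
import urllib.request


def bucket_props_request(host: str, bucket: str, r: int,
                         port: int = 8098) -> urllib.request.Request:
    """Build (but do not send) a PUT that sets the bucket's default read quorum r."""
    url = f"http://{host}:{port}/buckets/{bucket}/props"
    body = json.dumps({"props": {"r": r}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = bucket_props_request("interface", "test-01", r=1)
# urllib.request.urlopen(req)  # uncomment to send it to a running node
```

The Riak HTTP API documentation also describes passing `r` as a query parameter on an individual GET (for example `?r=1`), which allows testing different quorums without changing the bucket defaults.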

Calculation for r=2 (3 image transfers per request on a node's interface):
62.5 MB per second usable / (3 * 648 KB per request) ≈ 33 requests per
second per node ≈ 132 requests per second across the 4-node cluster.

Calculation for r=1 (2 image transfers per request on a node's interface):
62.5 MB per second usable / (2 * 648 KB per request) ≈ 49 requests per
second per node ≈ 198 (roughly 200) requests per second across the cluster.
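The two calculations follow one formula. This sketch assumes the transfer factor is r + 1, which matches the factors 3 (r=2) and 2 (r=1) used above. It keeps units consistent: 1 Gbps = 125,000,000 bytes per second and 648 KB = 648 * 1024 bytes. Because 125 MB/s is a decimal figure while 648 KB is binary, the results come out a few percent below the rounded per-node figures in the calculations. `requests_per_node` is a hypothetical name.

```python
LINK_BYTES_PER_S = 1e9 / 8  # 1 Gbps = 125,000,000 bytes/s
KB = 1024                   # 648 KB counted as 648 * 1024 bytes


def requests_per_node(r: int, image_kb: float = 648, usable: float = 0.5,
                      link_bytes_per_s: float = LINK_BYTES_PER_S) -> float:
    """Requests/s one node's link can carry if each request moves the image r + 1 times."""
    bytes_per_request = (r + 1) * image_kb * KB
    return usable * link_bytes_per_s / bytes_per_request


for r in (1, 2):
    per_node = requests_per_node(r)
    print(f"r={r}: {per_node:.1f} req/s per node, {4 * per_node:.1f} req/s on 4 nodes")
```

This prints about 47.1 and 188.4 requests per second for r=1, and 31.4 and 125.6 for r=2.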

In this second case I see some strange effects on the network: my send
and receive queues grow very fast, and after the benchmark finishes
there is a lot of traffic between the Riak nodes for a while.

Does anyone have experience with data sets like this and can give a few
hints on a possible setup? The goal is to process at least 500
requests per second.
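Under the per-node bandwidth model of the calculations above, the 500 requests per second goal translates into a minimum node count. The model assumes r + 1 image transfers per request, inferred from the factors 3 and 2, and 50 percent of a 1 Gbps link being usable. This is a rough sketch. It ignores disk I/O, CPU and the extra inter-node traffic seen after the r=1 run, so a real cluster would need headroom on top. `nodes_needed` and its `link_gbps` parameter are hypothetical.

```python
import math

LINK_BYTES_PER_S = 1e9 / 8  # 1 Gbps = 125,000,000 bytes/s
KB = 1024                   # 648 KB counted as 648 * 1024 bytes


def nodes_needed(target_rps: float, r: int, image_kb: float = 648,
                 usable: float = 0.5, link_gbps: float = 1.0) -> int:
    """Smallest node count whose combined usable link bandwidth covers target_rps."""
    per_node = usable * link_gbps * LINK_BYTES_PER_S / ((r + 1) * image_kb * KB)
    return math.ceil(target_rps / per_node)


for r in (1, 2):
    print(f"r={r}: {nodes_needed(500, r)} nodes for 500 req/s")
```

Under these assumptions, this gives 11 nodes for r=1 and 16 for r=2. Raising `link_gbps` shows how faster interfaces would change the count.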

Another point in my considerations is the time required for
reorganisation after a new node is added to the cluster or a node has
been replaced.
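A back-of-envelope lower bound for the reorganisation time follows from a simple assumption. When a node joins or replaces another, it must receive roughly an even share of all stored data through its single interface. This is a minimal sketch under that assumption. It ignores Riak's own handoff throttling, concurrent client traffic and disk speed, so actual times will be longer. The 16-node cluster in the usage line is hypothetical.

```python
def min_rebalance_hours(total_bytes: float, nodes_after: int,
                        usable_bytes_per_s: float) -> float:
    """Lower bound: hours to copy one node's even share of the data over its link."""
    share_bytes = total_bytes / nodes_after
    return share_bytes / usable_bytes_per_s / 3600


TOTAL_BYTES = 50_000_000 * 648 * 1024 * 3  # all 3 copies, 1 KB = 1024 bytes
USABLE = 0.5 * 1e9 / 8                      # 50% of 1 Gbps = 62.5 MB/s

# Hypothetical 16-node cluster after the join.
print(f"{min_rebalance_hours(TOTAL_BYTES, 16, USABLE):.1f} h")  # 27.6 h
```

The same estimate applies to replacing a node, since the replacement has to receive the share the old node held.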

Many thanks for your reply and your attention.

Kind regards
