Questions about Riak Enterprise

Mark Rose markrose at
Wed May 9 12:11:39 EDT 2012

On Wed, May 9, 2012 at 12:38 AM, Andrew Thompson <andrew at> wrote:
> > Does the approximately 1 ms of latency between av zones affect Riak's
> > performance that much?
> If the latency is *guaranteed* to be that low, then you should be ok,
> although I'm not sure how the networking works across zones. If the
> latency can do crazy things in outage conditions, you'll stand a decent
> chance of screwing the cluster. A downed node is better than a really,
> really slow one.

Well, one thing about AWS is that nothing is guaranteed. I have seen
latency spike up to 10 ms between zones, but it's brief. The zones may or
may not be in the same building, but they are close together and share the
same 10.x.x.x space. Amazon currently charges 1¢/GB for transfer between
zones, so there are obviously some network constraints between them compared
to machines inside a zone.

> > We were planning to run across av zones for fault tolerance, just beefing
> > up single nodes for the moment until rack awareness is available. So the
> > recommended solution is to use EDS to accomplish this?
> I'm not sure what you're describing here.

Basically, we are planning to run a single 3-node cluster, with 1 node
in each av zone. We use this technique with a 3 node Galera cluster
(synchronous MySQL replication). Galera handles a disappearing node very
well, so if an av zone starts acting up the remaining machines continue
working. We run all our instance types in multiple zones so we can handle
an av zone going down.

From what you're describing, Riak/Erlang doesn't handle a flaky
node/network well, so some manual intervention would be needed in the case
a node/network starts acting funny.

Because Riak doesn't offer rack awareness (we could treat each av zone as a
rack), and we still want copies of our data in multiple zones, our only
option to ensure live data is replicated in all the zones (for high
availability) is to set the number of replicas equal to the number of
nodes. We'll be fine until we outgrow the largest EC2 instance type.
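
Whether "replicas = nodes" really puts a copy of every key on every node
depends on how partitions are assigned to nodes. A minimal sketch of that
check follows. It assumes a ring of 64 partitions, simple round-robin
partition ownership, and replicas stored on consecutive partitions. The
round-robin assignment and the function names are illustrative assumptions,
not Riak's actual claim algorithm.

```python
# Sketch only: round-robin ownership is an assumption made for illustration
# and may not match how Riak actually assigns partitions to nodes.

def replica_nodes(ring_size, nodes, n_val):
    """For each partition, return the set of nodes holding its n_val replicas.

    Replicas are assumed to live on the n_val consecutive partitions
    starting at the key's partition, wrapping around the end of the ring.
    """
    owner = [p % nodes for p in range(ring_size)]  # round-robin ownership
    return [
        {owner[(p + i) % ring_size] for i in range(n_val)}
        for p in range(ring_size)
    ]


def fully_replicated(ring_size, nodes, n_val):
    """Count partitions whose replicas land on every node in the cluster."""
    return sum(
        1 for held in replica_nodes(ring_size, nodes, n_val)
        if len(held) == nodes
    )


if __name__ == "__main__":
    ring_size, nodes, n_val = 64, 3, 3
    ok = fully_replicated(ring_size, nodes, n_val)
    print(f"{ok} of {ring_size} partitions have a replica on every node")
```

Under these assumptions the check reports 62 of 64, because 64 is not a
multiple of 3 and the two partitions at the wrap-around point place two
replicas on the same node. It would be worth verifying the real placement on
a live cluster before relying on it.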

Is rack awareness a planned feature? If so, when (ballpark) is it planned

> Actually, it's worse than that because of some legacy behaviour. EDS
> wants to know the bind IP, not a hostname, and it will exchange node IPs
> with the other side of the connection, so internal IPs can 'leak' to the
> other cluster and cause connection problems. There is a workaround for
> this, and I do plan to address it.

I suppose the interim solution for EDS across EC2 regions is to use a VPC
in each region and use unique 10.x.x.x subnets in each and VPN between
them. But for us, we're not at the point of deploying to multiple regions
yet, so no need to dig more into this at the moment.
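
For reference, a small sketch of the "unique subnets" requirement, using
Python's standard ipaddress module to confirm that no two per-region ranges
overlap. The CIDR blocks and the helper name are hypothetical, chosen only
for illustration.

```python
import ipaddress
from itertools import combinations

# Hypothetical per-region VPC ranges. The CIDR blocks are made up for
# illustration and are not taken from any real deployment.
vpc_cidrs = {
    "us-east-1": "10.1.0.0/16",
    "us-west-1": "10.2.0.0/16",
    "eu-west-1": "10.3.0.0/16",
}


def overlapping_pairs(cidrs):
    """Return the pairs of region names whose address ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]


if __name__ == "__main__":
    clashes = overlapping_pairs(vpc_cidrs)
    print("overlapping ranges:", clashes if clashes else "none")
```

If the ranges are disjoint, an internal IP exchanged between clusters refers
to only one machine on either side of the VPN.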

Thank you for answering my questions. This really helps!
