Questions about Riak Enterprise

Andrew Thompson andrew at hijacked.us
Mon May 7 21:25:56 EDT 2012


On Tue, May 01, 2012 at 03:49:02PM -0400, Mark Rose wrote:
> I've got some questions about Riak Enterprise I haven't been able to find
> the answers to.

Hi Mark, I'm the Riak EDS 'maintainer'. Sorry I didn't reply earlier; I
was travelling all week.

> I understand that the open source version of Riak's replication is designed
> for single data center usage only, but I'm unsure about how Riak Enterprise
> handles replication. Specifically, I'm curious about locality and high
> availability.
> 
> Our setup is already running in multiple availability zones on EC2. We're
> running Galera across the zones to provide both redundancy and a local copy
> of the data to avoid the network latency of going to another zone. However,
> Galera, as nice as it is, doesn't scale writes. We're going to be using
> Riak to store a lot of information going forward, and may eventually move
> our existing data to it as well.
> 
> The only thing holding us back from going to multiple regions on Amazon is
> our datastore.
> 
> How well does Riak handle layered topologies, such as EC2?
> 
> Is it possible to configure Riak Enterprise to store two copies of the data
> in each EC2 region, ensuring that the two copies are in different zones
> when there is more than one Riak server in a zone?

Current EDS replication is pretty simple: it just tries to (eventually)
ensure that data on one cluster is mirrored on another. It won't forward
reads, and Riak doesn't have anything like 'rack awareness', at least
not yet.

> When a query is run, is it run in one region only? Would Riak prefer copies
> of the data in the local zone?

Riak only queries the local cluster, yes.
> 
> For what it's worth, our current datastore load is roughly half and half
> writes and reads. We heavily cache reads with memcache (99%). We may drop
> memcache if reads on Riak prove fast enough (thus avoiding the issues of
> invalidating remote caches).

Given the current limitations, you'd probably be best off with N
clusters, one per region and/or zone. Don't try to span a single
cluster across zones or, even worse, across regions. Instead, hook the
separate clusters together with replication.
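
As a rough client-side sketch of "one cluster per region, and each client
talks only to its local cluster": the region names, hostnames, and helper
functions below are made up for illustration. Port 8098 is Riak's default
HTTP port. Replication between the clusters is configured on the Riak side
and is not shown here.

```python
# Hypothetical mapping from EC2 region to that region's own Riak cluster.
# The hostnames are placeholders, not real endpoints.
LOCAL_CLUSTERS = {
    "us-east-1": ["riak1.us-east.example.com", "riak2.us-east.example.com"],
    "us-west-1": ["riak1.us-west.example.com", "riak2.us-west.example.com"],
}


def local_nodes(region):
    """Return the nodes of the cluster that lives in the caller's region."""
    try:
        return LOCAL_CLUSTERS[region]
    except KeyError:
        raise ValueError("no Riak cluster configured for region %r" % region)


def object_url(region, bucket, key, port=8098):
    """Build an HTTP URL for a key on the first node of the local cluster.

    Only the local cluster is ever addressed; reads are never sent to a
    cluster in another region.
    """
    node = local_nodes(region)[0]
    return "http://%s:%d/buckets/%s/keys/%s" % (node, port, bucket, key)
```

An application server in us-east-1 would call object_url("us-east-1",
"users", "mark") and never address the us-west-1 nodes directly.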

There's also some fun with NAT on EC2, but it can be made to work.

Let me know if that helps,

Andrew




More information about the riak-users mailing list