EC2 and RIAK

Eric Moritz eric at themoritzfamily.com
Fri Apr 1 13:03:06 EDT 2011


I concur with Mark; Rackspace has given us poor performance with Riak too.

On Fri, Apr 1, 2011 at 1:00 PM, Mark Steele <msteele at beringmedia.com> wrote:
> I've run some rather disappointing tests with Riak on Rackspace cloud
> servers. You're much better off on dedicated hardware if you can find it.
> Mark Steele
> Bering Media Inc.
>
>
> On Fri, Apr 1, 2011 at 11:51 AM, David Dawson <david.dawson at gmail.com>
> wrote:
>>
>> Mathias and Alexander,
>>
>>        Thanks to both of you for your replies; they were very informative
>> and have really helped me make up my mind. To summarise:
>>
>>        - If you want good, predictable performance but are happy to live
>> with the risk of losing some of your data (in the event of a cluster
>> failure where the number of failed nodes is greater than your n_val), then
>> run with local ephemeral storage in RAID 5 or 10 and take periodic
>> snapshots of the data, or run in dual-DC mode with replication.
>>        - If you want 100% assurance that your data is available and you
>> are happy with unpredictable performance, then use EBS.
>>        - If you want 100% assurance that your data is available and also
>> predictable performance, then Amazon EC2 is not an optimal choice.
>>
>>        In our scenario we are doing an equal number of reads and writes,
>> and will need to guarantee about 32K ops/sec from a Riak cluster over a
>> 2-hour period with minimal risk of an outage or drop in performance, so I
>> am guessing that EC2 may not be the right choice for us. We are going to
>> look at Joyent as an alternative; that said, has anyone else used other
>> solutions, e.g. Rackspace Cloud?
>>
>> Dave
>>
>>
>> On 1 Apr 2011, at 14:21, Mathias Meyer wrote:
>>
>> > Hi David,
>> >
>> > Alexander already gave you a good rundown on EC2 and Riak, but let me
>> > add some of my own experiences running databases on EC2 in general.
>> >
>> > The short answer is that Riak is certainly used successfully in
>> > production on EC2, so nothing should hold you back from testing a setup
>> > there. But there are a number of things you should keep in mind.
>> >
>> > First, it's probably a good idea to avoid using ephemeral storage as
>> > persistent storage. Even though it rarely happens, instances can crash on
>> > EC2 for all kinds of reasons, mostly hardware failure of the underlying
>> > host.
>> >
>> > Cluster compute instances offer especially high CPU power, but what you
>> > really want is fast and reliable storage I/O, persisted for eternity if
>> > need be. CC instances are certainly a lot better than any other instance
>> > type in terms of general I/O (see [2] for a comparison), but fall prey to
>> > similar limitations in terms of network storage I/O as other instance
>> > types; see below.
>> >
>> > The RAID 0'd ephemeral storage on the cluster compute instances may
>> > sound good in theory in terms of performance, but in practice it takes
>> > away data durability in case of a single disk failure: one disk fails,
>> > and the data on that node is gone. Depending on what kinds of seeks
>> > you're doing, an EBS setup may even turn out to be faster. See [6] and
>> > [4] for initial and extended measurements, and [7] for another
>> > comparison. But the cluster compute instances' ephemeral storage can
>> > certainly achieve a good amount of throughput; see [5] for some pretty
>> > graphs comparing RAID and non-RAID setups.
>> >
>> > As Alexander pointed out, multiple instance failures can make this
>> > scenario a real killer, though you end up with the same risks as running
>> > on raw iron servers. Neither ephemeral storage nor EBS makes the problem
>> > of proper backups disappear. You could, for example, run off ephemeral
>> > storage, relying on both Riak's replication and a good backup, e.g. to an
>> > EBS volume or to S3.
>> >
>> > EBS on the other hand is prone to a large variance in network latency,
>> > making performance at any point unpredictable and unreliable. Every
>> > measurement you take is likely to be different an hour later. This may sound
>> > extreme, but it turns out to be a very big issue for databases where there's
>> > lots of disk I/O involved to read and write data, as is the case with Riak's
>> > Bitcask storage.
>> >
>> > You can increase the performance and reliability of EBS by using a RAID
>> > of volumes. Preferably go for RAID 5 or RAID 10 to add redundancy. There
>> > are mixed opinions on whether that's really necessary on EBS, with Amazon
>> > keeping the data redundant on their end as well, but in general it's a
>> > good tradeoff between increased performance through striping and
>> > increased redundancy through mirroring. [1] has a good summary of when
>> > it's better to choose RAID 5 vs. RAID 10.
>> >
>> > RAID 0 will obviously bring the best performance, and it's certainly a
>> > valid setup. We've been running RAID 0 setups with 4 volumes and got
>> > great improvements over a single volume. You're also likely to achieve
>> > more throughput on bigger instances with a setup like this. The caveat,
>> > once again, is that one corrupted volume is enough to make a RAID 0
>> > setup unusable.
>> >
>> > Another crazy thought is to set up RAID striping across a bunch of
>> > ephemeral drives and EBS volumes, maximizing throughput on both local and
>> > network storage. But know what you're getting yourself into with this
>> > kind of setup: when your write load is a lot heavier than the available
>> > network bandwidth can handle, your network volumes will never be able to
>> > catch up with the local storage.
>> >
>> > All that said, EBS I/O sure is reasonably fast, but it depends on your
>> > particular use case and performance requirements. It's also worth noting
>> > that the I/O capabilities of EBS increase with the instance size. The bigger
>> > your instance, the more throughput you'll achieve (see [3]). Bigger
>> > instances tend to have better network throughput in general, with cluster
>> > compute instances obviously having some of the highest bandwidth available.
>> >
>> > All this turns out to be much less of a problem when data can easily be
>> > held in memory, e.g. with Innostore, where reads and writes go through
>> > cache buffers first and InnoDB takes care of flushing to disk.
>> >
>> > Personally, I don't think you're overcomplicating things with regard to
>> > multiple availability zones; it's a good idea when the highest possible
>> > availability is your goal, since it's usually just a single availability
>> > zone that's affected by increased latency or network timeouts. But as
>> > Alexander said, you should think about cross-datacenter replication in
>> > that scenario, as availability zones are data centers in different
>> > physical locations. Usually they're not that far apart, but far enough
>> > to increase latency considerably. As always, it depends on your
>> > particular use case.
>> >
>> > Now, after all this realtalk, here's the kicker: Riak's way of
>> > replicating data can make both scenarios work. When it's ensured that
>> > your data is replicated on more than one node, you could use ephemeral
>> > storage and still be somewhat safe, because the data will reside on
>> > multiple nodes. The same is true for EBS volumes, as potential variances
>> > in I/O, or even minutes of total unavailability (as seen in the recent
>> > Reddit outage), can be recovered from a lot more easily thanks to
>> > handoff and read repair. You can increase the number of replicas (n_val)
>> > to increase your tolerance of instance failures; just make sure that
>> > n_val is less than the number of nodes in your cluster.
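>> > As an illustration of tuning n_val, here's a hedged sketch against
>> > Riak's HTTP interface; the bucket name "sessions" is hypothetical, and a
>> > local node on the default HTTP port 8098 is assumed:

```shell
# Raise n_val to 4 on a hypothetical bucket called "sessions".
# Assumes a local Riak node on the default HTTP port 8098. Ideally set
# n_val before loading data; changing it on a bucket that already holds
# data is generally discouraged.
curl -X PUT http://127.0.0.1:8098/riak/sessions \
     -H "Content-Type: application/json" \
     -d '{"props":{"n_val":4}}'

# Read the bucket properties back to confirm the change.
curl http://127.0.0.1:8098/riak/sessions
```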
>> >
>> > Don't get me wrong, I love EC2 and EBS; being able to spin up servers at
>> > any time and to attach more storage to a running instance is extremely
>> > powerful, when you can handle the downsides. But if very low latency is
>> > what you're looking for, raw iron with lots of memory and SSDs thrown in
>> > as storage is hard to beat.
>> >
>> > When in doubt, start with a RAID 0 setup on EBS with 4 volumes, and
>> > compare it with RAID 5 in terms of performance. They're known to give
>> > good enough performance in a lot of cases. If you decide to go with a
>> > RAID, be sure to add LVM on top for simpler snapshotting; getting
>> > consistent snapshots of a bunch of striped volumes using just EBS
>> > snapshots is painful, if not impossible.
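>> > As a hedged sketch of that RAID 0 + LVM suggestion (device names,
>> > volume count, sizes, and the mount point are all assumptions; check
>> > what your instance actually exposes before running anything like this
>> > as root):

```shell
# Stripe 4 attached EBS volumes into a RAID 0 array.
# /dev/sdf..sdi are assumptions; some instances expose /dev/xvdf etc.
mdadm --create /dev/md0 --level=0 --raid-devices=4 \
      /dev/sdf /dev/sdg /dev/sdh /dev/sdi

# Layer LVM on top so snapshots of the whole stripe are possible.
pvcreate /dev/md0
vgcreate riak_vg /dev/md0
lvcreate -l 80%VG -n riak_data riak_vg   # leave headroom for snapshots

mkfs -t ext3 /dev/riak_vg/riak_data
mount /dev/riak_vg/riak_data /var/lib/riak

# Later: a point-in-time snapshot of the striped set in one step.
lvcreate -s -L 20G -n riak_snap /dev/riak_vg/riak_data
```

>> > The LVM layer is what makes the snapshot atomic across all four
>> > volumes; per-volume EBS snapshots taken separately would not be
>> > mutually consistent.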
>> >
>> > Let us know if you have more questions; there are lots of details
>> > involved once you go under the hood, but this should cover the most
>> > important bases.
>> >
>> > Mathias Meyer
>> > Developer Advocate, Basho Technologies
>> >
>> > [1]
>> > http://en.wikipedia.org/wiki/RAID#RAID_10_versus_RAID_5_in_Relational_Databases
>> > [2]
>> > http://blog.cloudharmony.com/2010/09/benchmarking-of-ec2s-new-cluster.html
>> > [3]
>> > http://blog.cloudharmony.com/2010/06/disk-io-benchmarking-in-cloud.html
>> > [4]
>> > http://blog.bioteam.net/2010/07/boot-ephemeral-ebs-storage-performance-on-amazon-cc1-4xlarge-instance-types/
>> > [5]
>> > http://blog.bioteam.net/2010/07/local-storage-performance-of-aws-cluster-compute-instances/
>> > [6]
>> > http://blog.bioteam.net/2010/07/preliminary-ebs-performance-tests-on-amazon-compute-cluster-cc1-4xlarge-instance-types/
>> > [7] http://victortrac.com/EC2_Ephemeral_Disks_vs_EBS_Volumes
>> >
>> > On Mittwoch, 30. März 2011 at 18:29, David Dawson wrote:
>> >> I am not sure if this has already been discussed, but I am looking at
>> >> the feasibility of running Riak on EC2, as we have a requirement to
>> >> scale up and down quite considerably on a month-by-month basis. After
>> >> some initial testing and investigation we have come to the conclusion
>> >> that there are 2 solutions, although both have their downsides in my
>> >> opinion:
>> >>
>> >> 1. Run multiple cluster compute (cc1.4xlarge) instances (23 GB RAM,
>> >> 10 Gigabit Ethernet, 2 x 845 GB disks in RAID 0).
>> >> 2. Same as above, but using EBS as the storage instead of the local
>> >> disks.
>> >>
>> >> The problems I see with solution 1 are as follows:
>> >>
>> >> - An instance failure results in complete loss of data on that machine,
>> >> as the disks are ephemeral storage (i.e. they only exist while the
>> >> machine is up).
>> >>
>> >> The problems I see with solution 2 are as follows:
>> >>
>> >> - EBS is slower than the local disks and, from what I have read, is
>> >> susceptible to latency caused by factors outside your control.
>> >> - There has been a bit of press lately about availability problems with
>> >> EBS, so we would have to use multiple availability zones, although there
>> >> are only 4 in total and it seems as though I am overcomplicating things.
>> >>
>> >> Has anyone used EC2 and Riak in production, and if so, what were your
>> >> experiences?
>> >>
>> >> Otherwise, has anyone used Rackspace or Joyent? These are alternatives,
>> >> although the Joyent solution seems very expensive. What were your
>> >> experiences with them?
>> >>
>> >> Dave
>> >> _______________________________________________
>> >> riak-users mailing list
>> >> riak-users at lists.basho.com
>> >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> >
>>
>>
>
>



