
A first look at RDS Aurora

November 2, 2015 | Posted In: InnoDB, MySQL, OpenStack, Percona XtraDB Cluster


Recently, I had an onsite engagement whose goal was to move a database service to RDS Aurora. Like most of you, probably, I knew the service by name but couldn’t say much about it, so I Googled, listened to talks, and read about it. Now that the engagement is over, here’s my first impression of Aurora.

First, let’s describe the service itself. It is part of RDS and, at first glance, very similar to a regular RDS instance. To set up an Aurora instance, you go to the RDS console and either launch a new instance with Aurora as the type, or create a snapshot of an RDS 5.6 instance and migrate it to Aurora. While with a regular MySQL RDS instance you can create slaves, with Aurora you add reader nodes to an existing cluster. An Aurora cluster consists of at least a writer node, and you can add up to 15 reader nodes (only one writer, though). It is at the storage level that things become interesting. Aurora doesn’t rely on filesystem-type storage, at least not from a database standpoint. It has its own special storage service, which is replicated locally and to two other AZs automatically, for a total of 6 copies. Furthermore, you pay only for what you use, and the storage grows and shrinks automatically in increments of 10 GB, which is pretty cool. You can have up to 64 TB in an Aurora cluster.
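To make the storage numbers concrete, here is a minimal arithmetic sketch in Python. The 10 GB step and the 6 copies come from the description above; the function names, the rounding rule, and the one-increment minimum are my own assumptions for illustration, not Amazon’s documented allocation or billing logic.

```python
import math

INCREMENT_GB = 10  # growth step quoted above
COPIES = 6         # total copies across the three AZs, as quoted above


def allocated_gb(used_gb: float) -> int:
    """Round used space up to the next 10 GB increment.

    Assumption: at least one increment is always allocated.
    """
    return max(1, math.ceil(used_gb / INCREMENT_GB)) * INCREMENT_GB


def replicated_gb(used_gb: float) -> int:
    """Raw space consumed once the allocation is stored COPIES times."""
    return allocated_gb(used_gb) * COPIES
```

Under these assumptions, a 123 GB dataset would sit in a 130 GB allocation (`allocated_gb(123)`), which corresponds to 780 GB of raw storage across the six copies (`replicated_gb(123)`).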

Now, all that is fine, but what are the benefits of using Aurora? I must say I have barely used Aurora; one week is not field-proven experience. The benefits below are claims by Amazon but, as we will discuss, there are some good arguments in favor of these claims.

The first claim is that the write capacity is increased by up to 4x. So, even though only a single instance acts as the writer in Aurora, you get up to 400% of the write capacity of a normal MySQL instance. That’s huge, but it basically means replication is asynchronous at the storage level, at least for the multi-AZ part, since the latency would otherwise be a performance killer. Locally, Aurora uses a quorum-based approach with the storage nodes. Given that the object store is a separate service with its own high-availability configuration, that is a reasonable trade-off. By contrast, Galera-based clustering solutions like Percona XtraDB Cluster typically lower the write capacity, since all nodes must synchronize on commit. Other claims are that reader performance is unaffected by the clustering and that the readers have almost no lag behind the writer. Furthermore, as if that were not enough, readers can’t diverge from the master. Finally, since there’s no lag, any reader can replace the writer very quickly, so failover is well covered.
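The quorum idea mentioned above can be sketched in a few lines of Python. This is only an illustration of quorum acknowledgement in general: `StorageNode`, `quorum_write`, and the quorum size are hypothetical names and values of mine, not Aurora’s actual protocol or API.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Hypothetical storage node that may be up or down."""
    name: str
    up: bool = True
    log: list = field(default_factory=list)

    def append(self, record) -> bool:
        """Store the record and acknowledge it, unless the node is down."""
        if not self.up:
            return False
        self.log.append(record)
        return True


def quorum_write(nodes, record, write_quorum) -> bool:
    """Send the record to every node and report success if at least
    write_quorum nodes acknowledged it.

    This toy version contacts the nodes one by one; a real writer would
    send in parallel and return as soon as the quorum is reached.
    """
    acks = sum(1 for node in nodes if node.append(record))
    return acks >= write_quorum
```

With six nodes and an assumed write quorum of four, a write still succeeds when two nodes are down and fails when three are. The writer never has to wait for the slowest copies, which is the point of the trade-off discussed above.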

That seems almost too good to be true; how can it be possible? I happen to be interested in object stores, Ceph especially, and I was toying with the idea of using Ceph to store InnoDB pages. It appears that the Amazon team did a great job of putting an object store under InnoDB, and they went much further than what I had in mind. Here I may be speculating a bit, and I would be happy to be proven wrong. The writer never writes dirty pages back to the store. It only writes fragments of the InnoDB log to the object store as objects, one per transaction, and notifies the readers of the set of pages updated by each log fragment object. Just have a look at the output of SHOW GLOBAL STATUS on an Aurora instance and you’ll see what I mean. Put another way, it is like having an infinitely large set of InnoDB log files; you can’t reach the max checkpoint age. Also, if the object store supports atomic operations, there’s no need for the doublewrite buffer, a significant source of contention in MySQL. Those two aspects alone are enough, in my opinion, to explain the up-to-4x claim for write capacity. On top of that, the log fragments are a kind of binary diff, which is usually much less to write than whole pages.

Something is needed to remove the log fragment objects, since over time their accumulation, and the need to apply them, would hurt performance, a phenomenon called log amplification. With Aurora, that seems to be handled at the storage level: the storage system is smart enough to know that a requested page is dirty and to apply the log fragments before sending it back to the reader. The shared object store also explains why the readers have almost no lag and why they can’t diverge. The only lag the readers can have is the notification time, which has to be short within the same AZ.
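The mechanism speculated about above (the writer ships only log fragments, the store applies them when a page is requested, and readers are told which cached pages to drop) can be modeled with a toy Python sketch. Every class and method name here is hypothetical, pages are reduced to plain dictionaries, and nothing in it reflects Aurora’s real internals.

```python
from collections import defaultdict


class LogStructuredStore:
    """Toy shared store: page images plus pending log fragments per page."""

    def __init__(self):
        self.pages = {}                   # page_id -> materialized page (dict)
        self.pending = defaultdict(list)  # page_id -> [(lsn, changes), ...]
        self.next_lsn = 1

    def append_fragment(self, changes_by_page):
        """Store one transaction's log fragment.

        Returns the fragment's LSN and the set of page ids it touches,
        which is what the writer would send to the readers.
        """
        lsn = self.next_lsn
        self.next_lsn += 1
        for page_id, changes in changes_by_page.items():
            self.pending[page_id].append((lsn, changes))
        return lsn, set(changes_by_page)

    def read_page(self, page_id):
        """Apply pending fragments (already in LSN order), then return a copy."""
        page = self.pages.setdefault(page_id, {})
        for _lsn, changes in self.pending.pop(page_id, []):
            page.update(changes)
        return dict(page)


class Reader:
    """Toy reader node with a page cache invalidated by notifications."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def invalidate(self, page_ids):
        """Drop cached copies of the pages named in a notification."""
        for page_id in page_ids:
            self.cache.pop(page_id, None)

    def get(self, page_id):
        """Serve from cache, fetching from the shared store on a miss."""
        if page_id not in self.cache:
            self.cache[page_id] = self.store.read_page(page_id)
        return self.cache[page_id]
```

In this model a reader that has cached page 7 keeps serving the old image until `invalidate({7})` arrives, and only then fetches the page with the new fragment applied. That window is the notification lag described above. Since every reader goes back to the same store, there is no second copy of the data that could drift away from the writer.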

So, how does Aurora compare to a technology like Galera?

Pros:

  • Higher write capacity, writer is unaffected by the other nodes
  • Simpler logic, no need for certification
  • No need for an SST to provision a new node
  • Can’t diverge
  • Scales IOPS tremendously
  • Fast failover
  • No need for quorum (handled by the object store)
  • Simple to deploy

Cons:

  • Likely asynchronous at the storage level
  • Only one node is writable
  • Not open source

Aurora is a mind shift in terms of databases and a jewel in the hands of Amazon. OpenStack currently has no database service that offers similar features. I wonder how hard it would be to produce an equivalent solution using well-known open source components, such as Ceph for the object store and Corosync, ZooKeeper, ZeroMQ, or something else for the communication layer. Also, would there be a use case?

Yves Trudeau

Yves is a Principal Consultant at Percona, specializing in distributed technologies such as MySQL Cluster, Pacemaker and XtraDB cluster. He was previously a senior consultant for MySQL and Sun Microsystems. He holds a Ph.D. in Experimental Physics.

9 Comments

  • Hi Yves, I have done extensive tests on Aurora and EC2/Galera and, setting aside any discussion of how it works, my results diverged quite a bit from what Amazon claims.

    My results are available at: http://www.tusacentral.net/joomla/index.php/mysql-blogs/175-aws-aurora-benchmarking-blast-or-splash.html
    It would be great if you could run some other tests and load the platform, to see whether you get divergent results.
    If so, we can discuss why.

  • That’s very interesting. There was a deep dive into Aurora during re:Invent 2015, now available on YouTube: https://youtu.be/CwWFrZGMDds?list=PLK1o1yHy1fLgZkC4Np9AHsW5oxFi-p9OB
    I’ll look forward to more details on Aurora.

  • I think the title should have been “first look at AWS Aurora marketing” as this is just a summary of it. I look forward to a description of your experience with more details, like those provided by Marco.

  • We could not get Aurora to sustain a write throughput of more than about 4000 rows / s for more than a couple of hours. Had to abandon it… It peaked at 50k/s I think, but always fell back to about 4k/s.
    If you’re considering Aurora, I recommend you run your benchmarks for a while before making a decision.

  • Salut Yves, thanks for sharing your analysis. What I understand is that you present Aurora as a single-writer/multiple-reader system based on shared storage, with consistency based on cache invalidation. There is one con of that type of system that you do not list: single storage means that data corruption at the storage layer kills the whole system. I am sure there are ways to lower the risk of such corruption, but still, shared storage is a big risk IMHO. Logical backups are also probably very hard to do, even if I have a hint of an idea of how to do them with storage snapshots.

  • @Marco, I didn’t benchmark and, in fact, I would be surprised if such a young product outperformed a well-established one. My point is that the architecture is much cleaner than Galera’s and has a lot of potential. The product is unfortunately closed source, so it will not be possible to improve Aurora the way other open source projects are improved. That’s why I wonder if an open source equivalent should be developed.

  • @Jean-François: Indeed, you are absolutely right, but the same issue affects Galera to some extent, since corrupted pages could be spread by SST. The shared storage is, of course, replicated and can do checksums and healing automatically, which lowers the risk a lot. Replication is also not 100% safe; the existence of tools like pt-table-checksum proves that point.

  • @Marco, @Assaf: we’ve been seeing similar results in our tests as well. The latency figures climb as concurrency goes up, and keep climbing over time.

    Also, it seems the benchmarks AWS used (sysbench 100% writes and sysbench 100% reads) don’t hit the contention/latency issue of shared storage, especially with a very small workload (250 tables of 25,000 rows, i.e. 6.25M rows).
