Recently, I happened to have an onsite engagement and the goal of the engagement was to move a database service to RDS Aurora. Like probably most of you, I knew the service by name but I couldn’t say much about it, so, I Googled, I listened to talks and I read about it. Now that my onsite engagement is over, here’s my first impression of Aurora.
First, let’s describe the service itself. It is part of RDS and, at first glance, very similar to a regular RDS instance. In order to setup an Aurora instance, you go to the RDS console and you either launch a new instance choosing Aurora as type or you create a snapshot of a RDS 5.6 instance and migrate it to Aurora. While with a regular MySQL RDS instance you can create slaves, with Aurora you can add reader nodes to an existing cluster. An Aurora cluster minimally consists of a writer node but you can add up to 15 reader nodes (only one writer though). It is at the storage level that things become interesting. Aurora doesn’t rely on a filesystem type storage, at least not from a database standpoint, it has its own special storage service that is replicated locally and to two other AZ automatically for a total of 6 copies. Furthermore, you pay only for what you use and the storage grows/shrinks automatically in increments of 10 GB, which is pretty cool. You can have up to 64 TB in an Aurora cluster.
Now, all that is fine, but what are the benefits of using Aurora? I must say I barely used Aurora; one week is not a field proven experience. These are claims by Amazon, but, as we will discuss, there are some good arguments in favor of these claims.
The first claim is that the write capacity is increased by up to 4x. So, even if only a single instance is used as writer in Aurora, you get up to 400% the write capacity of a normal MySQL instance. That’s quite huge and amazing, but it basically means replication is asynchronous at the storage level, at least for the multi-AZ part since the latency would be a performance killer. Locally Aurora uses a quorum-based approach with the storage nodes. Given that the object store is a separate service with its own high availability configuration, that is a reasonable trade-off. For example, the clustering solutions with Galera like Percona XtraDB Cluster typically lowers the write capacity since all nodes must synchronize on commit. Other claims are that the readers performance is unaffected by the clustering and that the readers have almost no lag with the writer. Furthermore, as if that is not enough, readers can’t diverge from the master. Finally, since there’s no lag, any readers can replace the writer very quickly, so in terms of failover, all is right.
That seems almost too good to be true; how can it be possible? I happen to be interested in object stores, Ceph especially, and I was toying with the idea of using Ceph to store InnoDB pages. It appears that the Amazon team did a super great job at putting an object store under InnoDB and they went way further than what I was thinking. Here, I may be speculating a bit and I would be happy to be found wrong. The writer never writes dirty pages back to the store… it only writes fragments of InnoDB log to the object store as objects, one per transaction, and notifies the readers of the set of pages that have been updated by this fragment log object. Just have a look at the show global status of an Aurora instance and you’ll see what I mean… Said otherwise, it is like having an infinitely large set of InnoDB log files; you can’t reach the max checkpoint age. Also, if the object store supports atomic operations, there’s no need for the double-write buffer, a high source of contention in MySQL. Just those two aspects are enough, in my opinion, to explain the up to 4x performance claim for the write capacity, but also considering the amount of writes and the log files are a kind of binary diff, that’s usually much less stuff to write than whole pages.
Something is needed to remove the fragment log objects, since over time, the accumulation of these log objects and the need to apply them would impact performance, a phenomenon called log amplification. With Aurora, that seems to be handled at the storage level and the storage system is wise enough to know that a requested page is dirty and apply the log fragments before sending it back to the reader. The shared object store can also explain why the readers have almost no lag and why they can’t diverge. The only lag the readers can have is the notification time which has to be short if within the same AZ.
So, how does Aurora compares to a technology like Galera?
- Higher write capacity, writer is unaffected by the other nodes
- Simpler logic, no need for certification
- No need for an SST to provision a new node
- Can’t diverge
- Scale iops tremendously
- Fast failover
- No need for quorum (handled by the object store)
- Simple to deploy
- Likely asynchronous at the storage level
- Only one node is writable
- Not open source
Aurora is a mind shift in term of database and a jewel in the hands of Amazon. Openstack currently has no database service that can offer similar features. I wonder how hard it would be to produce an equivalent solution using well known opensource components like Ceph for the object store and corosync or zookeeper or zeroMQ or else for the communication layer. Also, would there be a use case?