A while ago I started a series of posts showing benchmark results on Amazon EC2 servers with RAID’ed EBS volumes and MySQL, versus RDS machines. For reasons that won’t add anything to this discussion, I got sidetracked, and then time passed, and I no longer think it’s a good idea to publish those blog posts in the format I was planning. Instead, I want to write an overview of these two approaches to hosting MySQL in the Amazon cloud.
In general, MySQL performance on EC2 and EBS isn't always great compared to what you can get on physical hardware, even on low-to-medium sized servers. It's not terrible (in most cases), but it's not always great either. There are specific use cases in which it's perfectly acceptable and even good, but that range of cases is narrower than what you can push your own servers to deliver.
Here's why: you're limited in the number and speed of CPU cores you can get, and I/O performance can be highly variable. On some workloads you can mask I/O problems by making I/O not matter (for example, by keeping the working set in memory), but that doesn't always work, and it relies on memory, which only goes so big in the Amazon cloud. And you're on a platform where some resources are shared, so the performance of those resources tends to vary a lot.
None of these characteristics is a bad thing in itself. It's just that MySQL can't tolerate these weaknesses very well in some cases. I'm not trying to say you shouldn't use the Amazon cloud to host your databases. You just need to know how the circumstances differ from physical hardware, and whether those differences matter for your workload.
Here are some observations to keep in mind. They don't cover every case, but they are worth weighing.
These observations lead to the following guidelines.
As a result, if you're going to build a big database in the Amazon cloud, one that is bigger than a single instance can hold or has more write activity than a single instance can handle, you need to plan to shard from the beginning. That's just what you have to do, at least at this point in time. (There's nothing remarkable about this; the same is true for databases that are bigger than a single physical server.) Tangent: at this point I expect a certain cloud database provider beginning with an X to insert a plug into this post's comments. I haven't evaluated their technology, so I can't comment on it. I'm sure their funders would like us to evaluate them on a paid basis and report the results to our readers. We do that for many companies. OK, end of tangent.
However, if you are careful with your physical and logical design, you can make certain workloads, such as insert-heavy ones, work better in the Amazon cloud. But this is a delicate balance. It requires tricks, such as partitioning the table so that all the inserts go into one partition whose indexes fit in memory. The more elaborate you get with this (putting your transaction log files onto the local instance-store disks, for example), the more operational complexity and cost you take on, so that's something to think about.
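To illustrate the partitioning trick, here is a minimal Python sketch that generates MySQL DDL to range-partition a table by day. The table name `events`, the column `created_at`, and the helper `daily_range_partitions` are hypothetical, not taken from any particular schema. Rows inserted with the current date all land in the newest partition, so only that partition's indexes need to stay hot in the buffer pool.

```python
from datetime import date, timedelta


def daily_range_partitions(table: str, column: str, start: date, days: int) -> str:
    """Build an ALTER TABLE statement that range-partitions `table` by day.

    Hypothetical helper: each partition holds rows whose `column` value is
    earlier than the following day. Rows inserted "today" therefore all go
    into the newest partition.
    """
    parts = []
    for i in range(days):
        day = start + timedelta(days=i)
        upper = day + timedelta(days=1)  # exclusive upper bound for this partition
        parts.append(
            f"  PARTITION p{day:%Y%m%d} VALUES LESS THAN (TO_DAYS('{upper.isoformat()}'))"
        )
    return (
        f"ALTER TABLE {table}\n"
        f"PARTITION BY RANGE (TO_DAYS({column})) (\n"
        + ",\n".join(parts)
        + "\n);"
    )


if __name__ == "__main__":
    # Example: three daily partitions for a hypothetical `events` table.
    print(daily_range_partitions("events", "created_at", date(2024, 1, 1), 3))
```

In practice, a scheduled job would have to add the next day's partition ahead of time, for example with `ALTER TABLE events ADD PARTITION (...)`. Without a `MAXVALUE` partition, MySQL rejects rows that fall past the last range. Keep in mind too that MySQL requires every unique key on a partitioned table, including the primary key, to include the partitioning column.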
The physical and logical database design greatly influences how much memory and disk resources are required. The application's access patterns can be just as powerful a lever. Thus, careful design can be extremely beneficial in getting a lot more from your database server.
On a business level, consider the benefits and drawbacks of RDS versus building the equivalent system yourself. RDS is nice in that it's managed for you. You don't have to do much system administration work with RDS; you outsource that to Amazon, and you just do the database administration work. This can be a big relief, and it's not a bad value for the money compared to building servers with EC2 and EBS. However, sometimes you might want more control, such as the ability to customize your server version or to manipulate the database files directly. The cost, of course, is that the sysadmin work is now your job.
Finally, there are a number of advantages to working in the AWS cloud. In my opinion, others have covered these much more thoroughly than the drawbacks, but I should at least mention some key technical advantages, such as EBS volume snapshots. They work much better than LVM snapshots, both in their impact on the system's performance and in your ability to mount them on other machines. This is really nice for making replicas and backups. I could name a bunch of other nice properties, but I think that's not directly on-topic for this post.
The bottom line is that, in most cases, there is not a huge performance difference between EC2+EBS+MySQL and RDS. The exception is running Percona Server on EC2, and even then the difference is not orders of magnitude. So in my experience, you can decide between build-your-own and database-as-a-service based on your business needs, considering factors such as the availability of staff to manage the machines. On the technical side, don't expect either architecture to knock your socks off with its performance. But if you can fit your working set of data into the buffer pool (with careful physical, logical, and application design) and you're not so write-heavy that you're doing a lot of I/O, performance can be quite acceptable or even very good.
Do you have experience running MySQL in the Amazon cloud to share with other readers? I welcome your comments, as always.