Choosing a MySQL HA Solution – Post-Webinar Q&A

June 7, 2013

Author

Ernie Souhrada

Insight for DBAs

Insight for Developers

MySQL

Share this Post:

Thanks to everyone who was in attendance on 05 June 2013 for my “Choosing a MySQL HA Solution” webinar. If you weren’t able to make it but are interested in listening to the presentation, it’s currently up and available for viewing over at percona.com.

My apologies if we weren’t able to get to your question during the initial session, so I’ll address those lingering questions in this post, along with providing a bit more detail on some of the questions that I did cover during the session.

Q: What is the reason that I recommended DRBD be used only on physical hardware and not on virtual machines?
A: I covered this a bit during the session, but to provide a bit more commentary. There are really two main reasons that I don’t like DRBD with virtual machines. First is the disk IO performance hit that comes with DRBD. When you run a virtual machine, in most cases your virtual disk is basically nothing more than a file on the host filesystem rather than an actual physical volume. So, in effect, you’re adding performance hit to performance hit. The other reason is because virtual machines tend to handle process scheduling and timing much differently than traditional hardware, and I’ve seen situations where even under just a moderate load on the host system, DRBD can end up in a split-brain situation. Typically DRBD is run in combination with heartbeat or pacemaker or something similar, and if one of the servers in the DRBD pair is having issues receiving packets from the other side, things start to degrade.

If you’re just setting up a test environment to get familiar with the technology, certainly there’s nothing wrong with a virtual-machine approach, but for the optimal production deployment you want two identical physical boxes with at least 4 NIC ports. Why 4? Two of those NICs should be directly connected from one machine to another and configured with Linux network interface bonding in balance-rr mode; this is the only NIC bonding mode which will allow you to stripe a TCP connection over multiple ports, and with a two-NIC bonded pair you’ll get roughly 1.67x the throughput of a single port. [Don’t try more than 2; the additional work that the kernel has to do in reordering TCP packets can actually result in performance that’s worse than a single NIC.] The other two NIC ports should be bonded with active-backup or LACP (depending on your switching infrastructure). In essence, you’re looking for two of everything.

Q: Is it possible to replicate only some tables from master to slave, and if so, how?
A: Yes. There are a few of ways to do it. One way is to set up replication filters on the slave, as described in the MySQL manual. You can configure the slave to only replicate a specific database, a set of databases, or specific tables; you can also configure the slave to replicate everything EXCEPT a specified set of databases and tables. With this method, every event is still written to the binary log on the master, but the slave determines what to do with it. The other approach is to filter what gets written to the binary log on the master. This can be useful if you’re trying to limit the amount of data sent over the wire, but it comes at the cost of having binary logs which are no longer complete or useful for point-in-time recovery. Generally if you’re considering the binary log filtering approach, I think it’s safer to set up a relay master (it can be on the same machine if you use the BLACKHOLE storage engine) in between the actual master and the slaves at the bottom of the replication topology. On the top-level master, you leave all filters disabled; on the relay master you add binary log filters; this ensures that the top-level master and its binary logs are fully intact, and then the binary log filters are executed on the intermediate server, thus resulting in less binary log data being sent down to the lower-level slaves. You can also use the slave_compressed_protocol option in /etc/my.cnf if bandwidth is a concern.

If you’re thinking about employing a filtering solution of any sort and it’s something you’re not that familiar with, I’d suggest reading over the manual’s description of how replication filtering rules are processed. There are some informative flowcharts in addition to the textual explanation.

Q: What is my recommendation for MySQL 5.6? Is MHA ready for MySQL 5.6?
A: My personal recommendation for MySQL 5.6, based on what I have seen so far (admittedly, not that much – I have personally only worked with one customer that’s using 5.6), is that I question its readiness. There have been some unfortunate performance regressions compared to 5.5, such as bug #69258, and bug #69318, and I think it might take one or two more point releases before we can consider it truly production-ready. I suppose I should state an obvious disclaimer here in that the aforementioned statements reflect only my opinion and not any official Percona position, and I will look forward to being able to retract them.

MHA should work fine with 5.6 if you’re not using GTID-based replication. If you are using GTID-based replication, it looks like you’ll need to wait for the next release of MHA.

Q: How do you monitor MySQL to determine when it’s not responsive?
A: The easiest way to do this is to simply connect to it and attempt to run a simple query. I’d suggest doing this as a user that does NOT have the SUPER privilege; a user with SUPER will always be able to connect, and that may not accurately report what your application servers are seeing. But, if you can connect and run a simple query and receive a response back in a reasonable (what is reasonable is determined by your environment), it’s a safe bet that MySQL is up and running.

That said, there are plenty of other MySQL and OS-level status variables that you might want to keep an eye on to prevent a problem before it occurs. For example, if you see a large number of long-running queries in the output of SHOW PROCESSLIST that might be a sign that trouble is brewing. Or if you watch your MySQL server’s memory usage and you see that it’s starting to swap, that might indicate that something is up. FWIW, solutions like MHA and PRM simply attempt to connect and run a simple query, but it’s always possible to build something more involved if that’s what you need.

Q: What MySQL HA solution is most similar to Oracle RAC?
A: Oracle RAC is a “shared everything” system; there really isn’t anything quite like that in the MySQL world, except maybe MyISAM files on top of a clustered filesystem with an external lock manager. You might say that MySQL/NDB Cluster is also somewhat similar to Oracle RAC; all of the SQL nodes in a MySQL Cluster installation are going to be talking to the same set of data nodes on the back end, but I think that’s probably where the similarities end.

Q: What kind of replication issues can you get with storage solutions like GlusterFS?
A: I have to admit that this was probably the most interesting question asked; my initial reaction was, honestly, why would you ever want to use MySQL on top of something like that? I can’t imagine the performance being all that fantastic, and I could see network latency wreaking havoc with MySQL’s internal understanding of what’s happening on the system below. Having never experimented with GlusterFS, though, I decided to give it a shot. I set up a 3 node cluster in AWS, got the volume mounted, and went to try to install MySQL. It failed miserably on the mysql_install_db process.

Every time I tried, I got some variant of this message:
130606 2:22:41 [ERROR] /usr/libexec/mysqld: Incorrect information in file: ‘./mysql/servers.frm’
ERROR: 1033 Incorrect information in file: ‘./mysql/servers.frm’

Running mysql_install_db under strace, I saw that it was getting a lot of “bad file descriptor” errors. I was able to get the server up with –skip-grant-tables and insert a few rows into an InnoDB table, but trying to run a simple mysqlslap just hung on me, so that’s where I left it. Maybe I’ll try messing around with this again at some point in the future, but I’m not optimistic that this is a viable use case. If someone can prove me wrong, I’d be interested in knowing how you’ve set it up.