Facebook’s Simon Martin on semi-synchronous replication
By Tom Diederich
Facebook, with 1.49 billion monthly active users, is one of the world’s top MySQL users. Simon Martin, a production engineer on Facebook’s MySQL Infrastructure team, has been working with MySQL for most of his career, starting from 2 servers built out of spare parts and moving through to one of the largest deployments in the world.
Simon will be sharing some of the challenges Facebook has tackled as it rolled out semi-synchronous replication across the company’s different services at Percona Live Amsterdam on Sept. 22. His talk is aptly titled, “The highs and lows of semi-synchronous replication.” I sat down, virtually, with Simon the other day. Our conversation is below, but first, as a special reward to my readers, save €20 on your Percona Live registration by entering promo code “BlogInterview” at registration. Please feel free to share this offer! 🙂
Tom: On a scale from 1-10, how important is MySQL to Facebook? And how does Facebook use MySQL?
Simon: 10. We have a sophisticated in-memory caching layer that serves most requests, but MySQL is the persistent store for our graph. This means all your profile data, friends, likes and comments are stored permanently in MySQL, and the same goes for pages, events, places and the rest.
We rely on MySQL in this role for three key features. First, as the final store it must not lose data, and InnoDB is well proven in this space. Second, it needs to be highly available: MySQL and InnoDB are both very stable, and we also use replication to provide redundancy. Finally, even with extensive caching, it needs to be performant in both latency and throughput. MySQL is both, and we can use replication again to spread the read traffic to slaves in remote regions, which helps here too.
Tom: What are some of the advantages of using Semi-Synchronous Replication at Facebook — and what are the challenges for deployments of that size when using it?
Simon: That’s a big question. I could probably talk for 50 minutes on it! We started looking at Semi-Synchronous replication as a way to reduce downtime when a MySQL master, or the host it’s on, crashes. Historically, if you were running a replicated environment and the master crashed, you were faced with a choice. You could promote another slave right away to reduce downtime, but it was impossible to be sure that any of your slaves had received all the transactions from the master. At Facebook we cannot lose people’s data, so we always chose to recover the master and reconnect the slaves first, and only then promote a slave if that was still required. The downside is that recovering InnoDB on a busy host can be slow, and if the host has been rebooted it is even slower, giving us many minutes of downtime.
Now that we run Semi-Synchronous replication, a master will not commit a transaction until at least one slave has acknowledged receipt of the binary logs for that transaction. With this running, when a master crashes we can be sure our most up-to-date slave has all the data, so once the SQL thread has applied it we can promote safely without waiting for crash recovery.
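To make that commit rule concrete, here is a toy Python model of it. This is only a sketch of the idea Simon describes, not MySQL's implementation or Facebook's tooling. The names `Replica`, `Master` and `pick_promotion_candidate` are invented for illustration, and the model raises an error where a real master would simply wait for an acknowledgement.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    reachable: bool = True
    relay_log: list = field(default_factory=list)  # transactions received so far

    def receive(self, txn):
        """Store the transaction and return True as the acknowledgement."""
        if not self.reachable:
            return False
        self.relay_log.append(txn)
        return True


@dataclass
class Master:
    replicas: list
    committed: list = field(default_factory=list)

    def commit(self, txn):
        """Commit only after at least one replica has acknowledged receipt."""
        acks = [replica.receive(txn) for replica in self.replicas]
        if not any(acks):
            # Toy behaviour: a real master would block the commit here.
            raise RuntimeError("no replica acknowledged; commit blocked")
        self.committed.append(txn)


def pick_promotion_candidate(replicas):
    """After a master crash, pick the replica that received the most transactions."""
    return max(replicas, key=lambda replica: len(replica.relay_log))


# Hypothetical topology: one nearby replica, one unreachable replica.
near = Replica("near")
far = Replica("far", reachable=False)
master = Master([near, far])
master.commit("txn-1")
master.commit("txn-2")

best = pick_promotion_candidate([near, far])
print(best.name, best.relay_log == master.committed)
```

Because every committed transaction was acknowledged by at least one replica, the replica with the longest log in this model holds everything the master committed, which is what makes promotion safe without waiting for crash recovery.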
There are many challenges in this, though. First, there is performance. We now need a network round trip for each transaction, so the acknowledging slaves need to be very close. Slaves in a different data hall, let alone a different region, will be too slow.
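A quick back-of-the-envelope calculation shows why distance matters. The sketch below assumes a single connection that commits transactions one after another, each waiting for its acknowledgement. The function name and the round-trip figures are illustrative assumptions, not measurements from Facebook.

```python
def max_serial_commits_per_second(ack_round_trip_ms, local_commit_ms=0.0):
    """Upper bound on commits per second for ONE connection committing back to back."""
    per_commit_ms = ack_round_trip_ms + local_commit_ms
    if per_commit_ms <= 0:
        raise ValueError("commit time must be positive")
    return 1000.0 / per_commit_ms


# Hypothetical round-trip times, chosen only to show the scale of the effect.
for label, rtt_ms in [("nearby slave", 0.5), ("remote region", 100.0)]:
    print(label, max_serial_commits_per_second(rtt_ms))
```

Under these assumed numbers, the ceiling for one serial writer drops by a factor of 200 when the acknowledging slave moves from nearby to a remote region.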
We also need to pay attention to slave availability. Previously, not having a slave connected to a master for a short time was not a problem. Now it causes writes to stop and connections to pile up, so we need to be much more careful about how we manage our replication topology. A target of 99.999% uptime for a service now requires the same SLA on slaves being available and connected locally to acknowledge the commits.
On top of this, running at “webscale” adds a layer of requirements of its own. As with the rest of our environment, everything needs to be automated, because anything that requires a human is not going to scale. Our automation needs to respond to any failure and heal the system without intervention in any circumstance. An edge case that has even a tiny chance of occurring on a given day needs to be handled automatically, both to keep our SLA and to stop our engineers constantly having to fix things.
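The arithmetic behind that SLA point can be sketched in a few lines of Python. This is a simplified model that assumes the master and the acknowledging slaves fail independently, which real systems may not. The function names and the example availability figures are made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def downtime_budget_seconds(availability, period_seconds=SECONDS_PER_YEAR):
    """Allowed downtime over the period for a given availability target."""
    return period_seconds * (1.0 - availability)


def write_availability(master, ack_slaves):
    """Writes need the master AND at least one acknowledging slave to be up.

    Assumes independent failures; `ack_slaves` is a list of availabilities.
    """
    all_slaves_down = 1.0
    for availability in ack_slaves:
        all_slaves_down *= 1.0 - availability
    return master * (1.0 - all_slaves_down)


# 99.999% leaves roughly 315 seconds of downtime per year.
print(downtime_budget_seconds(0.99999))

# Hypothetical: one local acknowledging slave at 99.9% drags writes below five nines.
print(write_availability(0.99999, [0.999]))
```

In this model a five-nines master paired with a single three-nines acknowledging slave cannot deliver five-nines writes, which is why the slaves end up needing the same SLA as the service.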
Tom: What are you looking forward to the most at this year’s conference (besides your own talk)?
Simon: I always enjoy the keynotes. They don’t all seem to be announced yet, but they’re a great way to get a state-of-the-community update. I’ll certainly stop by “Binlog Servers at Booking.com.” It sounds like they might be doing the same kind of things we are for Semi-Synchronous replication, so it’ll be great to compare ideas. I’ll also be looking at the talks on MySQL 5.7 to get the scoop on what cool new stuff is coming down the pipeline!