September 2, 2014

A conversation with 5 Facebook MySQL gurus

A conversation with 6 Facebook MySQL gurusFacebook, the undisputed king of online social networks, has 1.23 billion monthly active users collectively contributing to an ocean of data-intensive tasks – making the company one of the world’s top MySQL users.

A small army of Facebook MySQL experts will be converging on Santa Clara, Calif. next week where several of them are leading sessions at the Percona Live MySQL Conference and Expo. I had the chance to chat virtually with four of them about their sessions: Steaphan Greene, Evan Elias, Shlomo Priymak and Yoshinori MatsunobuMark Callaghan, who spoke at Percona Live last year, also joined our conversation which included a discussion of Facebook’s use of MySQL and other open source technologies.


Tom: What’s Facebook’s view of Open Source?

Steaphan: Facebook was built on open source software, and we still invest heavily in open source today. We understand the power these communities have to drive innovation – they allow us to focus on new challenges, as opposed to reinventing the wheel over and over again. And contributing as much as possible back to open projects is in everyone’s best interest.


Tom: Why MySQL? Wouldn’t NoSQL databases, for example, be better suited for the massive workloads seen at Facebook?

Mark: MySQL is great for many of our important workloads. We make it even better with our expertise in MySQL operations and engineering, and by working with the community and learning from their experience.

Yoshinori: I have not been able to find a transactional NoSQL database better than InnoDB. And it’s easy to understand how MySQL Replication works, which makes much easier to fix problems in production.


 Tom: How does Facebook make MySQL scale?

Steaphan: Sharding, automation, monitoring, and heavy investment in operations and performance engineering.


 Tom: What other things help Facebook run smoothly?

Steaphan: Our completely open culture, and the freedom all engineers here have to try any idea they have.


Tom: What is the top scaling challenge(s) Facebook faces in 2014 – and beyond?

Mark: Our biggest challenge is to make things better (performance, efficiency, availability) in the future at the rate we made things better in the past.

Yoshinori: Availability has improved a lot so far for us. Come to my session at Percona Live to hear about that. For me personally, efficiency is the biggest challenge for 2014 and 2015. This includes reducing space and optimizing for newer-generation hardware.


Tom: Facebook deployed MySQL 5.6 last year – including on critical environments – long before many other large organizations. What prompted such a move so soon? And where there any major concerns?

Steaphan: The same thing that prompts most efforts on the Facebook infra team: We will consider any technology that will help us improve performance, efficiency, or reliability, and we’re willing to accept the risk that sometimes comes along with adopting things like 5.6 very early on. But that’s only half the story here. The other half is that Facebook encourages its engineers to go after big bets like these — in this case it was just one engineer who made this happen. And we had the MySQL engineering talent we needed to work with the Oracle team to get 5.6 ready for production at our scale.

Yoshinori: At Facebook, we have three MySQL teams — Operations, Performance and Engineering. Facebook is one of the very few MySQL users that has internal MySQL developers. We all worked hard to adapt 5.6 to our scale and ensure that it would be production-ready. We found some issues after production deployment, but in many cases we could fix the problem and deployed new MySQL binary within one or two days. When deploying in production, we expected that we encountered MySQL 5.6 specific issues, which was typical when releasing new software. We were just confident that we could fix issues immediately.

Our 5.6 deployment step was not all at once. At first rollout, we disabled most major 5.6 features, such as GTID and binlog checksum. We gradually enabled such features in production.


Tom: Where there any significant issues in that move to MySQL 5.6? Any lessons learned you like to share – along with best practices you’d like to share?

Yoshinori: Performance regression of the CPU intensive replication was a main blocker for some of our applications. I wrote a blog post about this last year. We have several design plans to fix the problem on MySQL side. One of the most effective plans is grouping multiple transactions into one, since the most expensive part is writing to InnoDB system table at transaction commit. This optimization would be done when writing binary log, or by SQL thread. It may take longer time to test and deploy in production. For existing applications, we optimized to group multiple transactions from application side to mitigate the problem.


Tom: Performance monitoring is usually challenging at any organization. How do you do that at Facebook, which has tens of thousands of MySQL instances?

Yoshinori: Top-N monitoring is very important for managing a huge number of instances. Average statistics (for example: average innodb_rows_read across all instances) is not always useful since ~1% of problematic instances won’t noticeably change average numbers. p99 gives better indicators, but in our environment we typically have fewer than 0.1% instances causing problems, in which case p99 is not helpful either. We have several graphical and command-line tools to efficiently list up top-N bad behaving hosts. After listing up bad instances, the way to investigate root cause is pretty straightforward, like what MySQL consultants usually do. Server failure is something we expect and plan for at Facebook. For example, typical MySQL DBA at small companies may not encounter master instance failure during employment, because recent mysqld and H/W are stable enough. At Facebook, master failure is a norm and something the system can accommodate.


Tom: Evan, you and Yoshinori will present on Global Transaction ID (GTID) at Facebook. GTID is very tricky to deploy to an existing large-scale environment – how, and why, did you decide on adopting it?

Evan: Our primary motivations for adopting GTID all relate to either failover or binlog backups. When a master fails, getting replicas in sync with GTID is substantially simpler, faster, and less error-prone than previous methods of diffing binlogs. For backups, GTID is a cornerstone in building cross-datacenter point-in-time recovery, without needing redundant binlog streams from every region.

The “how” question is a bit more involved, and we’ll be covering this in detail during the session. The GTID project was a joint effort between three of Facebook’s MySQL teams. Santosh added new functionality to the MySQL server to make online rollout possible, and Yoshinori improved MHA to seamlessly support GTID-based failover. I added GTID support to all of our other in-house automation, and also scripted the rollout procedure across our many thousands of replica sets. A lot of validation logic and monitoring functionality was involved to ensure the safety of the rollout.


Tom: Shlomo, your session is titled “Under the Hood – MySQL Pool Scanner (MPS).” As you point out in your talk, Facebook has one of the largest MySQL database clusters in the world, comprising thousands of servers across multiple data centers. You must have an army of DBAs – or is there some secret you’d like to share? 

Shlomo: We do have an army, yes — it’s an “army of one.” We have one person on call on the MySQL Operations team at a given time, and they don’t even need to do all that much most days. We built “robots” to do our day to day jobs. The largest and most complex robot we have is MPS, an automated system to do most of the work a DBA might otherwise spend time on, such as replacement of broken or overflowing servers. Among other things, MPS also allows a human to initiate complex bulk operations with a few keystrokes, and it will follow up and complete the operations over the course of days or weeks.

I’ll be describing some of the complex MySQL automation systems we have at Facebook, and how they fit together during my talk.


Tom: Shlomo, what does a typical day look like for you there at Facebook?

Shlomo: The team’s work mostly focuses on maintaining those robots I’ve mentioned, as well as developing new ways to improve the reliability of our databases for Facebook’s users. This year the team also spent a lot of time making sure the new MySQL features such as GTIDs and semi-sync are deeply integrated in our automation. Every day, we work hard to to make ourselves obsolete, but we haven’t gotten there just yet!

On a typical day, I probably spend much of the time coding, mainly in Python. I also spend a significant amount of time working on capacity-related projects, such as thinking of ways to optimize the way we distribute the data across our fleet of servers.
Even after 2.5 years at Facebook, I am still in awe of the number of servers we manage. The typical small-scale maintenance operation at Facebook probably involves more servers than all the companies I’ve previously worked for had, combined. It really is pretty amazing!


Tom:  What are you looking forward to the most at this year’s conference?

Evan: There are plenty of fascinating sessions this year. Just to mention a handful: Jeremy Cole and Davi Arnaut’s session on innodb_ruby, since it’s a very unique way to interactively learn about InnoDB’s internals. Baron Schwartz’s session on using Go with MySQL, as VividCortex is blazing the trail here. Peter Boros and Kenny Gryp’s talk on scalability and benchmarking, which I’m hoping will include recent developments of Percona Playback. Tom Christ’s session on my former project Jetpants, to see how it has evolved over the past year at Tumblr. And several talks by Oracle engineers about upcoming functionality in MySQL 5.7.

Steaphan: In addition to the conference sessions, I look forward to the birds of a feather session with the MySQL team.  Last year, it proved to be a valuable opportunity to engage with those upstream developers who make the changes we care about, and I expect the same this year.


Tom: If you could talk to a DBA or developer on the fence about attending this year’s conference, what would be your top 3-5 reasons for making it over to Santa Clara for this event?

Evan: I’m based in NYC, so I’m traveling a bit further than many of my colleagues, but I can still confidently say that Percona Live is well worth the trip. The MySQL ecosystem is very healthy and constantly evolving, and the conference is the best place to learn about ongoing developments across a wide spectrum of companies and contributors. It’s also a perfect opportunity to personally connect with all of the amazing engineers, DBAs, users, and vendors that make MySQL so unique and compelling.


 The Percona Live MySQL Conference and Expo 2014 runs April 1-4 in Santa Clara, Calif. Use the code “SeeMeSpeak” when registering and save 10 percent. The inaugural Open Source Appreciation Day is on March 31 – this full-day event is free but because space is limited I suggest registering now to reserve your spot.

About Tom Diederich

Tom manages community/social media at Percona, the largest independent MySQL services company. He's been building and managing online communities since 1999 for Silicon Valley companies that include TurboLinux, Palm, Intuit, Symantec & Cadence Design Systems. Tom has a B.A. in Journalism from The Ohio State University.

Speak Your Mind

*