Percona Live 2018 Featured Talk – Scaling a High-Traffic Database: Moving Tables Across Clusters with Bryana Knight

February 22, 2018

Author

Dave Avery

Insight for DBAs

Insight for Developers

MySQL

Share this Post:

Welcome to the first interview blog for the upcoming Percona Live 2018. Each post in this series highlights a Percona Live 2018 featured talk that will be at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post highlights Bryana Knight, Platform Engineer at GitHub. Her talk is titled Scaling a High-Traffic Database: Moving Tables Across Clusters. Facing an immediate need to distribute load, GitHub came up with creative ways to move a significant amount of traffic off of their main MySQL cluster – with no user impact. In our conversation, we discussed how Bryana and GitHub solved some of these issues:

Percona: Who are you, and how did you get into databases? What was your path to your current responsibilities?

Bryana: I started at GitHub as a full-stack engineer working on a new business offering, and was then shortly offered the opportunity to transition to the database services team. Our priorities back then included reviewing every single database migration for GItHub.com. Having spent my whole career as a full-stack engineer, I had to level-up pretty quickly on MySQL, data modeling, data access patterns – basically everything databases. I spent the first few months learning our schema and setup through lots of reading, mentorship from other members of my team, reviewing migrations for most of our tables, and asking a million questions.

Originally, my team spent a lot of time addressing immediate performance concerns. Then we started partnering with product engineering teams to build out the backends for new features. Now we are focused on the longterm scalability and availability of our database, stemming from how we access it. I work right between our DBA’s and our product and API engineers.

Percona: Your talk is titled “Scaling a High-Traffic Database: Moving Tables Across Clusters”. What were the challenges GitHub faced that required redistributing your tables?

Bryana: This biggest part of the GitHub codebase is an 8-year-old monolith. As a company, we’ve been fortunate enough to see a huge amount of user growth since the company started. User growth means data growth. The schema and setup that worked for GitHub early on, and very much allowed GitHub to get to where it is today with tons of features and an extremely robust API, is not necessarily the right schema and setup for the size GitHub is today.

We were seeing that higher than “normal” load was starting to have a more noticeable effect. The monolith aspect of our database, organic growth, plus inefficiencies in our code base were putting a lot of pressure on the master of our primary database cluster, which held our most core tables (think users, repos, permissions). From the database perspective, this meant contention, locking, and replica lag. From the user’s perspective, this meant anything from longer page loads to delays in UI updates and notifications, to timeouts.

Percona: What were some of the other options you looked at (if any)?

Bryana: Moving tables out of our main cluster was not the only action we took to alleviate some of the pressure in our database. However, it was the highest impact change we could make in the medium-term to give us the breathing room we needed and improve performance and availability. We also prioritized efforts around moving more reads to replicas and off the master, throttling more writes where possible, index improvements and query optimizations. Moving these tables gave us the opportunity to start thinking more long-term about how we can store and access our data differently to allow us to scale horizontally while maintaining our healthy pace of feature development.

Percona: What were the issues that needed to be worked out between the different teams you mention in your description? How did they impact the project?

Bryana: Moving tables out of our main database required collaboration between multiple teams. The team I’m on, database-services, was responsible for coming up with the strategy to move tables without user impact, writing the code to handle query isolation and routing, connection switching, backgrounding writes, and so on. Our database-infrastructure team determined where the tables we were moving should go (new cluster or existing), setup the clusters, and advised us on how to safely copy the data. In some cases, we were able to use MySQL replication. When that wasn’t possible, they weighed in on other options.

We worked with production engineers to isolate data access to these tables and safely split JOINs with other tables. Everybody needed to be sure we weren’t affecting performance and user experience when doing this. We discussed with our support team the risk of what we were doing. Then we worked with them to determine if we should preemptively status yellow when there was a higher risk of user impact. During the actual cut-overs, representatives from all these groups would get on a war-room-like video call and “push the button”, and we always made sure to have a roll-out and roll-back plan.

Percona: Why should people attend your talk? What do you hope people will take away from it?

Bryana: In terms of database performance, there are a lot of little things you can do immediately to try and make improvements: things like adding indexes, tweaking queries, and denormalizing data. There are also more drastic, architectural changes you can pursue, that many companies need to do when they get to certain scale. The topic of this talk is a valid strategy that fits between these two extremes. It relieved some ongoing performance problems and availability risk, while giving us some breathing room to think long term. I think other applications and databases might be in a similar situation and this could work for them.

Percona: What are you looking forward to at Percona Live (besides your talk)?

This is actually the first time I’m attending a Percona Live conference. I’m hoping to learn from some of the talks around scaling a high traffic database and sharding. I’m also looking forward to seeing some talks from the wonderful folks on GitHub database-infrastructure team.

Want to find out more about this Percona Live 2018 featured talk, and Bryana and GitHub’s migration? Register for Percona Live 2018, and see her talk Scaling a High-Traffic Database: Moving Tables Across Clusters. Register now to get the best price!

Percona Live Open Source Database Conference 2018 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference will be April 23-25, 2018 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.