The database team at GitHub is tasked with keeping the data available and with maintaining its integrity. Our infrastructure automates away much of our operations, but automation requires trust, and trust is gained by testing. This session highlights three examples of infrastructure testing automation that help us sleep better at night:
- Backups: scheduling backups; making backup data accessible to our engineers; auto-restores and backup validation. What metrics and alerts we have in place.
- Failovers: how we continuously test our failover mechanism, orchestrator. How we set up a failover scenario, what defines a successful failover, how we automate away the cleanup. What we do in production.
- Schema migrations: how we ensure that gh-ost, our schema migration tool, which keeps rewriting our (and your!) data, does the right thing. How we test new branches in production without putting production data at risk.
Jonah is the Engineering Manager of the Database Infrastructure team at GitHub. His previous job was as a Senior DBA at Twitter and he had humble beginnings working as a remote DBA for a variety of customers at Blue Gecko. He enjoys looking at graphs and writing scripts to do his job for him.
Tom has been working with MySQL since 2003, when he started using it as a PHP developer. He briefly moved over to systems administration, where he was responsible for Apache and MySQL servers. His desire to learn more about databases moved him into a role as a DBA, and he has happily filled that role at several companies. He is currently working at GitHub, helping automate and expand its existing architecture. He has previously worked for Box, Twitter, and Booking.com.