The database team at GitHub is tasked with keeping the data available and with maintaining its integrity. Our infrastructure automates away much of our operation, but automation requires trust, and trust is gained by testing. This session highlights three examples of infrastructure testing automation that help us sleep better at night:
- Backups: scheduling backups; making backup data accessible to our engineers; auto-restores and backup validation. What metrics and alerts we have in place.
- Failovers: how we continuously test our failover mechanism, orchestrator. How we set up a failover scenario, what defines a successful failover, how we automate away the cleanup. What we do in production.
- Schema migrations: how we ensure that gh-ost, our schema migration tool, which keeps rewriting our (and your!) data, does the right thing. How we test new branches in production without putting production data at risk.
Shlomi is an engineer and a database geek. He is an active MySQL community member, authors gh-ost, orchestrator, common_schema and other open source tools, and blogs at http://openark.org. Shlomi works at GitHub on the database infrastructure team keeping the data flowing. Previously he managed high availability of X,000 of MySQL servers at Booking.com, and prior to that solved infrastructure problems at Outbrain. He is a recipient of the Oracle ACE, Oracle Technologist of the Year, and MySQL Community Member of the Year awards.
Tom has been working with MySQL since 2003. He started working with MySQL as a PHP developer. He briefly moved over to systems administration, where he was responsible for Apache and MySQL servers. His desire to learn more about databases moved him into a role as a DBA, and he has happily filled that role at several companies. He is currently working at GitHub helping automate and expand their existing architecture. He has previously worked for Box, Twitter, and Booking.com.