Shlomi is an engineer and a database geek. He is an active MySQL community member, authors gh-ost, orchestrator, common_schema and other open source tools, and blogs at http://openark.org. Shlomi works at GitHub on the database infrastructure team, keeping the data flowing. Previously he managed high availability of X,000 MySQL servers at Booking.com, and prior to that solved infrastructure problems at Outbrain. He is a recipient of the Oracle ACE, Oracle Technologist of the Year, and MySQL Community Member of the Year awards.
Orchestrator is a MySQL topology manager and failover solution, used in production on many large MySQL installations. It allows for detecting, querying and refactoring complex replication topologies, and provides reliable failure detection and intelligent recovery and promotion.
This tutorial walks through Orchestrator's setup, deployment and usage best practices. We will focus on major functionality points and share authoritative advice on practical production use.
Our cheat sheet covers:
- Detection: resolving, classification, pools, inspection.
- Topologies: Pseudo GTID, refactoring, querying for info.
- Failovers: configuration, promotion preferences, hooks, downtime, acknowledgements, planned failovers.
- Scripting: putting-it-all-together use case for automating failover tests.
- HA: making orchestrator highly available, including recent consensus development.
This tutorial will be hands-off, and open to discussion and examples by and for the attendees.
The database team at GitHub is tasked with keeping the data available and with maintaining its integrity. Our infrastructure automates away much of our operation, but automation requires trust, and trust is gained by testing. This session highlights three examples of infrastructure testing automation that helps us sleep better at night:
- Backups: scheduling backups; making backup data accessible to our engineers; auto-restores and backup validation; the metrics and alerts we have in place.
- Failovers: how we continuously test our failover mechanism, orchestrator; how we set up a failover scenario, what defines a successful failover, and how we automate away the cleanup; what we do in production.
- Schema migrations: how we ensure that gh-ost, our schema migration tool, which keeps rewriting our (and your!) data, does the right thing. How we test new branches in production without putting production data at risk.