Emily is a software engineer on the Online Data Stores team at Square. She spends her time writing tools to help manage the MySQL and Redis fleet.
How fast can you fail over your databases? Do you trust the process? Do you trust it enough to let [almost] anyone do it, at any time? We do!
At Square, we manage thousands of MySQL and Redis database clusters. We recently rewrote all of the automation that fails over our MySQL databases, making it even faster and more reliable. The time from a user requesting the action to database writes going to the new target is now generally under 2 seconds, with no real downtime or risk. The rewrite went so well for MySQL that we decided to further abstract the process and apply the exact same set of tools to our Redis fleet.
This talk describes the prerequisites, process, tooling, and lessons learned in safely cutting over database traffic and abstracting the process to apply to both MySQL and Redis.
At Square, we operate thousands of database instances to power a financial network, from payments to payroll. In a word: money. "Mission-critical" isn't critical enough. Come learn how we operate MySQL and Redis with billions of dollars at stake. We'll look at everything: configuration, management, monitoring, tooling, security, high availability, replication, and more.
Join us for the MySQL Community Awards presented by Emily Slocombe!