Upgrade your database: without losing your data, your perf or your mind
Upgrading databases can be terrifying and perilous, and for good reason: you can totally screw yourself! Every workload is unique, and standardized test suites rarely give you enough information to evaluate how any given upgrade will actually perform for your query set.

We'll start by defining the minimal diligence you should do before upgrading any data store, no matter how trivial the change. We'll talk through guidelines for deciding how paranoid you should be about any given workload or change set, and how to balance risk against a rabbit hole of infinite engineering effort based on your organization's ability to absorb risk.

Once you're facing a tricky upgrade, we'll cover strategies for validating it and gaining confidence that it won't be a disaster: traffic splitters, query profiling, and especially capturing and replaying real production traffic against real production snapshots. We'll talk through strategies for teasing out complex bugs and stalls, like why you should always replay both at max throughput and according to original timestamps, and why MAX is always more important than the 99.99th percentile. The principles apply to any database, but the war stories and useful tools we share will mostly apply to MongoDB and MySQL.
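The capture-and-replay idea from the abstract can be sketched roughly as follows. This is a minimal illustration, not the talk's actual tooling: the `CAPTURED` data and `execute` stub are hypothetical stand-ins for a real query log and a connection to the upgrade-candidate server.

```python
import time

# Captured production traffic: (offset_seconds, query) pairs.
# In practice these would come from a query log or packet capture.
CAPTURED = [(0.00, "SELECT ..."), (0.05, "SELECT ..."), (0.12, "UPDATE ...")]

def execute(query):
    """Stand-in for sending the query to the upgrade-candidate server."""
    time.sleep(0.001)  # pretend the server took 1ms

def replay(captured, respect_timestamps):
    """Replay captured queries, either paced by the original
    inter-arrival times or flat-out at max throughput."""
    latencies = []
    start = time.monotonic()
    for offset, query in captured:
        if respect_timestamps:
            # Sleep until this query's original offset from the start.
            delay = offset - (time.monotonic() - start)
            if delay > 0:
                time.sleep(delay)
        t0 = time.monotonic()
        execute(query)
        latencies.append(time.monotonic() - t0)
    return latencies

# Replay both ways: paced replay shows realistic behavior, while a
# max-throughput replay surfaces stalls that pacing would mask.
for paced in (True, False):
    lats = replay(CAPTURED, respect_timestamps=paced)
    # Report MAX, not just a high percentile: one multi-second stall
    # matters even if 99.99% of queries were fast.
    print(f"paced={paced} max={max(lats) * 1000:.1f}ms")
```

Running both modes against the same snapshot and comparing the MAX latencies is the cheapest way to catch the rare, catastrophic stall that a percentile summary would smooth away.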
CEO/cofounder of honeycomb.io. Honeycomb combines the raw accuracy of log aggregators, the speed of time series metrics, and the flexibility of APM (application performance management) to provide the world's first truly next-generation analytics service. Previously ran operations at Parse/Facebook, managing a massive fleet of MongoDB replica sets as well as Redis, Cassandra, and MySQL. Worked closely with the RocksDB team at Facebook to develop and roll out the world's first Mongo+Rocks deployment using the pluggable storage engine API.