Crossing the Production Barrier: Development at Scale
There comes a point in your company’s lifetime when you have accumulated so much data that you can no longer refresh your development database from production. Either the dataset has become too large for the development setup, or the sync takes longer than it is worth waiting for. This talk discusses some approaches to getting around this problem.
There are at least two ways most companies work around this. The first is to take a subset of production data, which turns out to be a difficult problem because most data is referential: each row may point into a never-ending trail of intertwined graphs. By the time you have a valid data set to work with, you’ve pulled almost all of the production data anyway.
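To make the subsetting problem concrete, here is a minimal sketch in Python of following references outward from a single seed row. The table names, keys, and the `references` mapping are hypothetical toy data, not a real schema; the point is only that a breadth-first walk over references can end up visiting everything.

```python
from collections import deque


def referential_closure(seeds, references):
    """Return every row reachable from `seeds` by following references.

    `references` maps a row to the rows it points at (its foreign keys).
    """
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        row = queue.popleft()
        for target in references.get(row, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical toy data: each row is identified as (table, primary key).
references = {
    ("orders", 1): [("users", 10), ("listings", 100)],
    ("listings", 100): [("shops", 5)],
    ("shops", 5): [("users", 11)],
    ("users", 11): [("orders", 2)],
    ("orders", 2): [("users", 12), ("listings", 101)],
    ("listings", 101): [("shops", 5)],
}

# Start from a single order and collect everything it depends on.
subset = referential_closure([("orders", 1)], references)
```

In this toy graph, the "subset" seeded from one order contains all eight rows, which is the effect described above in miniature.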
The other way is to harness production directly. We’ll discuss how Etsy uses MySQL Proxy as a gateway to the production database. The tool is used for both security and auditing purposes, and it injects additional data into each query so that queries can be searched and diagnosed later. We have also built tooling and procedures for restoring data after an accidental change made from development. We use Percona-based tools: pt-slave-delay to establish windowed (time-delayed) replication that can be used as a restoration point, and pt-table-sync to patch inconsistencies between shard sides. Data can also be restored at the object level by using MySQL binlogs and identifying the primary keys that the corrupted data corresponds to.
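One plausible way to inject searchable data into a query is to prefix it with a SQL comment. The sketch below is written in Python purely for illustration; it is not the proxy's actual mechanism, and the function name `annotate_query` and the fields `user` and `host` are assumptions rather than details from the talk.

```python
import re


def annotate_query(sql, **context):
    """Prefix `sql` with a /* key=value ... */ comment built from `context`."""
    parts = []
    for key in sorted(context):
        # Replace anything outside a small safe set so that a value
        # cannot contain "*/" and close the comment early.
        value = re.sub(r"[^A-Za-z0-9_.@:-]", "_", str(context[key]))
        parts.append(f"{key}={value}")
    return f"/* {' '.join(parts)} */ {sql}"


# Hypothetical developer identity attached to a write issued from development.
annotated = annotate_query(
    "UPDATE shops SET name = 'x' WHERE shop_id = 5",
    user="jdoe",
    host="dev-vm-12",
)
print(annotated)
```

Because the annotation is an ordinary comment, the statement itself is unchanged, while anything that records the full query text can later be searched by those fields.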
Development data at scale is a non-trivial problem that can creep up on you as your company becomes successful, but there are ways to avoid putting your development process on hold. Some problems have to be developed against relevant data sets, and some hard-to-reproduce edge cases can only be analyzed with a specific set of data. We believe that giving some level of access to production systems in a safe way can be a great solution to the problem.