Table alters at scale
This is an informational talk about the methods and tools we use at Facebook to run table alters at scale. It covers the challenges we faced early on whenever an alter was needed, the solutions we developed later, and the present state, in which alters run automatically.
With so many database servers and users, and with every single downtime drawing attention, we had to solve problems that would never come up in a smaller environment.
The main focus of the talk is auto_osc, an application I developed myself, built around the open-source version of OnlineSchemaChange. It maintains table consistency across most of our database servers, and I would like to talk about the problems we met and how we solved them.
If you have thousands of servers and terabytes of data...
- How do you get a list of inconsistent hosts quickly?
- At this scale, hosts will disappear and come back at random while you alter them. How do you deal with that?
- What about changes to the replication chain?
- How do you avoid overloading your clusters?
...and so on.
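
To make the first question concrete, here is a minimal sketch of one possible way to spot inconsistent hosts: fingerprint each host's table definition and compare it with the expected one. This is not how auto_osc works (the abstract does not say); the function names, the host names, and the idea of collecting the `SHOW CREATE TABLE` output into a dictionary beforehand are all assumptions made for illustration.

```python
import hashlib
import re
from typing import Dict, List


def schema_fingerprint(create_table_sql: str) -> str:
    """Return a hash of a CREATE TABLE statement (hypothetical helper).

    The AUTO_INCREMENT counter is stripped because it differs per host
    without the schema being different, and whitespace is collapsed so
    that formatting differences do not matter.
    """
    without_counter = re.sub(r"\s*AUTO_INCREMENT=\d+", "", create_table_sql)
    normalized = " ".join(without_counter.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_inconsistent_hosts(
    schemas_by_host: Dict[str, str], expected_sql: str
) -> List[str]:
    """List hosts whose table definition differs from the expected one.

    schemas_by_host maps a host name to that host's SHOW CREATE TABLE
    output, assumed to have been collected already.
    """
    expected = schema_fingerprint(expected_sql)
    return sorted(
        host
        for host, sql in schemas_by_host.items()
        if schema_fingerprint(sql) != expected
    )


if __name__ == "__main__":
    # Made-up hosts and schemas, for illustration only.
    collected = {
        "db1": "CREATE TABLE t (\n id INT\n) AUTO_INCREMENT=5",
        "db2": "CREATE TABLE t ( id INT ) AUTO_INCREMENT=99",
        "db3": "CREATE TABLE t ( id INT, name VARCHAR(10) )",
    }
    print(find_inconsistent_hosts(collected, "CREATE TABLE t ( id INT )"))
```

Comparing fixed-size fingerprints rather than full definitions keeps the per-host payload small, which matters when the check has to cover thousands of servers; collecting the definitions in the first place is the part this sketch leaves out.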