Taking logical and binary backups of thousands of database servers across multiple geographic locations, with complete coverage within 24 hours, requires a massively parallel backup architecture. Decentralized storage, multi-stage repositories, and real-time binary log collection enable rapid disaster recovery, and they also provide surprising day-to-day benefits.

Tools Facebook has developed to support this infrastructure include: real-time binlog collection from both masters and slaves, which maximizes data integrity and allows point-in-time recovery; a web server that automatically streams archived binlogs, so that lagging slaves can catch up quickly; a new wrapper for xtrabackup that replaces innobackupex and implements full and incremental binary backups, with automatic retrieval of credentials, streaming of the xtrabackup logfile to the remote target (so it never consumes local disk), and operation under a non-privileged user to minimize risk; and continuous automated restores, which verify that both mysqldump and xtrabackup archives remain intact and usable across a wide variety of software and tool versions.

Database Administration
Kevin Knapp
DBA, Facebook

I’ve been steadily scaling MySQL up for the last 5 years at and then, using a combination of bash, php, and python in a vain attempt to replace myself with a collection of scripts.

Eric Barrett
Storage Engineer, Facebook

Eric Barrett has 13 years of experience in the storage and database industries, having worked at NetApp, Facebook, and a few startups in between. He currently leads the operations development group that handles MySQL backups at Facebook.

