How Booking.com Avoids and Deals with MySQL/MariaDB Replication Lag
MySQL/MariaDB replication is asynchronous. You can make replication faster by using better hardware (a faster CPU, more RAM, or quicker disks), or you can use parallel replication to remove its single-threaded limitation, but lag can still happen. This talk is not about making replication faster; it is about how to deal with its asynchronous nature, including the (in)famous lag. We will start by explaining the consequences of asynchronous replication and how and when lag can happen. We will also analyze how some tools, including pt-osc, try to avoid lag. Then, we will present the solution used at Booking.com both to avoid creating lag and to minimize the consequences of stale reads on slaves (hint: this solution does not involve reading from the master, because that does not scale). Once all of the above is well understood, we will discuss how Booking.com’s solution can be improved: it was designed years ago, and we would do things differently if starting from scratch today. Finally, I will present an innovative way to avoid lag: the no-slave-left-behind MariaDB patch.
Principal Developer, Booking.com
Since 2010, Eric has been at Booking.com addressing various data challenges, including scaling the usage of MySQL and bringing Hadoop into production. Eric spent the prior six years as a developer at MySQL and Sun Microsystems. In his spare time, he contributes to Free Software and develops Open Hardware.