Nov 06, 2018
 Percona Live Open Source Database Conference Europe 2018
  Jean-François Gagné

Up to MySQL 5.5, replication was not crash safe: after a crash, it would fail with "duplicate key" or "row not found" error, or might generate silent data corruption. It looks like 5.6 is much better, right? The short answer is maybe: in the simplest case, it is possible to achieve replication crash safety but it is not the default setting. MySQL 5.7 is not much better, 8.0 has safer defaults but it is still easy to get things wrong.

Crash safety is impacted by replication positioning (File+Pos or GTID), type (single-threaded or MTS), MTS settings (Database or Logical Clock, and with or without slave preserve commit order), the sync-ing of relay logs, the presence of binary logs, log-slave-updates and their sync-ing. This is very complicated stuff and even the manual is confused about it.

In this talk, I will explain the impact of above and help you finding the path to crash safety nirvana. I will also give details about replication internals, so you might learn a thing or two.



About the Author

Jean-François Gagné

Jean-François is a System/Infrastructure Engineer and MySQL Expert. He recently joined MessageBird, an IT telco startup in Amsterdam, with the mission of scaling the MySQL infrastructure. Before that, J-F worked on growing the Booking.com MySQL and MariaDB installations including dealing with replication bottlenecks (he also works on many other non MySQL related projects that are less relevant here). Some of his latest projects are making Parallel Replication run faster and promoting Binlog Servers. He also has a good understanding of replication in general and a respectable understanding of InnoDB, MySQL, Linux and TCP/IP. Before Booking.com, he worked as a System/Network/Storage Administrator in a Linux/VMWare environment, as an Architect for a Mobile Services Provider, and as a C and Java Programmer in an IT Service Company. Even before that, when he was learning computer science, Jeff studied cache consistency in distributed systems and network group communication protocols.