Buy Percona ServicesBuy Now!

Flashback: Another Take on Point-In-Time Recovery (PITR) in MySQL/MariaDB/Percona Server

 | April 12, 2018 |  Posted In: Backups, Database Recovery, Insight for DBAs, MariaDB, MySQL

PREVIOUS POST
NEXT POST

Point-In-Time RecoveryIn this blog post, I’ll look at point-in-time recovery (PITR) options for MySQL, MariaDB and Percona Server for MySQL.

It is a common good practice to extend data safety by having additional measures apart from regular data backups, such as delayed slaves and binary log backups. These two options provide the ability to restore the data to any given point in time, or just revert from some bad accidents. These methods have their limitations of course: delayed slaves only help if a deadly mistake is noticed fast enough, while full point-in-time recovery (PITR) requires the last full backup and binary logs (and therefore usually takes a lot of time).

How to reverse from disaster faster

Alibaba engineers and the MariaDB team implemented an interesting feature in their version of the mysqlbinlog tool: the --flashback option. Based on ROW-based DML events, it can transform the binary log and reverse purposes. That means it can help undo given row changes extremely fast. For instance, it can change DELETE events to INSERTs and vice versa, and it will swap WHERE and SET parts of the UPDATE events. This simple idea can dramatically speed up recovery from certain types of mistakes or disasters.

The question is whether it works with non-MariaDB variants. To verify that, I tested this feature with the latest available Percona Server for MySQL 5.7 (which is fully compatible with upstream MySQL).

First, let’s simulate one possible deadly scenario: a forgotten WHERE in DELETE statement:

So, our data is lost on both the master and slave!

Let’s start by downloading the latest MariaDB server 10.2.x package, which I’m hoping has a mysqlbinlog tool that works with MySQL 5.7, and unpack it to some custom location:

It has the function we are looking for. Now, we have to find the culprit transaction or set of transactions we want to revert. A simplified example may look like this:

By searching through the decoded binary log, we are looking for transactions that have wiped out the table test.sbtest1. It looks like this (as the table had 200 rows, it is pretty long, so I’ve pasting only the beginning and the end):

It is very important to take the proper start and stop positions. We need the ones exactly after BEGIN and before the final COMMIT. Then, let’s test if the tool produces the reverse statements as expected. First, decode the rows to the .sql file:

Inside, we find 200 of those. Looks good:

Since we verified the positions are correct, we can prepare a binary log file:

and load it back to our master:

and double check they restored on slaves:

GTID problem

MariaDB has a completely different GTID implementation from MySQL and Percona Server. You can expect problems when decoding incompatible GTID enabled binary logs with MariaDB. As MariaDB’s mysqlbinlog does not support –start/stop-gtid options (even for its own implementation), we have to take the usual positions anyway. From a GTID-enabled binary log, for example, delete can look like this:

The tool seems to work, and transforms the delete transaction to a sequence of INSERTs. However, the server rejects it when we try to load it on a GTID-enabled master:

Unfortunately, the solution here is either to disable GTID mode for the recovery time (which is surely tricky in replicated clusters), or try to add GTID-related information to the resulting binary log with the --flashback option. In my case, adding these lines worked (I used the next free available GTID sequence):

Limitations

Obviously, flashback cannot help after DROP/TRUNCATE or other DDL commands. These are not transactional, and affected rows are never recorded in the binary log. It doesn’t work with encrypted or compressed binary logs either. But most importantly, to produce complete events that can reverse bad transactions, the binary format must be ROW. The row image also must be FULL:

If these conditions are not met (or if you’re dealing with a too-complicated GTID issue), you will have to follow the standard point-in-time recovery procedure.

PREVIOUS POST
NEXT POST
Przemysław Malkowski

Przemek joined Support Team at Percona in August 2012. Before that he spent over five years working for Wikia.com (Quantcast Top 50) as System Administrator where he was a key person responsible for seamless building up MySQL powered database infrastructure. Besides MySQL he worked on maintaining all other parts of LAMP stack, with main focus on automation, monitoring and backups.

One Comment

Leave a Reply