Flashback: Another Take on Point-In-Time Recovery (PITR) in MySQL/MariaDB/Percona Server

Flashback: Another Take on Point-In-Time Recovery (PITR) in MySQL/MariaDB/Percona Server

PREVIOUS POST
NEXT POST

Point-In-Time RecoveryIn this blog post, I’ll look at point-in-time recovery (PITR) options for MySQL, MariaDB and Percona Server for MySQL.

It is a common good practice to extend data safety by having additional measures apart from regular data backups, such as delayed slaves and binary log backups. These two options provide the ability to restore the data to any given point in time, or just revert from some bad accidents. These methods have their limitations of course: delayed slaves only help if a deadly mistake is noticed fast enough, while full point-in-time recovery (PITR) requires the last full backup and binary logs (and therefore usually takes a lot of time).

How to reverse from disaster faster

Alibaba engineers and the MariaDB team implemented an interesting feature in their version of the mysqlbinlog tool: the --flashback option. Based on ROW-based DML events, it can transform the binary log and reverse purposes. That means it can help undo given row changes extremely fast. For instance, it can change DELETE events to INSERTs and vice versa, and it will swap WHERE and SET parts of the UPDATE events. This simple idea can dramatically speed up recovery from certain types of mistakes or disasters.

The question is whether it works with non-MariaDB variants. To verify that, I tested this feature with the latest available Percona Server for MySQL 5.7 (which is fully compatible with upstream MySQL).

First, let’s simulate one possible deadly scenario: a forgotten WHERE in DELETE statement:

So, our data is lost on both the master and slave!

Let’s start by downloading the latest MariaDB server 10.2.x package, which I’m hoping has a mysqlbinlog tool that works with MySQL 5.7, and unpack it to some custom location:

It has the function we are looking for. Now, we have to find the culprit transaction or set of transactions we want to revert. A simplified example may look like this:

By searching through the decoded binary log, we are looking for transactions that have wiped out the table test.sbtest1. It looks like this (as the table had 200 rows, it is pretty long, so I’ve pasting only the beginning and the end):

It is very important to take the proper start and stop positions. We need the ones exactly after BEGIN and before the final COMMIT. Then, let’s test if the tool produces the reverse statements as expected. First, decode the rows to the .sql file:

Inside, we find 200 of those. Looks good:

Since we verified the positions are correct, we can prepare a binary log file:

and load it back to our master:

and double check they restored on slaves:

GTID problem

MariaDB has a completely different GTID implementation from MySQL and Percona Server. You can expect problems when decoding incompatible GTID enabled binary logs with MariaDB. As MariaDB’s mysqlbinlog does not support –start/stop-gtid options (even for its own implementation), we have to take the usual positions anyway. From a GTID-enabled binary log, for example, delete can look like this:

The tool seems to work, and transforms the delete transaction to a sequence of INSERTs. However, the server rejects it when we try to load it on a GTID-enabled master:

Unfortunately, the solution here is either to disable GTID mode for the recovery time (which is surely tricky in replicated clusters), or try to add GTID-related information to the resulting binary log with the --flashback option. In my case, adding these lines worked (I used the next free available GTID sequence):

Limitations

Obviously, flashback cannot help after DROP/TRUNCATE or other DDL commands. These are not transactional, and affected rows are never recorded in the binary log. It doesn’t work with encrypted or compressed binary logs either. But most importantly, to produce complete events that can reverse bad transactions, the binary format must be ROW. The row image also must be FULL:

If these conditions are not met (or if you’re dealing with a too-complicated GTID issue), you will have to follow the standard point-in-time recovery procedure.

PREVIOUS POST
NEXT POST

Share this post

Comments (2)

  • Scott Noyes Reply

    It also will not recover changes made by a cascading foreign key.

    April 12, 2018 at 3:15 pm
  • Shlomi Noach Reply

    Thank you for this post! Here is an awk one liner to re-inject @@SESSION.GTID_NEXT into the binary log:


    | awk 'BEGIN {"uuidgen -t" |& getline u} /^BEGIN/ {c += 1 ; print "SET @@SESSION.GTID_NEXT= \x27" u ":" c "\x27/*!*/;"} {print}'

    This will create a one time, time-based UUID, and inject a SET @@SESSION.GTID_NEXT entry just before a BEGIN statement in the binary log. Range is sequentially running :1:n

    Also, this:


    | sed -e s/', @@session.check_constraint_checks=1//g'

    and so, for example, you would:

    cat mysql-bin.000002_flash.bin | awk 'BEGIN {"uuidgen -t" |& getline u} /^BEGIN/ {c += 1 ; print "SET @@SESSION.GTID_NEXT= \x27" u ":" c "\x27/*!*/;"} {print}' | sed -e s/', @@session.check_constraint_checks=1//g' | mysql -h localhost

    October 29, 2018 at 7:07 am

Leave a Reply