How do you know if your backup is truly reliable? The last thing you want is to discover your data is corrupted during a critical restore or during a migration. While MyDumper is a powerful tool for logical backups, its -M option takes backup integrity to the next level by creating checksums. This often-overlooked feature adds a crucial layer of confidence, allowing you to verify the integrity of your data after the restore is complete. In this post, we’ll dive into what mydumper’s -M option does, how to use it, and why it should be a standard part of your backup strategy to give you true peace of mind.
Do you test your backups?
A backup is not truly a backup until it has been successfully tested. The simple fact is that a “successful” backup job report doesn’t guarantee your data is actually recoverable. Silent corruption can occur for a variety of reasons, leaving you with a seemingly perfect backup that is, in reality, useless. Regularly testing your backups is the only way to validate data integrity and confirm that your restoration procedures work as intended.
When you take a backup with mydumper, you can obtain checksums of the data, columns, indexes, and triggers of each table and the schema definition, routines, and triggers of the databases. You can find these checksums in the metadata file, for example:
```
[`sakila`.`staff`]
real_table_name=staff
rows = 2
data_checksum = 3624460561
schema_checksum = bbe23074
indexes_checksum = 73c949ad

[`sakila`.`store`]
real_table_name=store
rows = 2
data_checksum = 3119812626
schema_checksum = b7b99b4c
indexes_checksum = b4d31e3

[`sakila`]
schema_checksum = 95DC8DDE
post_checksum = 42085F07
triggers_checksum = A4255BB4
```
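Because the metadata file is a plain text, INI-like file, it is easy to inspect programmatically. Below is a minimal sketch (not part of mydumper itself) that extracts the per-table data checksums with awk; the sample content is taken from the metadata shown above, and the temporary path is arbitrary:

```shell
# Recreate the sample metadata from this post; in a real backup the file
# is simply named "metadata" inside the mydumper output directory.
cat > /tmp/metadata <<'EOF'
[`sakila`.`staff`]
real_table_name=staff
rows = 2
data_checksum = 3624460561

[`sakila`.`store`]
real_table_name=store
rows = 2
data_checksum = 3119812626
EOF

# Print "section data_checksum" pairs: remember the last section header
# seen, and emit it next to each data_checksum value.
awk -F' = ' '
  /^\[/            { t = $0 }
  /^data_checksum/ { print t, $2 }
' /tmp/metadata
```

This prints one line per table, e.g. the `staff` section next to its checksum 3624460561, which is handy for quick comparisons outside of myloader.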
After the restore, myloader reads these checksums, recomputes them by executing the same steps that were used to generate them, and reports the result:
```
** Message: 16:52:35.080: Schema index checksum confirmed for sakila.language
** Message: 16:52:35.081: Data checksum confirmed for sakila.language
** Message: 16:52:35.081: Structure checksum confirmed for sakila.staff
** Message: 16:52:35.082: Schema index checksum confirmed for sakila.staff
** Message: 16:52:35.082: Data checksum confirmed for sakila.staff
** Message: 16:52:35.082: Structure checksum confirmed for sakila.store
** Message: 16:52:35.083: Schema index checksum confirmed for sakila.store
** Message: 16:52:35.083: Data checksum confirmed for sakila.store
** Message: 16:52:35.083: View checksum confirmed for sakila.sales_by_film_category
** Message: 16:52:35.083: View checksum confirmed for sakila.customer_list
** Message: 16:52:35.084: View checksum confirmed for sakila.film_list
** Message: 16:52:35.084: View checksum confirmed for sakila.nicer_but_slower_film_list
** Message: 16:52:35.084: View checksum confirmed for sakila.sales_by_store
** Message: 16:52:35.085: View checksum confirmed for sakila.staff_list
** Message: 16:52:35.085: View checksum confirmed for sakila.actor_info
** Message: 16:52:35.085: Schema create checksum confirmed for sakila
** Message: 16:52:35.086: Post checksum confirmed for sakila
** Message: 16:52:35.086: Triggers checksum confirmed for sakila
```
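The round trip that produces output like the above is a dump with -M followed by a restore with --checksum. The host, user, database, and paths below are placeholders, and since the commands need a running MySQL server, this sketch only prints them as a dry run rather than executing them:

```shell
# Placeholder connection details and paths; printed rather than run
# because both tools require a live MySQL server.
cat <<'EOF'
mydumper -h 127.0.0.1 -u backup_user -B sakila -M -o /backups/sakila
myloader -h 127.0.0.1 -u restore_user -d /backups/sakila --checksum=warn
EOF
```

Here -M makes mydumper record all checksums in the metadata file, and --checksum=warn makes myloader report every mismatch without aborting the restore.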
mydumper checksum options
In v0.10.7, we introduced support for CHECKSUM TABLE, and in newer versions of MyDumper, you will find four options related to checksums:
```
-M, --checksum-all      Dump checksums for all elements
--data-checksums        Dump table checksums with the data
--schema-checksums      Dump schema table and view creation checksums
--routine-checksums     Dump triggers, functions and routines checksums
```
With -M, you will enable the other three options.
--data-checksums instructs mydumper to execute CHECKSUM TABLE on each table that is going to be dumped. Be aware that during migrations to a different vendor, or even a different version of the same vendor, CHECKSUM TABLE can return different values for identical data because of how the checksum is calculated.
--schema-checksums enables checksums for the database creation options, the table column definitions, and the table indexes. These are skipped if the corresponding objects are not exported because --no-schemas or --skip-indexes is used. This checksum is calculated by mydumper using CRC32.
--routine-checksums enables checksums for routines, functions, events, and triggers at the database or table level, depending on the filtering options. These are also skipped if you don't use --routines, --events, or --triggers.
myloader checksum options
On myloader, we only have:
```
--checksum              Treat checksums: skip, fail(default), warn.
```
which allows us to decide how a checksum mismatch is treated. We can ignore all the checksums at once by setting it to skip, use fail (the default) to stop the restore when a mismatch is found, or use warn, which reports all the differences and continues. Checks are also skipped for anything you filter out with --no-data, --no-schema, --skip-indexes, --skip-post, or --skip-triggers.
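Beyond myloader's built-in verification, keeping the metadata file around lets you diff checksums yourself, for instance between the source backup and a fresh mydumper run against the restored server. The file names below are hypothetical, and the checksum values are the samples from this post, with one mismatch introduced on purpose to show the failure path:

```shell
# Hypothetical metadata files from a source backup and from a dump of
# the restored server; the second store checksum is deliberately wrong.
cat > /tmp/source_metadata <<'EOF'
[`sakila`.`staff`]
data_checksum = 3624460561
[`sakila`.`store`]
data_checksum = 3119812626
EOF

cat > /tmp/restored_metadata <<'EOF'
[`sakila`.`staff`]
data_checksum = 3624460561
[`sakila`.`store`]
data_checksum = 9999999999
EOF

# Emit "section checksum" pairs from a metadata file.
extract() { awk -F' = ' '/^\[/{t=$0} /^data_checksum/{print t, $2}' "$1"; }

# diff exits non-zero and shows the offending tables on any mismatch.
if diff <(extract /tmp/source_metadata) <(extract /tmp/restored_metadata); then
  echo "checksums match"
else
  echo "checksum mismatch detected"
fi
```

With the sample data above, diff flags the `store` table and the script reports a mismatch; with identical files it would report that the checksums match.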
Is there any downside?
As with every checksum-related task, the more objects you have, the longer it will take to process them: the cost depends on the number of tables, indexes, triggers, databases, stored procedures, functions, and events.
In addition, obtaining a table checksum runs CHECKSUM TABLE, which reads the whole table, so the time also depends on the size of the data you are exporting and importing.
When is it useful?
You should use it when you need to guarantee data consistency, like during migrations or long-term backups, and you must do it if you are going to test your backups.
On the other hand, there is little point in enabling it if you are just testing performance, or taking a throwaway backup you never intend to restore.