How to Restore MySQL Logical Backups

February 23, 2018

The ability to restore MySQL logical backups is a significant part of disaster recovery procedures. It’s a last line of defense.

Even if you have lost all data on a production server, physical backups (a data file snapshot created with an offline copy or with Percona XtraBackup) could contain the same internal database structure corruption as the production data. Backups in a simple plain text format allow you to avoid such corruption and to migrate between database formats (e.g., during a software upgrade or downgrade), or even help with migration from a completely different database solution.

Unfortunately, the restore speed for logical backups is usually bad, and for a big database, it could require days or even weeks to get data back. Thus it’s important to tune backups and MySQL for the fastest data restore, and to change the settings back before production operations resume.

Disclaimer

All results are specific to my combination of hardware and dataset but could be used as an illustration for MySQL database tuning procedures related to logical backup restore.

Benchmark

There is no general advice for tuning a MySQL database for a bulk logical backup load, and any parameter should be verified with a test on your hardware and database. In this article, we will explore some variables that help that process. To illustrate the tuning procedure, I’ve downloaded IMDB CSV files and created a MySQL database with pyimdb.

You may repeat the whole benchmark procedure or just look at settings changed and resulting times.

Database:

- 16GB – InnoDB database size

- 6.6GB – uncompressed mysqldump sql

- 5.8GB – uncompressed CSV + create table statements.

The simplest restore procedure for logical backups created by the mysqldump tool:

mysql -e 'create database imdb;'
time mysql imdb < imdb.sql
# real 129m51.389s

This requires slightly more than two hours to restore the backup into the MySQL instance started with default settings.

I’m using the Docker image percona:latest – it contains Percona Server 5.7.20-19 running on a laptop with 16GB RAM, Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz, two disks: SSD KINGSTON RBU-SNS and HDD HGST HTS721010A9.

Let’s start with some “good” settings: a buffer pool bigger than the default, 2x1GB transaction log files, disabled sync (because we are using a slow HDD), and big values for IO capacity. The load should be faster with big batches, so use 1GB for max_allowed_packet.

Values were chosen to be bigger than the default MySQL parameters because I’m trying to see the difference between the usually suggested values (like 80% of RAM should belong to InnoDB buffer pool).

docker run --publish-all --name p57 -it -e MYSQL_ALLOW_EMPTY_PASSWORD=1 percona:5.7 \
  --innodb_buffer_pool_size=4GB \
  --innodb_log_file_size=1G \
  --skip-log-bin \
  --innodb_flush_log_at_trx_commit=0 \
  --innodb_flush_method=nosync \
  --innodb_io_capacity=2000 \
  --innodb_io_capacity_max=3000 \
  --max_allowed_packet=1G

time (mysql --max_allowed_packet=1G imdb1 < imdb.sql )
# real 59m34.252s

The load is IO-bound, and running set global foreign_key_checks=0 and unique_checks=0 makes no difference because the dump file already disables these checks.
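To confirm that a dump already turns these checks off, you can look at its header. This is a minimal sketch: the function name is made up for illustration, and it assumes the usual mysqldump behavior of writing the SET statements within the first 50 lines of the file.

```shell
#!/bin/sh
# Succeeds if the header of a mysqldump SQL file turns off both checks.
# Assumption: mysqldump writes these SET statements near the top of the
# file, so only the first 50 lines are scanned.
dump_disables_checks() (
  header=$(head -n 50 "$1") || exit 1
  printf '%s\n' "$header" | grep -q 'FOREIGN_KEY_CHECKS=0' &&
    printf '%s\n' "$header" | grep -q 'UNIQUE_CHECKS=0'
)

# Usage:
#   dump_disables_checks imdb.sql && echo "checks already disabled in the dump"
```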

How can we reduce IO?

Disable InnoDB double write: --innodb_doublewrite=0

time (mysql --max_allowed_packet=1G imdb1 < imdb.sql )
# real 44m49.963s

A huge improvement, but we still have an IO-bound load.
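The article does not show how the IO-bound state was measured. One possible way on Linux, sketched here as an assumption, is to take two samples of the first line of /proc/stat during the restore and compute the share of CPU time spent in iowait. The function name is hypothetical.

```shell
#!/bin/sh
# Percentage of CPU time spent in iowait between two "cpu ..." lines
# from /proc/stat. Fields after the "cpu" label are:
# user nice system idle iowait irq softirq steal (guest fields are ignored).
iowait_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    NR == 1 { for (i = 2; i <= 9; i++) first[i] = $i }
    NR == 2 {
      for (i = 2; i <= 9; i++) total += $i - first[i]
      if (total > 0) printf "%d\n", 100 * ($6 - first[6]) / total
      else print 0
    }'
}

# Usage during a restore (Linux only):
#   a=$(head -n 1 /proc/stat); sleep 5; b=$(head -n 1 /proc/stat)
#   iowait_pct "$a" "$b"
```

A persistently high value while mysqld is loading data suggests the disk, not the CPU, is the limit.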

We will not be able to improve the load time significantly while the load is IO-bound. Let’s move to SSD:

time (mysql --max_allowed_packet=1G imdb1 < imdb.sql )
# real 33m36.975s

Is it vital to disable disk sync for the InnoDB transaction log?

sudo rm -rf mysql/*
docker rm p57
docker run -v /home/ihanick/Private/Src/tmp/data-movies/imdb.sql:/root/imdb.sql -v /home/ihanick/Private/Src/tmp/data-movies/mysql:/var/lib/mysql \
  --name p57 -it -e MYSQL_ALLOW_EMPTY_PASSWORD=1 percona:5.7 \
  --innodb_buffer_pool_size=4GB \
  --innodb_log_file_size=1G \
  --skip-log-bin \
  --innodb_flush_log_at_trx_commit=0 \
  --innodb_io_capacity=700 \
  --innodb_io_capacity_max=1500 \
  --max_allowed_packet=1G \
  --innodb_doublewrite=0
# real 33m49.724s

There is no significant difference.

By default, mysqldump produces SQL data, but it could also save data to CSV format:

cd /var/lib/mysql-files
mkdir imdb
chown mysql:mysql imdb/
time mysqldump --max_allowed_packet=128M --tab /var/lib/mysql-files/imdb imdb1
# real 1m45.983s

sudo rm -rf mysql/*
docker rm p57
docker run -v /srv/ihanick/tmp/imdb:/var/lib/mysql-files/imdb -v /home/ihanick/Private/Src/tmp/data-movies/mysql:/var/lib/mysql \
  --name p57 -it -e MYSQL_ALLOW_EMPTY_PASSWORD=1 percona:5.7 \
  --innodb_buffer_pool_size=4GB \
  --innodb_log_file_size=1G \
  --skip-log-bin \
  --innodb_flush_log_at_trx_commit=0 \
  --innodb_io_capacity=700 \
  --innodb_io_capacity_max=1500 \
  --max_allowed_packet=1G \
  --innodb_doublewrite=0

time (
mysql -e 'drop database if exists imdb1;create database imdb1;set global FOREIGN_KEY_CHECKS=0;'
(echo "SET FOREIGN_KEY_CHECKS=0;";cat *.sql) | mysql imdb1 ;
for i in "$PWD"/*.txt ; do mysqlimport imdb1 "$i" ; done
)
# real 21m56.049s

1.5X faster, just because of changing the format from SQL to CSV!
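For context, mysqlimport is a command-line front end to LOAD DATA INFILE: it derives the table name from the file name and loads the file in one statement instead of parsing many INSERT statements. The sketch below only prints a roughly equivalent statement for a given file. The function name is hypothetical, and the real tool adds further clauses depending on its options.

```shell
#!/bin/sh
# Print a LOAD DATA statement roughly equivalent to what
# "mysqlimport <db> <file>" runs for a --tab dump file.
# The table name is the file name with its extension stripped.
load_data_sql() (
  table=$(basename "$1")
  table=${table%%.*}
  printf "LOAD DATA INFILE '%s' INTO TABLE \`%s\`;\n" "$1" "$table"
)

# Usage:
#   load_data_sql /var/lib/mysql-files/imdb/title.txt | mysql imdb1
```

Because the file is read by the server, it has to live in a directory the server is allowed to read, which is presumably why the examples here use /var/lib/mysql-files.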

We’re still using only one CPU core. Let’s improve the load with the --use-threads=4 option:

time (
mysql -e 'drop database if exists imdb1;create database imdb1;set global FOREIGN_KEY_CHECKS=0;'
(echo "SET FOREIGN_KEY_CHECKS=0;";cat *.sql) | mysql imdb1
mysqlimport --use-threads=4 imdb1 $PWD/*.txt
)
# real 15m38.147s

In the end, the load is still not fully parallel due to a big table: all other tables are loaded, but one thread is still active.
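A quick way to see whether one table dominates the dump is to compare file sizes in the --tab output directory. This is a minimal sketch with a hypothetical function name. It reports the largest .txt file and its share of the total size in percent.

```shell
#!/bin/sh
# Print "<file> <percent>" for the largest *.txt file in a directory,
# where percent is its share of the total size of all *.txt files.
largest_file_share() (
  total=0
  max=0
  name=
  for f in "$1"/*.txt; do
    [ -f "$f" ] || continue
    size=$(wc -c <"$f")
    size=$((size + 0))
    total=$((total + size))
    if [ "$size" -gt "$max" ]; then
      max=$size
      name=$(basename "$f")
    fi
  done
  [ "$total" -gt 0 ] || exit 1
  echo "$name $((100 * max / total))"
)

# Usage:
#   largest_file_share /var/lib/mysql-files/imdb
```

If a single file accounts for most of the data, per-table parallelism cannot keep all threads busy for the whole load.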

Let’s split the CSV files into smaller ones, for example 100k rows in each file, and load them with GNU Parallel:

# /var/lib/mysql-files/imdb/test-restore.sh
apt-get update ; apt-get install -y parallel
cd /var/lib/mysql-files/imdb
time (
mkdir -p split1 && cd split1
# split every table file into 100k-row chunks: <table>.aaaaaa, <table>.aaaaab, ...
for i in ../*.txt ; do echo "$i" ; split -a 6 -l 100000 -- "$i" "$(basename "$i" .txt)." ; done
# one directory per chunk suffix; each chunk gets its <table>.txt name back
for i in $(ls *.* | sed -E 's/^[^.]+\.//' | sort -u) ; do
mkdir ../split-$i
for j in *.$i ; do mv "$j" "../split-$i/${j%.$i}.txt" ; done
done
)
# real 2m26.566s
time (
mysql -e 'drop database if exists imdb1;create database imdb1;set global FOREIGN_KEY_CHECKS=0;'
(echo "SET FOREIGN_KEY_CHECKS=0;";cat *.sql) | mysql imdb1
parallel 'mysqlimport imdb1 /var/lib/mysql-files/imdb/{}/*.txt' ::: split-*
)
# real 16m50.314s

Split is not free, but you can split your dump files right after backup.
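If the split runs right after the backup, a cheap sanity check is to confirm that the chunks together contain as many lines as the original file. The sketch below assumes one row per line and uses a hypothetical function name. It does not detect rows with embedded newlines that were cut in the middle.

```shell
#!/bin/sh
# Succeeds if the chunk files together have the same number of lines
# as the original file. Usage: verify_split <original> <chunk>...
verify_split() (
  [ "$#" -ge 2 ] || exit 2
  original=$1
  shift
  expected=$(wc -l <"$original")
  actual=$(cat "$@" | wc -l)
  [ "$((expected + 0))" -eq "$((actual + 0))" ]
)

# Usage (paths are illustrative):
#   verify_split title.txt split-*/title.txt || echo "row count mismatch"
```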

The load is parallel now, but the single big table strikes back with ‘setting auto-inc lock’ in SHOW ENGINE INNODB STATUS\G.

Using the --innodb_autoinc_lock_mode=2 option fixes this issue: 16m2.567s.
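To watch for this contention during a load, you can count how many lines of the InnoDB status output mention the auto-increment lock. This is a minimal sketch: the function name is hypothetical and it reads the status text from standard input.

```shell
#!/bin/sh
# Count lines of SHOW ENGINE INNODB STATUS output (read from stdin)
# that mention the auto-increment lock wait.
autoinc_waiters() {
  grep -c 'setting auto-inc lock' || true
}

# Usage while the import is running:
#   mysql -e 'SHOW ENGINE INNODB STATUS\G' | autoinc_waiters
```

A count that stays above zero across several samples suggests the import threads are serializing on the same table.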

We got slightly better results with just mysqlimport --use-threads=4. Let’s check whether hyperthreading helps and whether the problem is caused by the “parallel” tool:

- Using four parallel jobs for load: 17m3.662s

- Using four parallel jobs for load and two threads: 16m4.218s

There is no difference between GNU Parallel and the --use-threads option of mysqlimport.

Why 100k rows? With 500k rows: 15m33.258s

Now we have performance better than for mysqlimport --use-threads=4.

How about 1M rows at once? Just 16m52.357s.
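Comparing runs is easier once the `time` output is converted to seconds. The sketch below does that and picks the fastest of the three chunk sizes, using the timings reported above (the 100k figure is the run with --innodb_autoinc_lock_mode=2). The function names are hypothetical.

```shell
#!/bin/sh
# Convert a "real" value such as 15m33.258s to seconds.
to_seconds() {
  awk -v t="$1" 'BEGIN { split(t, p, "m"); sub(/s$/, "", p[2]); printf "%.3f\n", p[1] * 60 + p[2] }'
}

# Read "<label> <time>" lines from stdin and print the fastest run
# as "<seconds> <label>".
fastest() {
  while read -r label t; do
    printf '%s %s\n' "$(to_seconds "$t")" "$label"
  done | sort -n | head -n 1
}

printf '%s\n' '100k 16m2.567s' '500k 15m33.258s' '1M 16m52.357s' | fastest
# prints: 933.258 500k
```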

I see periodic “flushing logs” messages, so let’s try bigger transaction logs (2x4GB): 12m18.160s, with these options:

--innodb_buffer_pool_size=4GB --innodb_log_file_size=4G --skip-log-bin --innodb_flush_log_at_trx_commit=0 --innodb_io_capacity=700 --innodb_io_capacity_max=1500 --max_allowed_packet=1G --innodb_doublewrite=0 --innodb_autoinc_lock_mode=2 --performance-schema=0

Let’s compare these numbers with myloader 0.6.1, also running with four threads (myloader has only the -d parameter; the myloader execution time is shown under the corresponding mydumper command):

# oversized statement size to get 0.5M rows in one statement, single statement per chunk file
mydumper -B imdb1 --no-locks --rows 500000 --statement-size 536870912 -o 500kRows512MBstatement
17m59.866s
mydumper -B imdb1 --no-locks -o default_options
17m15.175s
mydumper -B imdb1 --no-locks --chunk-filesize 128 -o chunk128MB
16m36.878s
mydumper -B imdb1 --no-locks --chunk-filesize 64 -o chunk64MB
18m15.266s

It would be great to test mydumper with the CSV format, but unfortunately, it hasn’t been implemented in the last 1.5 years: https://bugs.launchpad.net/mydumper/+bug/1640550.

Returning to the parallel load of CSV files, with even bigger transaction logs (2x8GB): 11m15.132s.

What about a bigger buffer pool: --innodb_buffer_pool_size=12G? 9m41.519s
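To put these timings in perspective, they can be converted to an effective restore throughput relative to the final InnoDB size (16GB in this benchmark). This is a small sketch with a hypothetical function name. It takes the size in GB and the elapsed time as minutes and seconds.

```shell
#!/bin/sh
# Effective restore throughput in MB/s.
# Usage: throughput_mbps <size_gb> <minutes> <seconds>
throughput_mbps() {
  awk -v gb="$1" -v min="$2" -v sec="$3" \
    'BEGIN { printf "%.1f\n", gb * 1024 / (min * 60 + sec) }'
}

throughput_mbps 16 129 51.389   # default settings, single SQL file: 2.1
throughput_mbps 16 9 41.519     # tuned, parallel CSV load: 28.2
```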

Let’s check six-year-old server-grade hardware: Intel(R) Xeon(R) CPU E5-2430 with SAS RAID (used only for the single SQL file restore test) and NVMe (Intel Corporation PCIe Data Center SSD, used for all other tests).

I’m using options similar to the previous tests, with the CSV files split into 100k-row chunks:

--innodb_buffer_pool_size=8GB --innodb_log_file_size=8G --skip-log-bin --innodb_flush_log_at_trx_commit=0 --innodb_io_capacity=700 --innodb_io_capacity_max=1500 --max_allowed_packet=1G --innodb_doublewrite=0 --innodb_autoinc_lock_mode=2

- A single SQL file created by mysqldump loaded in 117m29.062s: 2x slower.

- 24 parallel processes of mysqlimport: 11m51.718s

- Again, hyperthreading makes a huge difference! 12 parallel jobs: 18m3.699s.

- Due to the higher concurrency, the adaptive hash index becomes a source of lock contention. After disabling it with --skip-innodb_adaptive_hash_index: 10m52.788s.

- In many places, disabling unique checks is referred to as a performance booster: 10m52.489s
  You can spend more time reading advice about unique_checks than it saves here, but it might help for some databases with many unique indexes (in addition to the primary one).

- The buffer pool is smaller than the dataset. Can you change the old/new pages split to make inserts faster? No: --innodb_old_blocks_pct=5 : 10m59.517s.

- O_DIRECT is also recommended: --innodb_flush_method=O_DIRECT: 11m1.742s.

- O_DIRECT is not able to improve performance by itself, but it helps if it lets you use a bigger buffer pool. O_DIRECT + 30% bigger buffer pool (--innodb_buffer_pool_size=11G): 10m46.716s.
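The 24-job run presumably matches the number of logical CPUs on that server (12 cores with hyperthreading, going by the 12×2.4GHz figure in the conclusions). A sketch of sizing the job count the same way is shown below. The function names are hypothetical, and `-j` is the GNU Parallel option that sets the number of simultaneous jobs.

```shell
#!/bin/sh
# Number of logical CPUs (hyperthreads included) on this machine.
logical_cpus() {
  nproc 2>/dev/null || getconf _NPROCESSORS_ONLN
}

# Print the parallel import command for a given number of jobs,
# using the paths from the split example in this article.
import_command() {
  printf "parallel -j %s 'mysqlimport imdb1 /var/lib/mysql-files/imdb/{}/*.txt' ::: split-*\n" "$1"
}

# Usage: print the command, review it, then run it.
#   import_command "$(logical_cpus)"
```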

Conclusions on how to restore MySQL logical backups

- There is no common solution to improve the logical backup restore procedure.

- If you have an IO-bound restore: disable the InnoDB double write. It’s safe because even if the database crashes during restore, you can restart the operation.

- Do not use SQL dumps for databases > 5-10GB. CSV files load much faster than SQL dumps restored with mysqldump+mysql. Implement mysqldump --tab+mysqlimport or use mydumper/myloader with an appropriate chunk-filesize.

- The number of rows per LOAD DATA INFILE batch is important. Usually 100K-1M; use binary search (2-3 iterations) to find a good value for your dataset.

- InnoDB log file size and buffer pool size are really important options for backup restore performance.

- O_DIRECT reduces insert speed, but it’s good if you can increase the buffer pool size.

- If you have enough RAM or SSD, the restore procedure is limited by CPU. Use a faster CPU (higher frequency, turboboost).

- Hyperthreading also counts.

- A powerful server could be slower than your laptop (12×2.4GHz vs. 4×2.8+turboboost).

- Even with modern hardware, it’s hard to expect backup restore faster than 50MBps (for the final size of InnoDB database).

- You can find a lot of different advice on how to improve backup load speed. Unfortunately, it’s not possible to implement improvements blindly, and you should know the limits of your system with general Unix performance tools like vmstat, iostat and various MySQL commands like SHOW ENGINE INNODB STATUS (all can be collected together with pt-stalk).

- Percona Monitoring and Management (PMM) also provides good graphs, but you should be careful with QAN: full slow query log during logical database dump restore can cause significant processing load.

- Default MySQL settings could cost you a 10x backup restore slowdown.

- This benchmark is aimed at speeding up the restore procedure while the application is not running and the server is not used in production. Make sure that you have reverted all configuration parameters back to production values after the load. For example, if you disable the InnoDB double write buffer during restore and leave it disabled in production, you may have scary data corruption due to partial InnoDB page writes.

- If the application is running during restore, in most cases you will get an inconsistent database due to missing support for locking or correct transactions for restore methods (discussed above).
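One way to guard against forgetting the revert is to scan the option file for the restore-only settings used in this benchmark before the server goes back to production. This is a minimal sketch: the function name is hypothetical, and it assumes the settings live in an option file rather than on the docker command line as in the tests above.

```shell
#!/bin/sh
# Print option-file lines that still carry restore-only settings.
# Succeeds (exit 0) if any such line is found.
leftover_restore_settings() {
  grep -E -i \
    -e '^[[:space:]]*innodb[_-]doublewrite[[:space:]]*=[[:space:]]*(0|off)' \
    -e '^[[:space:]]*innodb[_-]flush[_-]log[_-]at[_-]trx[_-]commit[[:space:]]*=[[:space:]]*0' \
    -e '^[[:space:]]*innodb[_-]flush[_-]method[[:space:]]*=[[:space:]]*nosync' \
    "$1"
}

# Usage (the path is an example):
#   leftover_restore_settings /etc/mysql/my.cnf && echo "revert these before production"
```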