Everyone does backups. Usually it’s a nightly batch job that dumps all MySQL tables into a text file or simply copies the binary files from the data directory to a safe location. Obviously both approaches involve much more complex operations than that one sentence suggests, but that is not important right now. Either way the data is out and ready to save someone’s life (or at least their job). Unfortunately, taking a backup does not come free of cost. On the contrary, it’s more like running very heavy queries against every table in the database when mysqldump is used, or reading a lot of data when copying physical files, so the price may actually be rather high. And the more effectively the server resources are already utilized, the bigger a problem that becomes.
The most obvious answer is that the data needs to be read, through I/O requests, from the storage it resides on. The storage can handle those reads only at a limited rate. So when a task reads a lot of data very quickly, just as the archiving process does when it runs, it pushes a huge number of requests down to the disks and can saturate the I/O quite easily. Naturally, at the same time the database needs to perform all its regular tasks like serving queries, using and competing for the very same disks, so that your favorite website can still show up in the browser. Moreover, the reads issued by the backup process usually ask for many sequential blocks of data, an access pattern the I/O scheduler may prefer over the random I/O coming from MySQL; such large I/O requests also take significant time to complete, and the way typical disks work prevents anything else from being executed in the meantime. The database therefore often has to wait much longer for its disk operations to be scheduled and executed, which translates into slower query execution and significantly degraded performance.
All modern systems cache whatever is being read from storage, which reduces I/O to those devices for frequently accessed information. After a successful read, the block of data is placed in the cache and then served straight from memory whenever anything asks for the same block again. That lasts for as long as the block does not get flushed out. MySQL obviously takes advantage of this functionality just like any other application, and this is especially true for MyISAM tables, whose dedicated buffers store only indexes while the actual data is always read from disk. The active portions of the tables will likely be placed by the system in memory and kept there for a long time. Since memory access is far faster than any disk access, even with the fastest drives or RAID configurations, the performance gains are quite clear. Now, back to flushing data out of the cache: it happens by replacing old, unused blocks with newly read ones, so the more new blocks come in, the more old ones have to go. Just imagine what happens to all the cached data during a backup run, when the process reads several times more information than there is physical memory installed in the server – it’s not difficult to have a database of such size. Everything is wiped out and replaced by random “garbage” for no good reason. As the cache fills with this random information the hit ratio drops, additional I/O occurs, and everything slows down even further.
There is a clear conflict between the regular database activities, which need fast response times, and backups, which would gladly behave as if they were at an all-you-can-eat bar. On a busy database server, a simple copy operation from the MySQL data directory can end in total disaster once MySQL stops responding to incoming queries quickly enough.
Nothing really when you are using mysqldump. You can play some tricks, but it’s mostly beyond your control.
Otherwise, when copying physical data files, you can avoid saturating the I/O by limiting the rate at which the data is read. Unfortunately this is not something you can do with standard Linux tools like cp, scp or tar. rsync, however, can do it with its --bwlimit=KBPS option. The problem with rsync is that it needs to build a list of files to transfer before it can take any action, and this operation alone is often quite heavy on I/O and is not subject to any limits.
Some time ago we prepared a patch for tar that implements --read-rate=BytesPerSecond. The advantage of tar over rsync in this case is that you can immediately, on the fly, create a compressed archive. For example:
```shell
/root/backup-tools/tar --read-rate=15000000 -C /mnt/snapshot -c -z -v -f - mysql | ssh backup@storage-host /root/backup-tools/write_backup.sh
```
This reads the /mnt/snapshot/mysql directory at 15000000 bytes/s, creates a TAR/GZ archive out of it and prints it to standard output, which is then redirected through a pipe over SSH to a remote host, where a script reads standard input and writes the archive into the proper location (where would we be without one-liners?).
On Linux there is also a utility called ionice. It affects how the I/O scheduler deals with I/O requests coming from a given process. Giving the backup application a low class or priority means it won’t get in the way of the database work quite so much.
```shell
ionice -c3 /root/backup-tools/tar ...
ionice -c2 -n7 /root/backup-tools/tar ...
```
Please consult the ionice man page for usage details; it’s really very simple to grasp and use. The one restriction is that the system must be using the CFQ elevator algorithm; ionice does not work with the other schedulers. That is not really a problem, since most modern systems run CFQ by default, and even if yours does not, you can change it at runtime anyway. To check the current setting you need to query your block devices. In case of the SCSI sub-system (devices named sda, sdb, sdc, etc.) that can be done with:
```shell
# cat /sys/block/sd?/queue/scheduler
noop anticipatory [deadline] cfq
noop anticipatory [deadline] cfq
noop anticipatory [deadline] cfq
```
To change it, just write the name of the new scheduler into the scheduler files:
```shell
# for device in `ls /sys/block` ; do if [ -f /sys/block/$device/queue/scheduler ] ; then
    echo "cfq" > /sys/block/$device/queue/scheduler ; fi ; done
# cat /sys/block/sd?/queue/scheduler
noop anticipatory deadline [cfq]
noop anticipatory deadline [cfq]
noop anticipatory deadline [cfq]
```
That’s it, now you can enjoy experimenting with ionice.
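A quick way to convince yourself the class actually took effect is to read it back with `ionice -p` (here `sleep` merely stands in for the backup process):

```shell
# Start a throw-away process in the idle I/O scheduling class (-c3),
# then read its class back by PID; 'sleep' is just a stand-in for tar.
ionice -c3 sleep 5 &
IOPID=$!
ionice -p "$IOPID"        # reports the scheduling class, e.g. "idle"
kill "$IOPID" 2>/dev/null
```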
It would also be possible to write a backup application that does not interfere with the system cache at all. By specifying the O_DIRECT flag when opening a file, an application tells the system to bypass the cache for its reads. So far this is only an idea, since I know of no tools that support it well. The problem is that when accessing a file opened with O_DIRECT, the reads need to be aligned to the file system block size, which usually means the amount read has to be divisible by 4096. While that is always the case for InnoDB tablespaces, other MySQL data files do not comply with this requirement. The trick could perhaps be to read the file with O_DIRECT up to the last full block and then perform a regular cached read of only the last few bytes, appending them to the target file.
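As a rough sketch of that trick, GNU dd can already do the aligned part with its iflag=direct option, which opens the input with O_DIRECT. The file below is generated purely for the demo, and its size is deliberately not a multiple of 4096:

```shell
SRC=./demo-datafile; DST=./demo-copy
dd if=/dev/urandom of="$SRC" bs=1000 count=10 2>/dev/null   # 10000 bytes, not block-aligned

SIZE=$(stat -c %s "$SRC")
FULL=$(( SIZE / 4096 * 4096 ))     # largest prefix divisible by 4096
TAIL=$(( SIZE - FULL ))            # leftover bytes at the end

# Full blocks with O_DIRECT: these reads bypass the page cache.
dd if="$SRC" of="$DST" bs=4096 count=$(( FULL / 4096 )) iflag=direct 2>/dev/null
# The unaligned tail through a normal cached read, written in place
# at the right offset (conv=notrunc keeps the already-copied prefix).
dd if="$SRC" of="$DST" bs=1 skip="$FULL" seek="$FULL" count="$TAIL" conv=notrunc 2>/dev/null
```

A real implementation would of course stream straight into the archiver instead of producing an intermediate copy, but the aligned-prefix-plus-cached-tail split is the same.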
But even if all those precautions have been taken, there is still a chance of performance problems on the working instance of MySQL. Such danger may come, for example, from an unexpected spike in load or traffic, or even from quite expected spikes that you simply can’t do anything about. So what I thought could be done as the next step was to constantly monitor the database status and, if any problems were noticed, have the monitoring simply pause the copying. I wrote a simple Perl script to do just that. It works by sending signals that either stop or resume the application that copies the data:
```perl
use POSIX ":sys_wait_h";
use DBI;

$pid= fork();
if ($pid == 0)
{
  exec('/root/backup-tools/tar --read-rate=15000000 …');
}

$was_running= 1;
$is_running= 1;
while(1)
{
  if ($is_running == 0 && $was_running == 1) { kill 19, $pid; $was_running= 0; }
  if ($is_running == 1 && $was_running == 0) { kill 18, $pid; $was_running= 1; }

  …
  my $sth= $dbh->prepare('SHOW GLOBAL STATUS LIKE "Threads_connected"');
  …
  if ($$row[1] < 10) { $is_running= 1; }
  if ($$row[1] > 50) { $is_running= 0; }
  …

  $kid = waitpid($pid, WNOHANG);
  if ($kid > 0) { last; }
  sleep(1);
}
```
What this script does is check the MySQL status every second, and if the number of connected threads goes above 50, it sends the STOP signal to the archiving process. Whenever the number of connected threads drops back to 9 or less, the script sends the CONT signal, which resumes archiving. The levels are of course different for every MySQL instance; these are just examples. The checks can also be more sophisticated and include things like processlist information, CPU load averages, I/O load, etc.
In this case tar is actually launched by the monitoring script directly, but that’s not really necessary. It simply needs to know PID of the process to manage and have a way to know when it ends.
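The same pausing can be done by hand from the shell against any PID; here `sleep` stands in for the copying process, and the signal names correspond to the numbers 19 (SIGSTOP) and 18 (SIGCONT) the script uses on Linux:

```shell
# Start a stand-in "archiver" and pause/resume it by PID.
sleep 60 &
TARPID=$!

kill -STOP "$TARPID"                # same as the script's "kill 19, $pid"
sleep 1
grep 'State:' /proc/$TARPID/status  # shows "T (stopped)" while paused

kill -CONT "$TARPID"                # same as the script's "kill 18, $pid"
sleep 1
grep 'State:' /proc/$TARPID/status  # back to "S (sleeping)"

kill "$TARPID" 2>/dev/null
```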
This is of course just a concept, one that I am using successfully in some difficult environments, and you can try building a mechanism suiting your own needs based on it.
Maciek
P.S. If you know someone who does not care about backups, please let him know this URL for our data recovery services.