
TokuDB Hot Backup – Part 2

 | September 19, 2013 |  Posted In: Tokutek, TokuView


In my last post, I discussed the existing backup solutions for MySQL.  At the end I briefly discussed why the backup solutions for InnoDB do not apply to TokuDB.  Now I’m going to outline the backup solution we created.  Our solution works for both TokuDB and InnoDB.  It also has no knowledge of the log files and does not require any changes to either storage engine.  In fact, the library could be used with almost any process; it has no knowledge of what types of files are being backed up.


Tokutek’s Hot Backup is essentially a shim between the mysqld process and the operating system (Linux only, at this point.)  It is a separately compiled C++ library that simply gets linked into the mysqld application at the end of the respective build process.  We ship this library with our own enterprise versions of both MySQL and MariaDB.


The magic of this shim is that it intercepts all relevant file system calls made to the Linux kernel by mysqld.  Any file that is opened, read from, written to, renamed, unlinked, or closed by MySQL is intercepted by our hot backup library.  Directory creation and removal are also intercepted.  This is all transparent to MySQL.  Again, no changes to the core MySQL system were required to intercept these system calls, just the addition of the library at link time.
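To make the interception idea concrete, here is a minimal sketch (not Tokutek's actual code) of how a library linked into the executable can interpose on `write()` on Linux: defining the symbol in the application overrides libc's version, while `dlsym(RTLD_NEXT, ...)` still finds the real one to forward to. The `bytes_seen` counter is an invented stand-in for the backup library's bookkeeping.

```cpp
#include <dlfcn.h>    // dlsym, RTLD_NEXT (g++ defines _GNU_SOURCE by default)
#include <fcntl.h>
#include <unistd.h>
#include <atomic>

// Illustrative bookkeeping: total bytes that have passed through write().
static std::atomic<long> bytes_seen{0};

// Because this definition is linked into the executable, it overrides
// libc's write(); dlsym(RTLD_NEXT, ...) still resolves the real one.
extern "C" ssize_t write(int fd, const void* buf, size_t count) {
    using write_fn = ssize_t (*)(int, const void*, size_t);
    static write_fn real_write =
        reinterpret_cast<write_fn>(dlsym(RTLD_NEXT, "write"));
    bytes_seen += static_cast<long>(count);  // backup bookkeeping hook
    return real_write(fd, buf, count);       // forward to the kernel
}
```

The same pattern extends to `open()`, `pwrite()`, `rename()`, `unlink()`, and so on; the application itself needs no changes, only the extra library at link time.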

I should note, at this point, that the library does expose a few C-style functions and callbacks.  We added the appropriate plumbing and syntax to allow users to call this API and interact with the library.  Users can execute SQL commands that initiate a backup, throttle (slow down) a backup, and get backup progress/error reporting.  However, the key idea is the same: all of the changes made to database files pass through the backup library, without any configuration required by the user.

The fact that our backup library sees every file operation from the moment the process starts helps achieve consistency between the backup copy and the original database files, even while read and write workloads are occurring.  We keep track of every open file and the file offsets mysqld uses for reads and writes.  To do this, we create some state, in memory, that mirrors the same state in the file system.

Whenever mysqld makes a file system call, it actually calls our backup library instead.  Our library eventually makes the call to the actual file system on behalf of mysqld.  During this system call interception we create our own in-memory mirror of the file system state.  This includes the full path to the original file, the integer file descriptor associated with the respective file, and that file descriptor’s offset.
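A rough sketch of that in-memory mirror might look like the following. All names here (`FileMirror`, `capture_open`, and so on) are illustrative, not the library's actual API; the point is that each wrapped call forwards to the real file system and updates a user-space map from descriptor to path and offset.

```cpp
#include <fcntl.h>
#include <string>
#include <unistd.h>
#include <unordered_map>

// Mirror of kernel-side file state: one entry per open descriptor.
struct FileState {
    std::string path;   // full path passed to open()
    off_t offset;       // current file offset, tracked in user space
};

class FileMirror {
public:
    // Wrap open(): record the path and a zero offset for the new fd.
    int capture_open(const std::string& path, int flags, mode_t mode = 0644) {
        int fd = ::open(path.c_str(), flags, mode);
        if (fd >= 0) state_[fd] = FileState{path, 0};
        return fd;
    }
    // Wrap write(): forward to the kernel, then advance the mirror offset.
    ssize_t capture_write(int fd, const void* buf, size_t count) {
        ssize_t n = ::write(fd, buf, count);
        if (n > 0) state_[fd].offset += n;
        return n;
    }
    // Wrap close(): drop the mirror entry.
    int capture_close(int fd) {
        state_.erase(fd);
        return ::close(fd);
    }
    off_t offset_of(int fd) const { return state_.at(fd).offset; }

private:
    std::unordered_map<int, FileState> state_;
};
```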


As mysqld reads and writes to the file, which in this case is usually a file representing a database, we update the file offset for that file.  This occurs even if no backup is in progress.  Once a backup is initiated, we begin copying each file.


As seen in our last blog, we need to prevent races between our copy and mysqld’s writes to the same file.  We do this by locking each segment of the file as we copy it.  Most writes mysqld performs will not block on this lock.  In the rare case that mysqld is trying to write to the same segment that is being copied, one will wait for the other to finish.
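One simple way to implement this kind of segment locking (a sketch under my own assumptions, not necessarily how the library does it) is a fixed pool of mutexes, one per file segment. Both the backup copier and the intercepted write path lock the segment covering the bytes they are about to touch, so a writer only blocks when it collides with the segment currently being copied.

```cpp
#include <sys/types.h>
#include <array>
#include <mutex>

// Illustrative sizes: 1 MiB segments, hashed into a fixed lock pool.
constexpr off_t kSegmentSize = 1 << 20;
constexpr size_t kMaxSegments = 1024;

// One lock per segment of a file. The copier holds a segment's lock
// while copying it; a concurrent writer to the same segment blocks
// until the copy (or the write) finishes.
struct SegmentLocks {
    std::array<std::mutex, kMaxSegments> locks;

    // Map a byte offset to its segment index.
    static size_t segment_for(off_t offset) {
        return static_cast<size_t>(offset / kSegmentSize) % kMaxSegments;
    }
    // Both the backup copier and the write path call this before
    // touching bytes at `offset`.
    std::unique_lock<std::mutex> lock_segment(off_t offset) {
        return std::unique_lock<std::mutex>(locks[segment_for(offset)]);
    }
};
```

Writes to different segments take different mutexes, which is why most of mysqld's writes never block on the backup.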

If mysqld's write path (say, an UPDATE) wins the race, the backup library blocks until the write is done.  Once mysqld finishes the write, the library copies the newly altered data.



The more interesting case is when our backup library wins the race.  Once the library finishes copying the data, it releases the lock, allowing mysqld to alter that data.  In this case the backup copy of the data will be stale: the copied segment no longer matches the original, because it is missing the most recent change.



The solution for this situation is simple.  During a backup, we apply any changes mysqld makes to both the original file AND the backup file.  This occurs even if the backup library has yet to copy the respective file.  This does require every write to be written to two different files, but again, this does not occur if there is no backup in progress.
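The dual-write idea can be sketched like this (the function name and error handling are mine, not the library's): while a backup is in progress, every intercepted write lands in both the original file and the backup copy at the same offset.

```cpp
#include <fcntl.h>
#include <unistd.h>

// While a backup is in progress, every write mysqld makes is applied
// to the original file AND to the backup copy at the same offset.
// When no backup is running, backup_fd is -1 and only the original
// file is touched, so there is no overhead.
ssize_t dual_pwrite(int orig_fd, int backup_fd,
                    const void* buf, size_t count, off_t offset) {
    ssize_t n = ::pwrite(orig_fd, buf, count, offset);
    if (n > 0 && backup_fd >= 0) {
        // Mirror the same bytes into the backup copy. A production
        // implementation must also handle short writes and errors here.
        ::pwrite(backup_fd, buf, static_cast<size_t>(n), offset);
    }
    return n;
}
```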

This diagram shows what the library does when it wins the race and must apply the changes from mysqld.  This occurs after the library relinquishes the data segment lock to mysqld.



At the end of a backup we have a copy of all our original database and log files.  This backup data can be used to start a new instance of TokuDB.  When you start TokuDB with the backup files, it performs recovery, using the log to remove any uncommitted transactions.  These transactions must be undone, similar to crash recovery, because backup does not end on a transactional boundary.

Remember, our backup library has no awareness of the log or how it relates to the database files.  This is actually OK.  We have enough information to recover the database to a state VERY close to the time the backup finished (this time is reported in the original mysqld’s error log.)  Users will still end up with a consistent and correct database.  The caveat is that any transactions still active at the moment the hot backup finishes will be undone upon recovery.

Users are now able to take hot backups of an active system running TokuDB, with no downtime.  The backup library does not use much memory, and does not spoil the cache used for the tables.  The backup process can also be throttled so that it copies the files at a slower rate.  This throttling is especially useful if there are frequent disk accesses, such as when the data being processed does not fit in main memory.
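A throttle of this kind can be implemented as a simple rate limiter around the copy loop. The sketch below is my own illustration of the idea: copy in fixed-size chunks, and after each chunk sleep long enough that the average rate stays at or below the target (it ignores the time the copy itself takes, which a real implementation would subtract).

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <thread>

// Copy `total` bytes in `chunk`-sized pieces, sleeping after each
// chunk so the average rate stays at or below `bytes_per_sec`.
// `copy_chunk(offset, n)` stands in for the actual pread/pwrite loop.
template <typename CopyChunk>
void throttled_copy(size_t total, size_t chunk, size_t bytes_per_sec,
                    CopyChunk copy_chunk) {
    using namespace std::chrono;
    for (size_t done = 0; done < total; done += chunk) {
        size_t n = std::min(chunk, total - done);
        copy_chunk(done, n);
        // Charge this chunk against the target rate, then sleep it off.
        auto budget = nanoseconds(1'000'000'000ull * n / bytes_per_sec);
        std::this_thread::sleep_for(budget);
    }
}
```

Lowering `bytes_per_sec` slows the backup and frees disk bandwidth for the workload, which is exactly when throttling matters: when the working set does not fit in main memory and the disks are already busy.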

Next week I will be showing how we integrated the hot backup library into our other product, TokuMX.  It offers the same feature set and is not only useful for taking backups, but also to seed new instances for an existing replica set without any downtime.



  • This sounds like a general-purpose file snapshot that can be attached to any running Linux process.
    It’s worth noting this kind of solution works well for transactional engines, but not for MyISAM; there is no escape from FLUSH TABLES when you deal with MyISAM — and MySQL’s system tables are MyISAM.

  • Shlomi,

    Assuming it intercepts read() and write() syscalls, etc., then it will work for MyISAM. The OS is free to buffer the results of read() and write() calls, but the shim doesn’t have to do so.

    The backup would end up being more consistent than the original in the case of a crash. The index could, however, be out of sync, because there is no log tying the .MYI and .MYD files together. So myisamchk would still be required.

  • For InnoDB, I think libaio should not matter. Doublewrite buffer is written sync and the log itself is not async (it may be direct_io, but that should not be a problem). As long as the log data and doublewrite buffer is there, any lost or partially written async blocks will be corrected during recovery.

  • “At the end of a backup we have a copy of all our original database and log files. This backup data can be used to start a new instance of TokuDB”.

    As far as I understood, it’s not possible to start a TokuDB instance on a data directory different from the original backup source.

    In multi-terabyte environments it would be useful to store the current backup on the same machine (original on SSD, backup on HDD). Is there any way to start a second MySQL/TokuDB instance with the backup data directory (and running on another port, of course)?

  • Hi Guys

    I need to know the following, please:
    1. Does table compression affect TokuDB hot backup?
    2. Should the backup storage be the same size as the original?
    3. When restoring, how much storage will I need — the same as the original, or 3x?

    Your answers are highly appreciated
