For a long time, I’ve been thinking about the possibility of importing a single file with multiple connections. Why? Simply because we have scenarios where we end up importing a big file with a single loader thread. Well, I have good news: since the release of 0.16.3-1, we are able to do it.

There are multiple reasons why we end up with a large file in a MyDumper backup. For instance, if just one thread is used to export a large table, or just because we don't want to use -F, or, in the worst case, because the table doesn't have an integer primary key or has no primary key at all.

Nowadays, it is also possible to end up with a small number of big files, because mydumper can export a large table creating the same number of files as the number of threads that we configure with -t.

Take into account that MyDumper used to use the files to isolate the loader threads, which means that it used one execution thread per file.

And now, we can use multiple connections per file.

You can see more details about it in #1474 and #1477.

Performance improvements

To test the performance improvement, I used a 5M-row sysbench table that was created with the following command:
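A representative sysbench invocation along these lines creates the table; the connection settings, schema name, and workload script are assumptions:

# prepare a single 5M-row table in the sbtest schema
sysbench oltp_read_write --mysql-host=127.0.0.1 --mysql-user=sbtest \
  --mysql-password=sbtest --mysql-db=sbtest --tables=1 --table-size=5000000 prepare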

The myloader command was the same in all cases:
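It was along these lines, with four parallel threads; the backup directory, credentials, and the use of --overwrite-tables are assumptions:

# import the dump with 4 threads, overwriting the target table if it already exists
myloader -h 127.0.0.1 -u root -p secret -o -t 4 -d /backups/sbtest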

Test with a single file

To create just a single file with the 5M rows, I ran this command:
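Leaving out both -F and --rows keeps the whole table in a single output file; a plausible form of the command (credentials and paths are assumptions) is:

# no -F and no --rows, so the whole 5M-row table lands in one .sql file
mydumper -h 127.0.0.1 -u root -p secret -B sbtest -T sbtest1 -o /backups/single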

The import with v0.16.1-3 used a single thread and took 65.2 seconds to complete.

The new version took 28.7 seconds, as it used four connections to import the table.

Test with multiple files

To create a backup with files of 200MB, I executed this command, which creates five files:
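Setting --chunk-filesize (-F) to 200 splits the output at roughly 200MB boundaries; the credentials and paths below are assumptions:

# -F is expressed in MB, so this cuts the table into ~200MB files
mydumper -h 127.0.0.1 -u root -p secret -B sbtest -F 200 -o /backups/chunks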

The time with v0.16.1-3 now decreased to 33.4 seconds, as it used the four threads.

And the new version took 27.6 seconds.

Test with multiple files of the same size

As you can see from the previous test, the five files did not have the same size. To check with files of the same size, I executed this command:
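Splitting by row count rather than by file size yields evenly sized files; a plausible form of the command (the 1M-row chunk size and connection details are assumptions) is:

# --rows (-r) cuts the table into fixed 1M-row chunks: five files of equal size
mydumper -h 127.0.0.1 -u root -p secret -B sbtest -r 1000000 -o /backups/even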

This is the only case where v0.16.1-3 performed at the same level as the new version, taking 28.5 seconds.

The new version still took 28.1 seconds.

Fragmentation

The next graphs were built with the innodb_ruby tool (https://github.com/jeremycole/innodb_ruby/).
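Illustrations like these can be produced with innodb_space's space-extents-illustrate mode; the tablespace path here is an assumption:

# draw the page-allocation map of the table's file-per-table tablespace
innodb_space -f /var/lib/mysql/sbtest/sbtest1.ibd space-extents-illustrate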

On the left is the new version, and on the right is v0.16.1-3:

[Figure: innodb_space page-allocation illustrations for both versions]

As you can see, the new version causes high fragmentation at the end of the process. This is expected when you insert rows in sequential order with multiple threads, because of the page splits it provokes.

This is why it is so important to use -F when you take backups: it splits the tables and/or the chunks into multiple pieces, which, in the end, reduces fragmentation when you import the table.

Conclusions

As we can see, the new version takes a consistent amount of time and is faster in most scenarios; its performance does not depend on how you took the backup. The only drawback is that some undesirable fragmentation can occur on the table, but it can be reduced with the proper configuration when you take the backups.



Comments
Mark Tomandl

I love those illustrations at the bottom! I finally found the old blog post where that specific part of the innodb_ruby tool is talked about, for anyone else wanting that it’s here: Illustrating Primary Key models in InnoDB and their impact on disk usage