Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Logical MySQL backup tool Mydumper 0.9.1 now available

November 12, 2015

Author

Max Bubenick

MySQL

Share this Post:

Databases backup The new Mydumper 0.9.1 version, which includes many new features and bug fixes, is now available. You can download the code from here.

A significant change included in this version now enables Mydumper to handle all schema objects!! So there is no longer a dependency on using mysqldump to ensure complex schemas are backed up alongside the data.

Let’s review some of the new features:

Full schema support for Mydumper/Myloader

Mydumper now takes care of backing up the schema, including Views and Merged tables. As a result, we now have these new associated options:

-d, --no-data Do not dump table data -G, --triggers Dump triggers -E, --events Dump events -R, --routines Dump stored procedures and functions

These options are not enabled by default to keep backward compatibility with actual mixed solutions using Mysqldump for DDLs.

Locking reduce options

--trx-consistency-only Transactional consistency only

You can think on this as --single-transaction for mysqldump, but still with binlog position. Obviously this position only applies to transactional tables (TokuDB included). One of the advantages of using this option is that the global read lock is only held for the threads coordination, so it’s released as soon as the transactions are started.

GTIDs and Multisource Slave

GTIDs are now recorded on the metadata file. Also Mydumper is now able to detect a multisource slave (MariaDB 10.1.x) and will record all the slaves coordinates.

Myloader single database restore

Until now the only option was to copy the database files to a different directory and restore from it. However, we now have a new option available:

-s, --source-db Database to restore

It can be used also in combination with -B, --database to restore to a different database name.

Full list of Bug Fixes:

#1431410 innodb stats tables
#1440403 *-post and *-triggers compressed files corrupt
#1470891 functions may be needed by SP and views
#1390437 segmentation fault against Percona MySQL 5.6.15-63.0
#1446280 Segmentation fault on Debian Wheezy
#1399715 Typo in –tables-list option in manpage
#1428608 missing -K option in mydumper manpage
#1440437 myloader: wrong database name in message when -B used
#1457091 tokudb detection doesn’t work
#1481747 Unable to compile r179 WITH_BINLOG=ON (undeclared ‘bj’)
#1507574 Assertion when broken mrg tables
#841651 dump view definitions
#1485688 make compile error myloader.c:209

0 0 votes

Article Rating

56 Comments

Oldest

Newest Most Voted

Sheyda Amini

10 years ago

What is the advantage of using this tool to mysqldump?

SuperQ

10 years ago

mydumper can run many threads in parallel, where mysqldump does each table in series. Backups and restores happen much more quickly for databases with a large number of tables.

Gareth Smith

10 years ago

Similarly, what is the advantage of using this tool over *mysqlpump* – the new parallel backup tool in MySQL 5.7 ?

Author

Max Bubenick

10 years ago

Hi Gareth, there are several differences between them

– Parallel restores
only with mydumper you can perform parallel restores using myloader

– Files per table
mydumper use files per table, so you can easily do partial restores

– Chunking
mydumper supports chunking, which is a huge performance difference, as parallelism is by chunk instead of table. In some cases only this feature improves times by more than 50% and not only backup but restore times as well

But still mysqlpump is new tool and Im sure it will have more features, the most important thing here is that we are seeing an effort on get the logical backup tools to the next level.

Phil Stracchino

10 years ago

Max,
Can you talk a little more about the chunking? I was always under the impression mydumper parallelized table-by-table, and that’s certainly the way it appears to be operating in the MySQL process list and in the filesystem. Is chunking something that needs to be enabled?

I just ran a test of 0.9.1 overnight, plugged my wrapper script for it into my nightly backups instead of my standard mysqldump-based script, and it worked perfectly. But when I run it in a timed benchmark against the mysqldump script, I’m only seeing about 7 seconds speed-up in my environment, out of just over 6 minutes total. This appears to be because more than 90% of my DB data is in two very large tables (bacula.File and dspam.dspam_signature_data), so the gating factor on backup length is how long it takes to dump those two tables, and they’re ending up being dumped serially because of the order in which mydumper gets to them.

Would it be possible to add some smarts to have an optional large-table mode whereby mydumper takes some time out at the start to calculate what the largest tables are, and if the largest tables are larger in size than, say, total database size divided by number of threads, assign those giant tables to their own threads in order to make sure they are parallelized, then divide the smaller tables first-come-first-served among the remaining threads as usual?

Phil Stracchino

10 years ago

Gareth, in addition to what Max pointed out, there are some serious and poorly-documented shortcomings to mysqlpump.
One, which is not explicitly called out anywhere but which can be found mentioned as an aside in the options documentation, is that in parallelizing mode, mysqlpump can interleave INSERTs to different tables in the single dump file. This means that doing a selective restore of just a few accidentally-erased tables from a mysqlpump dump is extremely difficult, possibly prohibitively difficult, whereas with mydumper/myloader it is a trivially simple operation.
The other, even more serious, which does not seem to be exposed in the official documentation at all (although it is present in the mysqlpump announcement from the MySQL server team), is that mysqlpump’s multiple threads do not have a shared synchronization point for their dump operations — which means that mysqlpump dumps are *inconsistent*. This basically means that at its present state of development, mysqlpump is safe ONLY for offline dumps.

Author

Max Bubenick

10 years ago

Hi Phil,

Maybe this topic worths a post by it self, but to simplify mydumper has two chunking options, they are not enabled by default

-r, –rows Try to split tables into chunks of this many rows. This option turns off –chunk-filesize

-F, –chunk-filesize Split tables into chunks of this output file size. This value is in MB

–rows only works over tables with an integer index for now and the logic needs to be improved yet. Anyway, with this mydumper use a thread per chunk instead of table.

–chunk-filesize is useful when you don’t have an integer index, in this case mydumper use a thread per table but it splitting the output into multiple files by this parameter size. It won’t improve times as –rows does for backups but still helps to improve restore times as it can be done by multiple threads.

George

10 years ago

Thanks for the wonderful and continued work on mydumper !

George

10 years ago

oh one bug i found is version number is reported incorrectly 0.9.1 vs 0.9.0

mydumper -V
mydumper 0.9.0, built against MySQL 10.0.22-MariaDB

Gareth Smith

10 years ago

Thanks for the explanation Phil. As you note, inconsistent backups are pretty dangerous.

Fadi El-Eter (itoctopus)

10 years ago

Hi Max,

Thanks for mentioning this tool as well as its advantages over the regular mysqldump that we all use.

I have some quick questions:

– Is it stable? Can it be used on a production environment?
– Do you guys use it for your clients?
– Are there any known bugs/limitations?
– Is it really free?

Author

Max Bubenick

10 years ago

Fadi,

– Is it stable? Can it be used on a production environment?
Yes, you can use it on a production environment, but as with any backup solution you need to run your own tests, not only taking a backup but restoring and checksum it. Also be sure that you know how it works and how it can affect your production workload.

– Do you guys use it for your clients?
mydumper is the principal tool for our logical backups solution, so yes we use it.

– Are there any known bugs/limitations?
You can check https://bugs.launchpad.net/mydumper for bugs and limitations.

– Is it really free?
Yes, it is!!

Hung

10 years ago

Hi Max,
Is this tool a online dumper? (like InnoExtraBackup)

Narendra

10 years ago

Hi Max,

I got the below error while taking the backup by using mydumper .i tried to get resolve it but no luck.help me in this

/home/mysql/mydumper-0.9.1/mydumper -S /mysql/mysql.sock -u xxxx -p’xxxx’ -B dataabses -o /home/backups/

option parsing failed: Error parsing option -o, try –help

Author

Max Bubenick

10 years ago

Hi Narendra,

You need to add a blank space between parameter and value. Also I would recommend to add credentials to .my.cnf rather than command line.

Narendra

10 years ago

Cool. Thanks Max

Narendra

10 years ago

Hi Max,

Is there any option to take multiple DB backups(more than one not all) at a time by using mydumper..
/home/mysql/mydumper-0.9.1/mydumper -S /mysql/mysql.sock -u xxxx -p ’xxxx’ -B db1 db2 db3 -o /home/backups/
e
i used above one to take backup three databases . after executing this . i am seeing that its taking backup of db1 only..

Author

Max Bubenick

10 years ago

Hi Narendra,

-B is for single database dumps, for multiple ones you can use the regex option like

–regex=’^((db1.|db2.|db3.))’

leafonsword

10 years ago

what’s the difference of “–trx-consistency-only” and “–less-locking”?
It seems with “–less-locking”, mydumper will do a short FTWRL, looks same as “–trx-consistency-only”~

Author

Max Bubenick

10 years ago

Reply to leafonsword

Hi leafonsword,

There are two differences: consistency and FTWRL time.
–trx-consistency-only will not care about non transactional tables consistency and will release the lock right after the threads coordination.
–less-locking instead will release the FTWRL after looping all databases and locking with different threads the non transactional tables.

So –less-locking FTWRL is shorter than default and –trx-consistency-only is shorter than –less-locking but you need to have all innodb, or tokudb, if you want a full consistent backup.

Phil Stracchino

10 years ago

Reply to Max Bubenick

….But –less-locking will still yield a consistent backup even if there are non-transactional tables present, correct?

Author

Max Bubenick

10 years ago

Correct Phil

leafonsword

10 years ago

Thanks Max ! If MySQL’s system table default engine is innodb or tokudb, I think “–trx-consistency-only” will be very helpful~
I have another question: why could option “–use-savepoints” reduce meta lock holding time?

Author

Max Bubenick

10 years ago

Hi leafonsword,

to reduce MDL times you can use savepoints and rollback to them after each table is dumped, this doesn’t affects consistency but removes the MDL on the table.

leafonsword

10 years ago

Thanks，Max~
Though mydumper is very strong,if it add those features , I think it will be perfect:
1.support mysqldump’s ‘–where’ option
2.support just load schema or data while the backup dir contains both schema and data
3.support backup view
4.support backup multi databases one time

Phil Stracchino

10 years ago

Reply to leafonsword

Leafonsword, your point 3 is already trivially easy to do with mydumper, because it dumps schema and data separately. Just create a temp directory, link the schema or data files for just the schemas or individual tables you want into there, and run myloader in that directory. Shazam! Done.

Mydumper is the best tool I have ever encountered for doing partial restores of DB data. Nothing else comes close.

leafonsword

10 years ago

Reply to Phil Stracchino

Hi,Phil:
If I just want to dump some databases,isn’t dump all databases then filer a waste?

Phil Stracchino

10 years ago

Reply to leafonsword

Leafonsword, I meant point 2 not point 3. Sorry. Partial restores are easy.

Max already addressed your other points.

leafonsword

10 years ago

Reply to Phil Stracchino

Hi phil,I have used this method before,but I think it’s a little trivial,if myloader could use option control,it’s more convient~

Author

Max Bubenick

10 years ago

Hi leafonsword,

Thanks for your feedback, Im actively working on mydumper/myloader and adding new features. About your points I agree with 1 and 2 and depending on priorities we can have those soon.

About point 3 and 4 they are already in mydumper!!! Views are dumped by default and for multi databases you can use –regex as I explained above.

leafonsword

10 years ago

Reply to Max Bubenick

Ha,that’s right!
One more thing, could you put mydumper in github?Since github looks better than launchpad, and percona also has its official repository,maybe mydumper could in repository too~

leafonsword

10 years ago

After restoring using myloader, does it need to “flush privileges”?

Narendra

10 years ago

Hi Max ,
Do we have option to backup entire DB as a single sql file? where we can handle the backup files easily ..instead of having per table schema and data as a single sql file.

Phil Stracchino

10 years ago

Reply to Narendra

Narendra, that would almost certainly require major re-engineering, and would in many ways work in direct opposition to mydumper’s multi-threaded dumps and reloads. If you want a single monolithic dump file, why not just use mysqldump that already works that way? If your concern is easily moving the complete dump around, you could just tar the dump directory.

彭

10 years ago

hi, when I use mydumper to backup my DB there is some error happen. But in another of DB server it works OK.

[root@localhost backup]# mydumper –user backup –password backup@123 -B mysql -o ./test -c -e

** (mydumper:36394): CRITICAL **: Couldn’t write data to a file: No such file or directory

** (mydumper:36394): CRITICAL **: Could not write create database for mysql
Segmentation fault

[root@localhost backup]# ls test
metadata.partial mysql.columns_priv.sql.gz mysql.db.sql.gz mysql.event.sql.gz mysql.func.sql.gz mysql-schema-create.sql.gz

[root@localhost backup]# mydumper -V
mydumper 0.9.1, built against MySQL 5.6.29-76.2

[root@localhost backup]# mysql -V
mysql Ver 14.14 Distrib 5.6.28-76.1, for Linux (x86_64) using 6.2

[root@localhost backup]# cat /etc/centos-release
CentOS Linux release 7.2.1511 (Core)

Rick Jans

10 years ago

Can you have the output of mydumper be piped into the input of another program ? We want to encrypt the resultant text files but would like to do that on the fly such as piping mydumper output to input of openssl.

rsterbal

9 years ago

Do you offer a webinar on how to use the product?

Phil Stracchino

9 years ago

Just play with it a little and you should find it straightforward enough that you don’t need a webinar to understand it. It’s really pretty simple. Because it’s a lot smarter than mysqldump, it does not need mysqldump’s huge number of command-line options.

Phil Stracchino

9 years ago

Max,
–statement-size is how mydumper handles max_allowed_packet. Is there a technical reason why mydumper cannot simply auto-set –statement-size to accommodate global.max_allowed_packet upon initial connection to the target server?

Author

Max Bubenick

9 years ago

Reply to Phil Stracchino

Hi Phil,

Not really, but mostly I would like to keep it safe, MySQL’s default value is 1M and if I increase it I would expect people start getting ER_NET_PACKET_TOO_LARGE when restoring. Maybe I can keep the default but add an option to auto detect it.

Phil Stracchino

9 years ago

Reply to Max Bubenick

That’s why I was suggesting auto-detect, though I can see how auto-detection might cause a problem if a database is dumped from a server with, say, max_allowed_packet = 16M and then loaded onto one with max_allowed_packet = 1M. However, what will happen *now* in that case? Will it currently fail anyway?

Author

Max Bubenick

9 years ago

Reply to Phil Stracchino

Not sure what do you mean by *now* but I would guess if we do that change :). So, in that case you will be able to restore only to instances where max_allowed_packet is equal or greater than your backup source.

So now having –statement-size defaults is 1M, even if the source server is 16M the generated statements are 1M which won’t fail even for default mysql installations. And I think these are the defaults for.

In the other hand, this could cause issues if you have large fields, like blobs, bigger than –statement-size but I didn’t see that often.

Phil Stracchino

9 years ago

Reply to Max Bubenick

Actually, the reason this came up was a customer who is getting numerous ‘Row bigger than statement-size’ warnings because their database has large numbers of huge XML strings. (Which they’re trying to strip out, but…)

So does mydumper’s –statement-size option limit the WRITE statement size, then, not anything to do with actually retrieving the data? That is not clear from the documentation. Then perhaps auto-setting –statement-size isn’t actually the best way to handle this warning.

If I understand your explanation correctly, the warning is actually harmless. Is that correct? If so, max_allowed_packet auto-detection probably isn’t needed at all, but perhaps an option to suppress the warning would be useful, just to cut down on log noise.

Author

Max Bubenick

9 years ago

You take it right, the warning is harmless as the backup is still consistent. To not get warnings you can set verbose to error (1) lvl only.

Phil Stracchino

9 years ago

Reply to Max Bubenick

OK, then I’m going to just have my wrapper script manage that and forget the entire max_allowed_packet detection idea altogether.

Thanks!

Phil Stracchino

9 years ago

Reply to Phil Stracchino

(I note, by the way, that this is yet another detail in which mydumper is smarter and more advanced than mysqldump.)

Phil Stracchino

9 years ago

Max,
This is a feature request.

I’ve run into a problem using mydumper on XtraDB Cluster. The environment in question has four XtraDB nodes, the fourth a non-voting node (weight 0) which runs as a dedicated backup node, and is set to DESYNCED during DB dumps. This node is fairly often being evicted from the cluster during the backup, which causes it to declare itself non-primary and stop answering queries, which causes the backup to fail.

Do you suppose it would be possible for mydumper to detect when it is connected to an XtraDB Cluster node (by checking to see whether the status variable wsrep_cluster_status exists), and if so, pause/suspend if wsrep_cluster_status becomes other than ‘Primary’?

Phil Stracchino

9 years ago

Max,
I have another feature request, to address a current production problem.

We have a client who has tens of thousands of schemas and a very high churn rate on schemas. MOST nights, their DB backup reports errors because they have dropped one or more schemas in between when the backup run started and when mydumper actually got to that schema. This produces false failure reports in monitoring.

Something like the following combination of features would be extremely useful:
— Add a command-line option to pass to mydumper to tell it that it is OK for schemas to be missing by the time it gets to them.
— If this option is in use, each time mydumper begins to dump tables from a new schema, check that the schema still exists, and if not, drop that entire schema and its contents from the list of objects to be dumped.
— Change reporting of the ‘table or schema does not exist’ case by default from an error to a warning (because it’s not an error, mydumper didn’t fail; it’s just that the object had been removed from the DB by the time mydumper got to it).
— If the “missing is OK” option is in use, “missing table/schema” is further downgraded to an informational message instead of a warning

Of course, if you can suggest a better way to handle this…? Right now, the only way I can come up with to prevent the false dump-failure alerts is to suspend replication to the dedicated backup slave while backups are running.

Kumar

9 years ago

Hi Max,

I have installed 0.9.1, but it is showing built against mysql 5.1.73

/usr/local/bin/mydumper -V;
mydumper 0.9.1, built against MySQL 5.1.73

And also when I compare the mentioned features with 0.9.0, I do not see any difference between both the versions, it looks same.

Phil Stracchino

9 years ago

There is a mydumper-0.9.1 RPM built for RHEL6/Centos6/OEL6 located at http://repo.enetres.net/x86_64/mydumper-0.9.1-1.el6.x86_64.rpm. However, it is ONLY the RHEL6 compatibles. Does anyone know of a RHEL7-compatible RPM package for mydumper 0.9.1?

Phil Stracchino

9 years ago

Max, I just discovered a bug in myloader.

Capsule summary: Filename matching when using myloader -s is incorrect, causing false matches.

Suppose you have schemas schema1 through schema9, and you run:

myloader -eod dumps/ -s schema1

The whole of schema1 will be restored, leaving the other schemas untouched.

Now suppose you run the same command, but you have schemas schema1 through schema13. myloader will restore schema1, BUT will then also go on and try to load schema10, schema11, schema12, and schema13 into schema1.

I presume the problem here is that myloader SHOULD be matching files against .*, but is actually incorrectly matching against * (which will also match files for any schema whose name begins with ).

Phil Stracchino

9 years ago

argh. Let me try that last sentence again using different delimiters:

I presume the problem here is that myloader SHOULD be matching files against [schemaname].*, but is actually incorrectly matching against [schemaname]* (which will also match files for any schema whose name begins with [schemaname]).

Phil Stracchino

8 years ago

Max,
I just completed a large (several hundred gigabytes) data load into an AWS Aurora RDS instance. The data was all apparently, as far as I can see, loaded without errors. However, at completion, myloader emitted the following error messages:

bp-bastion.CUSTOMER.com:root:/mnt/decomp:18 # time myloader -oed db2.CUSTOMER.com-20180213/

** (myloader:1732): CRITICAL **: Error switching to database CUSTOMER whilst restoring table (null)

** (myloader:1732): CRITICAL **: Error switching to database CUSTOMER whilst restoring table userLogins

** (myloader:1732): CRITICAL **: Error switching to database CUSTOMER whilst restoring table DC_numbers

** (myloader:1732): CRITICAL **: Error switching to database CUSTOMER whilst restoring table assignments_periods

** (myloader:1732): CRITICAL **: Error switching to database CUSTOMER whilst restoring table teachers

** (myloader:1732): CRITICAL **: Error switching to database CUSTOMER whilst restoring table user

** (myloader:1732): CRITICAL **: Error switching to database CUSTOMER whilst restoring table students

** (myloader:1732): CRITICAL **: Error switching to database CUSTOMER whilst restoring table students_activities_periods

** (myloader:1732): CRITICAL **: Error switching to database CUSTOMER whilst restoring table periods_students

real 189m32.755s
user 18m41.804s
sys 1m30.280s

The database exists and as far as we can tell, all tables were fully loaded.

Do you have any explanation of what these errors mean? The ‘table (null)’ message is particularly troubling.

Author

Max Bubenick

8 years ago

Hi Phil, thank you for your feedback!

Please feel free to open bugs on github https://github.com/maxbube/mydumper/issues.

About that specific errors, that is returned when myloader issue an `USE db;` in this case `USE CUSTOMER`. Will need to figure out why that could be failing.

The (null) is odd, I guess could be related to a table name which is not being parsed correctly to split between db.table.sql. If you could report a bug with at least the tables names would be great and I can take a look.