
Logical MySQL backup tool Mydumper 0.9.1 now available

November 12, 2015 | Posted In: MySQL


The new Mydumper 0.9.1 version, which includes many new features and bug fixes, is now available. You can download the code from here.

A significant change included in this version is that Mydumper can now handle all schema objects! So there is no longer a dependency on mysqldump to ensure complex schemas are backed up alongside the data.

Let’s review some of the new features:

Full schema support for Mydumper/Myloader

Mydumper now takes care of backing up the schema, including views and merge tables. As a result, we now have these new associated options:

-d, --no-data Do not dump table data
-G, --triggers Dump triggers
-E, --events Dump events
-R, --routines Dump stored procedures and functions

These options are not enabled by default, to keep backward compatibility with existing mixed solutions that use mysqldump for DDLs.
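For example, a dump that carries the full schema along with the data might be invoked like this (the credentials and paths here are illustrative):

mydumper -u backup_user -p secret -G -E -R -o /backups/full

Triggers, events, and routines are then written to their own files alongside the table schema and data files, so myloader can replay everything without a separate mysqldump pass.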

Locking reduction options

--trx-consistency-only      Transactional consistency only

You can think of this as the --single-transaction option of mysqldump, but with the binlog position still recorded. Obviously, this position only applies to transactional tables (TokuDB included). One of the advantages of using this option is that the global read lock is only held for the thread coordination, so it's released as soon as the transactions are started.
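As a sketch, assuming an all-transactional dataset and placeholder credentials, such a dump could look like:

mydumper -u backup_user -p secret --trx-consistency-only -o /backups/trx

The FTWRL then lasts only as long as it takes the worker threads to start their transactions.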

GTIDs and Multisource Slave 

GTIDs are now recorded in the metadata file. Mydumper is also able to detect a multisource slave (MariaDB 10.1.x) and will record all of the slaves' coordinates.
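For illustration only, the metadata file might look something like the following (the exact layout varies by version, and the coordinates shown here are made up):

Started dump at: 2015-11-12 00:00:01
SHOW MASTER STATUS:
	Log: mysql-bin.000273
	Pos: 107415
	GTID: 0-1-5862
Finished dump at: 2015-11-12 00:12:37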

Myloader single database restore

Until now, the only option was to copy the database files to a different directory and restore from there. Now we have a new option available:

-s, --source-db                   Database to restore

It can also be used in combination with -B, --database to restore into a different database name.
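For example, to restore only the dump of a database named prod, and load it into prod_copy (both names hypothetical):

myloader -d /backups/full -s prod -B prod_copy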

Full list of Bug Fixes:

#1431410 innodb stats tables
#1440403 *-post and *-triggers compressed files corrupt
#1470891 functions may be needed by SP and views
#1390437 segmentation fault against Percona MySQL 5.6.15-63.0
#1446280 Segmentation fault on Debian Wheezy
#1399715 Typo in --tables-list option in manpage
#1428608 missing -K option in mydumper manpage
#1440437 myloader: wrong database name in message when -B used
#1457091 tokudb detection doesn’t work
#1481747 Unable to compile r179 WITH_BINLOG=ON (undeclared ‘bj’)
#1507574 Assertion when broken mrg tables
#841651 dump view definitions
#1485688 make compile error myloader.c:209

 

Max Bubenick

Max Bubenick has been working with MySQL for more than 10 years. Before joining Percona's Remote DBA team in 2013, he worked as lead DBA for one of the biggest social gaming companies of that time. He also maintains mydumper.

Comments

  • mydumper can run many threads in parallel, where mysqldump does each table in series. Backups and restores happen much more quickly for databases with a large number of tables.

  • Hi Gareth, there are several differences between them:

    – Parallel restores
    only with mydumper can you perform parallel restores, using myloader

    – Files per table
    mydumper uses one file per table, so you can easily do partial restores

    – Chunking
    mydumper supports chunking, which makes a huge performance difference, as parallelism is per chunk instead of per table. In some cases this feature alone improves times by more than 50%, and not only backup but restore times as well

    But still, mysqlpump is a new tool and I'm sure it will gain more features. The most important thing here is that we are seeing an effort to take logical backup tools to the next level.

  • Max,
    Can you talk a little more about the chunking? I was always under the impression mydumper parallelized table-by-table, and that’s certainly the way it appears to be operating in the MySQL process list and in the filesystem. Is chunking something that needs to be enabled?

    I just ran a test of 0.9.1 overnight, plugged my wrapper script for it into my nightly backups instead of my standard mysqldump-based script, and it worked perfectly. But when I run it in a timed benchmark against the mysqldump script, I’m only seeing about 7 seconds speed-up in my environment, out of just over 6 minutes total. This appears to be because more than 90% of my DB data is in two very large tables (bacula.File and dspam.dspam_signature_data), so the gating factor on backup length is how long it takes to dump those two tables, and they’re ending up being dumped serially because of the order in which mydumper gets to them.

    Would it be possible to add some smarts to have an optional large-table mode whereby mydumper takes some time out at the start to calculate what the largest tables are, and if the largest tables are larger in size than, say, total database size divided by number of threads, assign those giant tables to their own threads in order to make sure they are parallelized, then divide the smaller tables first-come-first-served among the remaining threads as usual?

  • Gareth, in addition to what Max pointed out, there are some serious and poorly-documented shortcomings to mysqlpump.
    One, which is not explicitly called out anywhere but which can be found mentioned as an aside in the options documentation, is that in parallelizing mode, mysqlpump can interleave INSERTs to different tables in the single dump file. This means that doing a selective restore of just a few accidentally-erased tables from a mysqlpump dump is extremely difficult, possibly prohibitively difficult, whereas with mydumper/myloader it is a trivially simple operation.
    The other, even more serious, which does not seem to be exposed in the official documentation at all (although it is present in the mysqlpump announcement from the MySQL server team), is that mysqlpump’s multiple threads do not have a shared synchronization point for their dump operations — which means that mysqlpump dumps are *inconsistent*. This basically means that at its present state of development, mysqlpump is safe ONLY for offline dumps.

  • Hi Phil,

    Maybe this topic is worth a post by itself, but to simplify: mydumper has two chunking options. They are not enabled by default.

    -r, --rows Try to split tables into chunks of this many rows. This option turns off --chunk-filesize

    -F, --chunk-filesize Split tables into chunks of this output file size. This value is in MB

    --rows only works on tables with an integer index for now, and the logic still needs to be improved. Anyway, with this option mydumper uses a thread per chunk instead of per table.

    --chunk-filesize is useful when you don't have an integer index. In this case mydumper uses a thread per table, but it splits the output into multiple files of this size. It won't improve backup times the way --rows does, but it still helps restore times, as the restore can be done by multiple threads.
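    Putting that together, a chunked dump might be invoked like this (placeholder credentials; pick values that suit your data):

    mydumper -u backup_user -p secret -r 100000 -o /backups/chunked    # split by rows
    mydumper -u backup_user -p secret -F 64 -o /backups/chunked       # or split into ~64MB files

    Remember that -r turns off -F, so you choose one strategy per run.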

  • Hi Max,

    Thanks for mentioning this tool as well as its advantages over the regular mysqldump that we all use.

    I have some quick questions:

    – Is it stable? Can it be used on a production environment?
    – Do you guys use it for your clients?
    – Are there any known bugs/limitations?
    – Is it really free?

  • Fadi,

    – Is it stable? Can it be used on a production environment?
    Yes, you can use it in a production environment, but as with any backup solution you need to run your own tests: not only taking a backup, but restoring and checksumming it. Also be sure that you know how it works and how it can affect your production workload.

    – Do you guys use it for your clients?
    mydumper is the principal tool for our logical backup solution, so yes, we use it.

    – Are there any known bugs/limitations?
    You can check https://bugs.launchpad.net/mydumper for bugs and limitations.

    – Is it really free?
    Yes, it is!!

  • Hi Max,

    I got the below error while taking a backup using mydumper. I tried to resolve it but had no luck. Please help me with this.

    /home/mysql/mydumper-0.9.1/mydumper -S /mysql/mysql.sock -u xxxx -p'xxxx' -B dataabses -o /home/backups/

    option parsing failed: Error parsing option -o, try --help

  • Hi Narendra,

    You need to add a blank space between the parameter and its value. Also, I would recommend adding the credentials to .my.cnf rather than passing them on the command line.
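    Something like this should work, with the credentials moved into ~/.my.cnf (keeping your original paths; the config values are placeholders):

    mydumper -S /mysql/mysql.sock -B dataabses -o /home/backups/

    # ~/.my.cnf
    [client]
    user = xxxx
    password = xxxx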

  • Hi Max,

    Is there any option to back up multiple databases (more than one, but not all) at a time using mydumper?
    /home/mysql/mydumper-0.9.1/mydumper -S /mysql/mysql.sock -u xxxx -p 'xxxx' -B db1 db2 db3 -o /home/backups/
    I used the above to back up three databases. After executing it, I see that it's only backing up db1.

  • Hi Narendra,

    -B is for single database dumps; for multiple ones you can use the regex option, like

    --regex='^((db1.|db2.|db3.))'
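    So the earlier command would become something like the following (same socket and output directory; note the escaped dots, since an unescaped . would also match names like db1x):

    mydumper -S /mysql/mysql.sock --regex='^((db1\.|db2\.|db3\.))' -o /home/backups/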

  • What's the difference between --trx-consistency-only and --less-locking?
    It seems that with --less-locking, mydumper will do a short FTWRL, which looks the same as --trx-consistency-only~

    • Hi leafonsword,

      There are two differences: consistency and FTWRL time.
      --trx-consistency-only does not care about the consistency of non-transactional tables and releases the lock right after the thread coordination.
      --less-locking instead releases the FTWRL after looping through all the databases and locking the non-transactional tables with separate threads.

      So the --less-locking FTWRL is shorter than the default, and --trx-consistency-only's is shorter than --less-locking's, but all your tables need to be InnoDB (or TokuDB) if you want a fully consistent backup.
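      As a sketch, the two invocations differ only by the flag (credentials omitted; assume an all-InnoDB dataset for the first one):

      mydumper --trx-consistency-only -o /backups/a   # shortest FTWRL; consistent for transactional tables only
      mydumper --less-locking -o /backups/b           # short FTWRL; non-transactional tables locked per thread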

      • …But --less-locking will still yield a consistent backup even if there are non-transactional tables present, correct?

  • Thanks, Max! If MySQL's system tables' default engine is InnoDB or TokuDB, I think --trx-consistency-only will be very helpful~
    I have another question: why does the --use-savepoints option reduce metadata lock holding time?

  • Hi leafonsword,

    To reduce MDL times, you can use savepoints and roll back to them after each table is dumped. This doesn't affect consistency but releases the MDL on the table.
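    In practice it's just another flag on the dump command, for example (placeholder credentials and path):

    mydumper -u backup_user -p secret --use-savepoints -o /backups/sp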

  • Thanks, Max~
    Though mydumper is already very strong, I think it would be perfect if it added these features:
    1. support mysqldump's --where option
    2. support loading just the schema or just the data when the backup dir contains both
    3. support backing up views
    4. support backing up multiple databases at one time

    • Leafonsword, your point 2 is already trivially easy to do with mydumper, because it dumps schema and data separately. Just create a temp directory, link in the schema or data files for just the schemas or individual tables you want, and run myloader in that directory. Shazam! Done.

      Mydumper is the best tool I have ever encountered for doing partial restores of DB data. Nothing else comes close.
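      A minimal sketch of that recipe, assuming a dump in export/ and a table mydb.users (file names follow mydumper's database.table naming convention):

      mkdir partial
      ln export/mydb-schema-create.sql partial/    # CREATE DATABASE statement
      ln export/mydb.users-schema.sql partial/     # table DDL
      ln export/mydb.users.sql partial/            # table data
      myloader -d partial -o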

  • Hi leafonsword,

    Thanks for your feedback. I'm actively working on mydumper/myloader and adding new features. Regarding your points, I agree with 1 and 2, and depending on priorities we may have those soon.

    Points 3 and 4 are already in mydumper! Views are dumped by default, and for multiple databases you can use --regex as I explained above.

    • Ha, that's right!
      One more thing: could you put mydumper on GitHub? GitHub looks better than Launchpad, and since Percona also has its official repository there, maybe mydumper could be in that repository too~

  • Hi Max,
    Do we have an option to back up the entire DB as a single SQL file, so the backup files are easier to handle, instead of having per-table schema and data files?

    • Narendra, that would almost certainly require major re-engineering, and would in many ways work in direct opposition to mydumper’s multi-threaded dumps and reloads. If you want a single monolithic dump file, why not just use mysqldump that already works that way? If your concern is easily moving the complete dump around, you could just tar the dump directory.
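      For example (archive and directory names hypothetical):

      tar czf nightly-dump.tar.gz -C /backups/full .

      That gives you a single artifact to move around, while the per-table files, and therefore parallel restores, stay intact once unpacked.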

  • Hi, when I use mydumper to back up my DB, some errors happen, but on another DB server it works OK.

    [root@localhost backup]# mydumper --user backup --password backup@123 -B mysql -o ./test -c -e

    ** (mydumper:36394): CRITICAL **: Couldn’t write data to a file: No such file or directory

    ** (mydumper:36394): CRITICAL **: Could not write create database for mysql
    Segmentation fault

    [root@localhost backup]# ls test
    metadata.partial mysql.columns_priv.sql.gz mysql.db.sql.gz mysql.event.sql.gz mysql.func.sql.gz mysql-schema-create.sql.gz

    [root@localhost backup]# mydumper -V
    mydumper 0.9.1, built against MySQL 5.6.29-76.2

    [root@localhost backup]# mysql -V
    mysql Ver 14.14 Distrib 5.6.28-76.1, for Linux (x86_64) using 6.2

    [root@localhost backup]# cat /etc/centos-release
    CentOS Linux release 7.2.1511 (Core)

  • Can the output of mydumper be piped into the input of another program? We want to encrypt the resulting text files, but would like to do that on the fly, such as by piping mydumper's output into openssl.

  • Just play with it a little and you should find it straightforward enough that you don’t need a webinar to understand it. It’s really pretty simple. Because it’s a lot smarter than mysqldump, it does not need mysqldump’s huge number of command-line options.

  • Max,
    --statement-size is how mydumper handles max_allowed_packet. Is there a technical reason why mydumper cannot simply auto-set --statement-size to accommodate global.max_allowed_packet upon initial connection to the target server?

    • Hi Phil,

      Not really, but mostly I would like to keep it safe. MySQL's default value is 1M, and if I increased it I would expect people to start getting ER_NET_PACKET_TOO_LARGE when restoring. Maybe I can keep the default but add an option to auto-detect it.

      • That’s why I was suggesting auto-detect, though I can see how auto-detection might cause a problem if a database is dumped from a server with, say, max_allowed_packet = 16M and then loaded onto one with max_allowed_packet = 1M. However, what will happen *now* in that case? Will it currently fail anyway?

        • Not sure what you mean by *now*, but I would guess you mean if we make that change :). In that case, you would only be able to restore to instances where max_allowed_packet is equal to or greater than the backup source's.

          Right now, with --statement-size defaulting to 1M, even if the source server's max_allowed_packet is 16M the generated statements are 1M, which won't fail even on default MySQL installations.

          On the other hand, this could cause issues if you have large fields, like BLOBs, bigger than --statement-size, but I haven't seen that often.

          • Actually, the reason this came up was a customer who is getting numerous ‘Row bigger than statement-size’ warnings because their database has large numbers of huge XML strings. (Which they’re trying to strip out, but…)

            So does mydumper's --statement-size option limit the WRITE statement size, then, and have nothing to do with actually retrieving the data? That is not clear from the documentation. Then perhaps auto-setting --statement-size isn't actually the best way to handle this warning.

            If I understand your explanation correctly, the warning is actually harmless. Is that correct? If so, max_allowed_packet auto-detection probably isn’t needed at all, but perhaps an option to suppress the warning would be useful, just to cut down on log noise.

    • OK, then I’m going to just have my wrapper script manage that and forget the entire max_allowed_packet detection idea altogether.

      Thanks!
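      For anyone else taking the wrapper-script route, a minimal sketch of that detection (connection options omitted):

      PACKET=$(mysql -NBe 'SELECT @@global.max_allowed_packet')
      mydumper --statement-size "$PACKET" -o /backups/dump

      Both values are in bytes, so the result can be passed straight through.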

      • (I note, by the way, that this is yet another detail in which mydumper is smarter and more advanced than mysqldump.)

  • Max,
    This is a feature request.

    I’ve run into a problem using mydumper on XtraDB Cluster. The environment in question has four XtraDB nodes, the fourth a non-voting node (weight 0) which runs as a dedicated backup node, and is set to DESYNCED during DB dumps. This node is fairly often being evicted from the cluster during the backup, which causes it to declare itself non-primary and stop answering queries, which causes the backup to fail.

    Do you suppose it would be possible for mydumper to detect when it is connected to an XtraDB Cluster node (by checking to see whether the status variable wsrep_cluster_status exists), and if so, pause/suspend if wsrep_cluster_status becomes other than ‘Primary’?
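    For reference, the check itself is cheap; a wrapper script could poll something like this between retries (a sketch of the idea, not current mydumper behavior):

    mysql -NBe "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status'"   # should report Primary on a healthy node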

  • Max,
    I have another feature request, to address a current production problem.

    We have a client who has tens of thousands of schemas and a very high churn rate on schemas. MOST nights, their DB backup reports errors because they have dropped one or more schemas in between when the backup run started and when mydumper actually got to that schema. This produces false failure reports in monitoring.

    Something like the following combination of features would be extremely useful:
    — Add a command-line option to pass to mydumper to tell it that it is OK for schemas to be missing by the time it gets to them.
    — If this option is in use, each time mydumper begins to dump tables from a new schema, check that the schema still exists, and if not, drop that entire schema and its contents from the list of objects to be dumped.
    — Change reporting of the ‘table or schema does not exist’ case by default from an error to a warning (because it’s not an error, mydumper didn’t fail; it’s just that the object had been removed from the DB by the time mydumper got to it).
    — If the “missing is OK” option is in use, “missing table/schema” is further downgraded to an informational message instead of a warning

    Of course, if you can suggest a better way to handle this…? Right now, the only way I can come up with to prevent the false dump-failure alerts is to suspend replication to the dedicated backup slave while backups are running.

  • Hi Max,

    I have installed 0.9.1, but it shows as built against MySQL 5.1.73

    /usr/local/bin/mydumper -V;
    mydumper 0.9.1, built against MySQL 5.1.73

    And also, when I compare the listed features with 0.9.0, I do not see any difference between the two versions; they look the same.

  • There is a mydumper-0.9.1 RPM built for RHEL6/Centos6/OEL6 located at http://repo.enetres.net/x86_64/mydumper-0.9.1-1.el6.x86_64.rpm. However, it is ONLY for RHEL6 compatibles. Does anyone know of a RHEL7-compatible RPM package for mydumper 0.9.1?

  • Max, I just discovered a bug in myloader.

    Capsule summary: Filename matching when using myloader -s is incorrect, causing false matches.

    Suppose you have schemas schema1 through schema9, and you run:

    myloader -eod dumps/ -s schema1

    The whole of schema1 will be restored, leaving the other schemas untouched.

    Now suppose you run the same command, but you have schemas schema1 through schema13. myloader will restore schema1, BUT will then also go on and try to load schema10, schema11, schema12, and schema13 into schema1.

    I presume the problem here is that myloader SHOULD be matching files against [schemaname]\.*, but is actually incorrectly matching against [schemaname]* (which will also match files for any schema whose name begins with [schemaname]).
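    To make the failure mode concrete, here is a hypothetical dump directory showing why bare prefix matching goes wrong:

    schema1.users-schema.sql    # matched correctly by -s schema1
    schema1.users.sql           # matched correctly
    schema10.users-schema.sql   # wrongly matched by the bare prefix schema1*
    schema10.users.sql          # wrongly matched by the bare prefix schema1*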
