September 16, 2014

Wishes for mysqldump

Dealing with dumping and recovery of large and partially corrupted database I’ve got couple of feature ideas for mysqldump or similar tool and appropriate import tool

Dump in parallel single thread dump is not efficient of course especially on systems with multiple CPUs and disks. It is lesser issue in recovery case because import takes most of the time anyway but will be handy for some backup needs.

Dump each table in its own file This is much more convenient compared to single say 100GB sql file allowing to prioritize data load and if load fails for any reason you can easily restart it. Also it allows to prioritize data load if you’ve got to load many tables which have different priority for your application.

Safe Dump Dumping corrupted Innodb tables you will have some of the them crashing Innodb which breaks mysqldump process. It would be great for load script to record in the logs such table was not dumped successfully, wait for MySQL to complete restart and continue going with other tables.

Parallel restore This is absolutely required if time is the issue as serious systems may perform much better in such case.

As usually when need arises there is no time to implement solid solution which will work for a lot of people so we get by with few quick and dirty shell scripts to do this.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Nav says:

    Does mysqlimport using user-threads speed up import for a single large table?

    Or

    Is it only beneficial while importing multiple tables [files]?

    Thanks
    Nav

  2. I think all of these ideas could be implemented as a wrapper for standard mysqldump:
    1) Script reads tables list and spawns N mysqldump processes, when one process finishes its work, then script starts dumping next table… (I think it could be really nice parallel dump and file-per-table implementation)

    2) Safe dump… harder to implement, but I think it is possible.

    3) Parallel restore: The same engine, but now it would feed dump files to mysql processes.

  3. Shlomo Priymak says:

    Does anyone have such a script? It could be usefull as a starting point.
    In my opinion mysql should have those features added though.

  4. Brian Aker says:

    Hi!

    Dump in parallel – Good idea, but hard to implement currently unless you can handle the fact that you would have to lock the database.

    Dump each table in its own file – We have that in tab format.

    Parallel restore – 5.1′s import has this ability (and can be used with any version of MySQL).

    Cheers,
    -Brian

  5. You can already dump in parallel… just use a WHERE clause and divide the PK space in 1/2….

    Parallel restore is a bit harder….. You’d DEF see a boost if you did it on both a master AND the slave at the same.

    The #1 wish *I* wanted is the ability to write slave replication values to a temp file…. we hacked a version of our mysqldump to do this and I need to release a patch.

    Speaking of which…. we’ve implemented a mysqlslavesync command which takes a source MySQL server as the command line and syncs the local slave machine with its snapshot and puts the machine into production as a replicated slave.

    All with one command line :)

    It uses SSH agent to auth on the remote end…. I need to release that too :)

  6. peter says:

    Thanks Brian,

    Parallel Dump – Right. However in some cases such as recovery or if you want to take dump/restore path for the upgrade to new MySQL Version you do not care about the lock, you have exclusive access to MySQL Server anyway- you just want it fast.

    Tab format gets close but not exactly, because it only works with single database so wrapper is needed for multiple databases plus this means getting consistent backup for all database is hard.

    Thanks for parallel import note.

  7. peter says:

    Kevin,

    Well, Parallel dump for the single table would also be nice but at least easy way to deal with multiple tables would be nice.

    Regarding slave information I agree with you – the fact this info is hidden inside monster sql-dump file makes it rather inconvenient.

    mysqlslavesync – cool. I however prefer scripted binary copy in such case for example with LVM. It of course requires slave down but when your’re reloading data via mysqldump you will have some incomplete tables anyway.

    MMM tool has such script as the part of it.

  8. Hey …..

    WRT mysqlslavesync…

    I agree. Binary copy is nice.

    We have two internal mechanims now. ‘dump’ and ‘hotcopy’ .. one uses mysqldump and the other uses mysqlhotcopy….

    Adding LVM snapshot and/or ZFS snapshots would be nice too. I think the script will be easy enough to extend.

    I’ll put it up on google code or code.tailrank.com tomorrow.

    Kevin

  9. Larry Damman says:

    While we’re requesting features, I have one that I’d love to see implemented. It would be similar to –master-data=1 except it would get the Exec_Master_Log_Pos from the “SHOW SLAVE STATUS” command. This way you could (easily) create a sibling of the dumped db instead of a child. This would come in handy when you take your backups/snapshots off of a slave in order to keep the master running smoothly.

  10. peter says:

    Larry,

    Absolutely. This is why in MMM backup tools we store both SHOW SLAVE STATUS and SHOW MASTER STATUS info so you can when use it to restore master and do point in time recovery using binary log or restore slave and get it going.

  11. fr3nd says:

    I wrote a quick and dirty wrapper in python that launches many threads with mysqldump. Each thread stores the result in memory and waits for the preceding one. That’s the reason why it could be so memory consuming with databases with large tables.

    You can find it here: http://www.fr3nd.net/projects/mysqlpdump

    It’s pretty raw at the moment, but it can be used as skeleton to build something more complex.

    Any comments will be appreciated :)

  12. Brian Aker says:

    Hi!

    I’ve been thinking about a –clone option for a bit now… allow you to get the slave options plus take the name of a machine to feed the data to. Bypass the need to pump data via:
    mysqldump [args] | mysql –host –port

    Which is something I do all of the time to make new slaves. If I get a spare minute I’ll look at your idea on parallel dump with locks. If I do it though, I would be interested in it via a tab dump.

    Cheers,
    -Brian

  13. peter says:

    Thank Brian,

    Really I like tab format a lot. LOAD DATA INFILE should be easier to optimize for storage engines internally if they like to. Right now I think it is underused because you need to process each database separately.

  14. flupps says:

    Brian, Larry:

    There’s been a patch around for this since 2005, unfortunately never got pushed into the mysqldump…
    http://bugs.mysql.com/file.php?id=959

  15. fr3nd says:

    Hey,

    I’ve released a new version of mysqlpdump whith some important improvements:

    * Store dumps to files directly instead to stdout
    * Can compress files
    * Dump each table in its own file
    * Can pass parameters directly to mysqldump

    http://www.fr3nd.net/projects/mysqlpdump

    Next thing I want to implement is to store all the replication data to a separate .sql file.

    Any suggestion?

  16. Chip released compact_innodb.py many months ago. Despite the _compact_innodb_, it should work with more storage engines. It does parallel load and restore by forking mysqldump and mysql processes. The project page for the code is http://code.google.com/p/google-mysql-tools.

  17. Thomas says:

    3. I wrote a simple perl script that does just that… I only tested with 16 threads so far (on a 16-proc server) and IO seek seemed to me a big problem (the restore was probably between 50 and 200 times slower! – I just aborted it). It’s not a production RAID (only for local C drive, “value” kind of hardware…) and I may do more tests on a real RAID.

    My script is a bit crappy; it buffers tables in memory then send them in one block to child processes, and block all threads on db changes. I know how I can make it better by using queues and non-blocking I/O but I’m unlikely to to this anytime soon due to time constraints. It’s fairly simple to do with Event::Lib though.

    Also I’m not sure if the specs for dumps are published… I rely solely on comments in the dump and on the fact that global parameters are only present at the start of it. Also I based my work on a specific database dump; If there’s a spec somewhere it could help.

Speak Your Mind

*