October 25, 2014

Using netcat to copy a MySQL database

This is mainly a cheat sheet for my own reference; it's nothing like rocket science.

It often makes sense to use netcat/nc to copy a MySQL database between hosts on a trusted network. It bypasses the encryption overhead of SSH and, depending on configuration, can be significantly faster.

Also note that MySQL should be shut down while you copy the data, unless you're copying from a snapshot or similar.

To copy, go to the MySQL data directory on both boxes, for example cd /var/lib/mysql, and make sure the target directory is empty. Now on the TARGET server run nc -l 4000 | tar xvf - and on the SOURCE server run tar -cf - . | nc target_ip 4000
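
Putting the two halves together (a minimal sketch; port 4000 and target_ip are the placeholders used above, and some netcat variants need -l -p instead, as the comments below discuss):

# On the TARGET server: listen and unpack into the (empty) data directory
cd /var/lib/mysql
nc -l 4000 | tar xvf -

# On the SOURCE server: stream the data directory across
cd /var/lib/mysql
tar -cf - . | nc target_ip 4000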

Also note: the port you're using should be open in the firewall.
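
For example, with iptables (a hedged sketch; 4000 and source_ip are placeholders, and your distribution may manage its firewall differently):

# On the TARGET server: allow the transfer port from the source host only
iptables -I INPUT -p tcp -s source_ip --dport 4000 -j ACCEPT
# ...and remove the rule once the copy is done
iptables -D INPUT -p tcp -s source_ip --dport 4000 -j ACCEPT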

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Antony Curtis says:

    Warning: TCP may introduce errors into your data due to the inadequate CRC in the packet header. Make sure to do an MD5 or SHA1 of the files at both ends before trusting that the data was transferred correctly.
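
    One way to do that check (a minimal sketch; paths are placeholders):

    # On the SOURCE server: build a checksum manifest of the data directory
    cd /var/lib/mysql
    find . -type f -print0 | sort -z | xargs -0 md5sum > /tmp/mysql.md5

    # Copy /tmp/mysql.md5 to the target, then on the TARGET server:
    cd /var/lib/mysql
    md5sum -c /tmp/mysql.md5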

  2. @Antony I was just reviewing that to make sure, but doesn't tar have a CRC? So with the command lines Peter posted I reckon it'd be OK.
    In many cases, though, the firewall hassles are greater than the SSH overhead ;-)

  3. peter says:

    Thanks Antony,

    As I mentioned this is usually used for local networks in which case the risk is relatively low. Though it is a good point.

  4. Andrew says:

    Hi,

    I think you forgot a "-p" in the netcat command on the target server. It should probably read:

    nc -l 4000 | tar xvf -

  5. Andrew says:

    I meant:
    nc -l -p 4000 | tar xvf -

  6. Przemek says:

    You need to use '-p' only on Debian-like distros, not on Red Hat ones; nc is slightly different across those families.
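
    Side by side (port 4000 is the article's placeholder):

    nc -l -p 4000 | tar xvf -    # traditional/Debian-style netcat
    nc -l 4000 | tar xvf -       # BSD-style netcat, as shipped on RedHat/CentOS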

    Sometimes when you need to limit the transfer bandwidth, throttle (http://klicman.org/throttle/) is extremely handy. With this nice little tool you can adjust the transfer speed without stopping it!
    Also bar (http://clpbar.sourceforge.net/) is very nice :)

  7. peter says:

    Andrew,

    The RedHat/CentOS version of nc works without this option, and in fact does not work with it:

    [pz@box scripts]# nc -l -p 4000
    usage: nc [-46DdhklnrStUuvzC] [-i interval] [-p source_port]
    [-s source_ip_address] [-T ToS] [-w timeout] [-X proxy_version]
    [-x proxy_address[:port]] [hostname] [port[s]]

  8. Ken says:

    Why not rsync? You get wire-speed performance, incremental updates (e.g., across interruptions) and optional checksumming. You can also set up a standard rsync config on your database machines, enable "rsync --daemon" at will and disable it afterwards.

    Of course, if you have secure data that’s not encrypted in-DB, then scp is a better bet, or even pre- and post-encryption if ssh is to span high latency links.

    $0.02

    kb

  9. Przemek says:

    Ken, sure, rsync is great, but if you have something like 400GB of files and just want them copied ASAP over the local network, the tar|nc way is faster and uses much less CPU, because there is no SSL overhead.

    Besides, if you need to copy a DB snapshot from a production server, even if you limit the transfer using rsync's bwlimit option, you can't change the speed mid-transfer when the machine's disk I/O gets hot.

    You can limit the disk transfer rate at any moment during the copy using the simple throttle tool.
    Just put it between the pipes like this (I suggest running it in screen):
    tar cf - . | throttle -l /tmp/throttle.ctl -m 50 | bar | nc $destination_host $tcp_port
    and to change the limit mid-transfer, for example to 20Mbit:
    throttle -t /tmp/throttle.ctl -m 20

  10. peter says:

    Ken,

    Rsync is an option if you copy data often and want to set things up in advance.
    Remember we do consulting and often just need to do things once… we also may not have root access to install and set things up in many cases.

  11. Ken says:

    Couple of things.

    Przemek: rsync does not use SSL. Its data only passes through encryption if you use it in SSH transport mode. Daemon mode is much faster, and will trivially saturate 2xGigE. Rsync allows you to interrupt a transfer and continue where you left off, even for large files. Changing bwlimit is easy this way.
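
    For example, against the db module from the config below (a sketch; the host and the cap values are made up, and --bwlimit is in KB/s here):

    # start the copy capped at roughly 20MB/s; --partial keeps partially-sent files
    rsync -aH --partial --bwlimit=20000 --port=1873 /var/lib/mysql/ remotehost::db
    # interrupt it (Ctrl-C), then re-run with a different cap; it resumes where it stopped
    rsync -aH --partial --bwlimit=50000 --port=1873 /var/lib/mysql/ remotehost::db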

    Peter: rsync does not require root. But it does require you set up a trivial config file, something like:

    % cat c
    [db]
    path = /path/to/db
    use chroot = false
    readonly = false
    % rsync --daemon --port=1873 --config=c
    remote % rsync -aHPv --port=1873 sourcedata remotehost::db

    (optionally -W if your destination write capacity is significantly lower than network bandwidth)

    Especially in a consulting situation, I’d rather have a four line config file in my toolbelt for a very standard command, instead of relying on A) having root access to install nc or throttle if necessary, which are far less standard, or B) having a stash of nc/throttle binaries for every possible architecture (which even these days is still pretty substantial). And the incremental nature of rsync has other flexibilities; for example, rsyncing a database while it’s running, shutting it down, and rsyncing only changed content (useful mainly in niche, low bandwidth situations).

    Again, just my two cents; I think nc is the best tool for many applications, but for file-based objects rsync is difficult to beat, IMHO.

    Ken.

  12. peter says:

    Ken,

    Yes, sure. Most things can be run without root; I meant if you want a standard /etc/init.d installation.

    I typically do not need to throttle bandwidth while copying files; the use case this applies to is getting stuff between boxes as soon as possible.
    In my experience I find netcat installed about as frequently as rsync.

    But anyway, thanks for the simple example of how to use rsync in daemon mode. It is helpful.

  13. "Also note: the port you're using should be open in the firewall."

    Are you really serious? I wonder who your intended readers are, that you're mentioning such an obvious thing?

  14. Joel K. says:

    My Ubuntu 8.04 system didn't know about bar.

    I've always used 'pv' for essentially the same function:

    pv - monitor the progress of data through a pipe

    It appears quite similar.
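
    Dropped into the same pipeline as the article's example (port and host are the placeholders used above):

    tar -cf - . | pv | nc target_ip 4000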

  15. MattW says:

    In the utility-copy vein: if you have a source host, a target host on RFC1918 address space, and an intermediate bastion host you have to ssh through in order to REACH the target host (since it doesn't have an internet-legal IP), and you don't have a VPN etc., you can pipe tar through a chained ssh via:

    tar cvf - mysql_dir | ssh -A user@bastionhost.example.com "ssh -A user@rfc1918.example.com 'cd /path/to/target/; tar xvf -'"

  16. Rsync is great for live replication copies also

    1. rsync with the database running; depending on the last rsync, this can take a while
    2. Flush tables/cache, rsync again; won't take nearly as long
    3. Shut down the master
    4. rsync one final time; on a 10GB DB it only takes a couple of seconds
    5. Restart the master; downtime is only a few seconds and you have a fresh copy to use for starting a slave back up

    Of course, you need to delete the appropriate log files, depending on your setup and how you store them. A rough sketch of the whole sequence follows.
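
    Something like this (hedged; paths, the host name and the start/stop commands are placeholders for whatever your setup uses):

    rsync -aH /var/lib/mysql/ target_host:/var/lib/mysql/            # 1: first pass, DB running
    mysql -e "FLUSH TABLES"                                          # 2: flush, then a much quicker pass
    rsync -aH /var/lib/mysql/ target_host:/var/lib/mysql/
    mysqladmin shutdown                                              # 3: stop the master
    rsync -aH --delete /var/lib/mysql/ target_host:/var/lib/mysql/   # 4: final pass, a few seconds
    /etc/init.d/mysql start                                          # 5: restart the master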

  17. Przemek says:

    Instead of shutting down the master, just use the LVM snapshot method to create a slave or backup. This way you only need a few seconds of read lock.
    If you don't use LVM partitions, I suggest you start using them (optimally separate ones for data and binlogs).
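
    The usual pattern, roughly (a hedged sketch; the volume group, LV names, sizes and mount point are made-up placeholders):

    # Session 1: take a brief global read lock and KEEP this client open
    #   mysql> FLUSH TABLES WITH READ LOCK;
    # Session 2: snapshot the data volume while the lock is held
    lvcreate --snapshot --size 10G --name mysql_snap /dev/vg0/mysql
    # Session 1: release the lock (it is dropped if the client disconnects)
    #   mysql> UNLOCK TABLES;
    # Now mount the snapshot and copy from it at leisure, e.g. with tar|nc as above
    mount /dev/vg0/mysql_snap /mnt/mysql_snap
    (cd /mnt/mysql_snap && tar -cf - . | nc target_ip 4000)
    umount /mnt/mysql_snap
    lvremove -f /dev/vg0/mysql_snap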

  18. As it's even been reported on this blog, the LVM snapshot penalty can be extreme; on an overly busy server (and yes, I understand that's a problem of its own) starting up an LVM snapshot can cause a cascading failure of the system as it struggles to keep up.

  19. Przemek says:

    1. You do it at off-peak time.
    2. IMHO, if your system dies just because of LVM snapshot overhead, you're far too close to your machine's disk I/O limits.
    3. IMO, the best solution is always to have a spare slave used only for backups, some periodic heavy queries, etc.

  20. Ken says:

    RT: agreed.

    Przemek:

    1. You’ll find more and more that these days people using MySQL run HA systems (as HA as MySQL gets anyway) for international users and systems, not just a little PHP site in Dubuque. The “night-time trough” is no longer a luxury for a lot of the client base of MySQL.

    2. LVM snapshot overhead alone can be 20-50%; startups with limited capex don’t want to pay 20-50% more for their storage hardware to support a method of database backup used solely because a working solution isn’t available from the DB or engine vendor (explain /that/ to a VC). Established companies don’t want to budget in 20-50% more for their expensive SAN setups. We’d all love a few thousand spare TPS or a few 100MB/s of spare throughput, but that money should be going to what makes money, not what loses money. This also doesn’t cover the overhead of saturating the disk subsystem with the *actual copy* with tar/nc or rsync. Been there. Done both. Bought the T-shirt.

    3. Absolutely agree; the many other benefits of a slave makes the cost worthwhile.

    LVM is a very useful tool, but the fact that it is often represented as a panacea by the MySQL community (not singling you out) does a disservice to those relying on that information (who then have it blow up in their faces). IMHO; apologies for the rant.

    Ken.

  21. I agree with #1… do NOT do this on production data without checksums.

    You WILL get data corruption.

  22. Morgan Christiansson says:

    It is possible to use a less expensive cipher with ssh, which will speed things up significantly.

    For example:
    rsync -e 'ssh -c blowfish' …
    scp -c blowfish …

  23. James Byers says:

    If using Ken’s rsync example and you’re root on both sides, you’ll likely need:

    uid = root
    gid = root

    above

    [db]

    in the rsync config file to avoid permission problems.
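
    Putting the two together, the whole file would look something like this (the path is Ken's placeholder):

    uid = root
    gid = root
    [db]
    path = /path/to/db
    use chroot = false
    readonly = false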

  24. LoadAverage says:

    You can also copy the data directory to multiple servers at once using tee:

    Each Receiver: same as in the article: nc -l 4000 | tar xvf - [or nc -l -p 4000 | tar xvf - ]
    Sender: tar -f - -c . | tee >(nc target_host_1 4000) | nc target_host_2 4000

    You can add as many additional hosts as you want.
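
    For example, with three targets (host names are placeholders; the >( ) process substitution needs bash):

    tar -cf - . | tee >(nc target_host_1 4000) >(nc target_host_2 4000) | nc target_host_3 4000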
