Replication Failure 2.5 TB DB size


  • Replication Failure 2.5 TB DB size

    Hi all,

    I have a Percona MySQL database holding about 2.5 TB of data in a master-master replication setup. A small-scale corruption occurred somewhere and altered the structure of some tables. I was able to recover the master immediately and put it back into production. The problem now is that the slave is out of sync and is throwing errors. I tried everything I could think of (skip counter, filling in the missing data, etc.), but I could not get the two servers back in sync. The only option left is to set up replication again from scratch.

    For that, I believe I need a full copy of the master database, which I cannot take because locking the tables to back up a 2.5 TB database is not feasible. The server configuration is:

    Dual Hex core CPU,
    128 GB RAM,
    3.6 TB total disk space in RAID 10
    1.1 TB free space
    Fedora 15 OS

    My question is:

    In this scenario, what do you recommend for setting up slave replication again and then turning it back into a master-master setup?

    Please let me know how to proceed. I am totally stuck on this.

    Thank you so much in advance!!


  • #2
    Are you using primarily MyISAM or InnoDB?


    • #3
      I am using InnoDB. All the tables are InnoDB.


      • #4
        You can use xtrabackup to take a non-locking backup and use it to set up the slave again. This guide shows how to set up a slave in 6 steps using xtrabackup:

        http://www.percona.com/doc/percona-xtrabackup/howtos/setting_up_replication.html
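
        One step of that procedure is pointing the rebuilt slave at the right binary log coordinates. xtrabackup records them in a file named xtrabackup_binlog_info inside the backup directory. Here is a minimal Python sketch of that step, assuming the file holds the binlog file name and position separated by whitespace. The host, user, password and coordinates are hypothetical placeholders, and the helper names are mine, not part of any Percona tool:

```python
def parse_binlog_info(text):
    """Return (binlog_file, position) from the contents of xtrabackup_binlog_info."""
    fields = text.split()
    if len(fields) < 2:
        raise ValueError("expected '<binlog file> <position>'")
    return fields[0], int(fields[1])


def sql_quote(value):
    """Very small SQL string quoting helper for this sketch only."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def change_master_statement(host, user, password, binlog_file, position):
    """Build the CHANGE MASTER TO statement to run on the rebuilt slave."""
    return (
        "CHANGE MASTER TO "
        f"MASTER_HOST={sql_quote(host)}, "
        f"MASTER_USER={sql_quote(user)}, "
        f"MASTER_PASSWORD={sql_quote(password)}, "
        f"MASTER_LOG_FILE={sql_quote(binlog_file)}, "
        f"MASTER_LOG_POS={position};"
    )


if __name__ == "__main__":
    # Hypothetical file contents; read the real file from the backup directory.
    sample = "mysql-bin.000042\t107\n"
    log_file, log_pos = parse_binlog_info(sample)
    # Hypothetical connection details for the master.
    print(change_master_statement("master.example", "repl", "secret", log_file, log_pos))
```

        After running the generated statement on the slave, START SLAVE begins replication from those coordinates. Once the slave has caught up, the same kind of statement run on the original master, using the slave's own coordinates, would turn the pair back into master-master.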

        Also, you may be able to stream the backup directly to the slave using ssh or netcat, which avoids storing it somewhere first and then copying it to the destination:

        http://www.percona.com/doc/percona-xtrabackup/howtos/recipes_ibkx_stream.html
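
        To make the streaming idea concrete, here is a rough Python sketch that only assembles the shell pipeline as a string, so it can be reviewed before anyone runs it. It assumes an xtrabackup release that supports --stream=xbstream and that xbstream is installed on the slave. Older releases did this through innobackupex with --stream=tar instead, so check the documentation for the installed version. The user, host and directory names are hypothetical:

```python
import shlex


def stream_backup_command(ssh_user, slave_host, restore_dir):
    """Build a pipeline that streams a backup to the slave over ssh."""
    # Runs on the master: take the backup and write it to stdout as xbstream.
    local_cmd = "xtrabackup --backup --stream=xbstream --target-dir=./"
    # Runs on the slave: unpack the stream into the restore directory.
    remote_cmd = "xbstream -x -C " + shlex.quote(restore_dir)
    ssh_target = shlex.quote(ssh_user + "@" + slave_host)
    return local_cmd + " | ssh " + ssh_target + " " + shlex.quote(remote_cmd)


if __name__ == "__main__":
    # Hypothetical account, host and directory.
    print(stream_backup_command("backup", "slave.example", "/srv/restore"))
```

        Since the master has only 1.1 TB free for 2.5 TB of data, streaming avoids having to hold a full local copy of the backup.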


        • #5
          I am curious how long it took you to back up and restore the 2.5 TB database with Percona XtraBackup.


          • #6
            It will depend on his I/O subsystem and whether he has a 1 Gbit or 10 Gbit connection. It should be possible in about an hour on a 10 Gbit link with fast disks (a 1 Gbit link alone would need more than five hours for 2.5 TB), but it could easily take much longer if I/O on the master is heavily utilized by the database server itself.
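
            For a back-of-the-envelope lower bound set by the network alone, a small Python sketch helps. It assumes decimal units (1 TB = 10^12 bytes, 1 Gbit/s = 10^9 bits/s) and ignores disk throughput and protocol overhead unless an efficiency factor is passed. The function name and the efficiency parameter are my own illustration:

```python
def transfer_hours(size_tb, link_gbit, efficiency=1.0):
    """Hours needed to move size_tb terabytes over a link_gbit Gbit/s link.

    efficiency is the fraction of the link's nominal rate actually achieved.
    """
    size_bits = size_tb * 1e12 * 8
    seconds = size_bits / (link_gbit * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    for gbit in (1, 10):
        hours = transfer_hours(2.5, gbit)
        print(f"{gbit} Gbit/s: {hours:.2f} h")
```

            At full line rate this comes to roughly 5.6 hours on 1 Gbit and about 33 minutes on 10 Gbit. The real time will be longer once disk I/O contention on the master is factored in.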