Announcement

Announcement Module
Collapse
No announcement yet.

Backup stopped working !! Slave_SQL_Running: No

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • Backup stopped working !! Slave_SQL_Running: No

    Hello everyone,

    My latest issues are just piling up and i am not able to get them all sorted

    To add to my chaos, now my backup server too is giving me issues.

    When i run the command :-

    Code:
    TheSlave|mysql> SHOW SLAVE STATUS \G
             ...
             Slave_IO_Running: Yes
             Slave_SQL_Running: No
    see that "Slave_SQL_Running" is "NO" !!

    i tried to stop and start the replication service, but in vein

    My log files are throwing up this error :-

    Code:
    tail -f /var/log/messages
    
    Oct 8 13:32:37 DatabackupSRV xinetd[3396]: EXIT: nrpe status=0 pid=3295 duration=0(sec)
    Oct 8 13:34:19 DatabackupSRV xinetd[3396]: START: nrpe pid=3300 from=10.222.32.22
    Oct 8 13:34:19 DatabackupSRV xinetd[3396]: EXIT: nrpe status=0 pid=3300 duration=0(sec)
    Oct 8 13:34:29 DatabackupSRV xinetd[3396]: START: nrpe pid=3305 from=10.222.32.22
    Oct 8 13:34:29 DatabackupSRV xinetd[3396]: EXIT: nrpe status=0 pid=3305 duration=0(sec)
    It was working all this while, But i guess the recent start / stop of mysql services and the server of the master is causing this issue.

    Any suggestions to rectify this would be of great help.

    Thank you

  • #2
    Thought for a moment and decided why not post the whole output of the command : SHOW SLAVE STATUS \G

    Code:
    mysql> SHOW SLAVE STATUS \G
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 10.222.1.218
                      Master_User: root
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: newcrmdb1-bin.000061
              Read_Master_Log_Pos: 315098143
                   Relay_Log_File: DatabackupSRV-relay-bin.000088
                    Relay_Log_Pos: 667796113
            Relay_Master_Log_File: newcrmdb1-bin.000054
                 Slave_IO_Running: Yes
                Slave_SQL_Running: No
                  Replicate_Do_DB:
              Replicate_Ignore_DB:
               Replicate_Do_Table:
           Replicate_Ignore_Table:
          Replicate_Wild_Do_Table:
      Replicate_Wild_Ignore_Table:
                       Last_Errno: 1146
                       Last_Error: Error 'Table 'asteriskcdr.bpleadcf' doesn't exist' on query. Default database: '
    newcrmdb'. Query: 'INSERT INTO `newcrmdb`.`bpleadcf` SELECT * FROM `asteriskcdr`.`bpleadcf`'
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 667795964
                  Relay_Log_Space: 1134162270
                  Until_Condition: None
                   Until_Log_File:
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File:
               Master_SSL_CA_Path:
                  Master_SSL_Cert:
                Master_SSL_Cipher:
                   Master_SSL_Key:
            Seconds_Behind_Master: NULL
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error:
                   Last_SQL_Errno: 1146
                   Last_SQL_Error: Error 'Table 'asteriskcdr.bpleadcf' doesn't exist' on query. Default database: '
    newcrmdb'. Query: 'INSERT INTO `newcrmdb`.`bpleadcf` SELECT * FROM `asteriskcdr`.`bpleadcf`'
    1 row in set (0.00 sec)

    Thank you everyone for your assistance.

    Comment


    • #3
      Ahhh...

      With regards to my similar post earlier, I tried to trouble shoot the issue, but in vein...

      Now when i run the command :- "show slave status \G"

      Code:
      mysql> show slave status \G
      *************************** 1. row ***************************
                     Slave_IO_State:
                        Master_Host: 10.222.1.218
                        Master_User: root
                        Master_Port: 3306
                      Connect_Retry: 60
                    Master_Log_File: mysql-bin.000055
                Read_Master_Log_Pos: 315098143
                     Relay_Log_File: DatabackupSRV-relay-bin.000001
                      Relay_Log_Pos: 4
              Relay_Master_Log_File: mysql-bin.000055
                   Slave_IO_Running: No
                  Slave_SQL_Running: Yes
                    Replicate_Do_DB:
                Replicate_Ignore_DB:
                 Replicate_Do_Table:
             Replicate_Ignore_Table:
            Replicate_Wild_Do_Table:
        Replicate_Wild_Ignore_Table:
                         Last_Errno: 0
                         Last_Error:
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 315098143
                    Relay_Log_Space: 106
                    Until_Condition: None
                     Until_Log_File:
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File:
                 Master_SSL_CA_Path:
                    Master_SSL_Cert:
                  Master_SSL_Cipher:
                     Master_SSL_Key:
              Seconds_Behind_Master: NULL
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 1236
                      Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not f
      ind first log file name in binary log index file'
                     Last_SQL_Errno: 0
                     Last_SQL_Error:
      1 row in set (0.00 sec)

      I'll let everyone know, what i did :-

      This is the first change i did :-

      Code:
      mysql> change master to MASTER_LOG_FILE='mysql-bin.000055', Master_Log_Pos=4;
      Later i changed to this :-

      Code:
      mysql> change master to MASTER_LOG_FILE='mysql-bin.000055', Master_Log_Pos=315098143;
      Those are the 2 chnages i have done and now i get the status as shown above

      i hope i have not messed it up too much !!


      Thank you

      Comment


      • #4
        Yeah you stuck it to yourself on this one.

        The first issue is that you changed the replication position instead of fixing the error, and used an incorrect binlog file name format (you likely just used the one from that post you linked I'd guess). To get back to where you started, you need to find the binlog file and position that the slave sql_thread stopped at. Based on your slave status output, it looks like the slave is reading from a new binlog file (you can see that the Read_Master_Log_Pos value is smaller than the Exec_Master_Log_Pos value, which means it has to be reading a newer binlog file than where the slave sql_thread stopped at), so you need to find the binlog file that the slave sql_thread actually failed on. So look in the error log for something like the below:

        Code:
        2013-10-08 12:48:51 37545 [ERROR] Slave SQL: Error 'Table 'testdb.test2' doesn't exist' on query. Default database: 'testdb'. Query: 'insert into test1 select * from test2', Error_code: 1146
        2013-10-08 12:48:51 37545 [Warning] Slave: Table 'testdb.test2' doesn't exist Error_code: 1146
        2013-10-08 12:48:51 37545 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000001' position 3427
        This is a sample I re-created, so yours will be a bit different. Note the ERROR is similar to what you see in your slave status. So find your specific error message in the error log file, and then locate the end part where is gives you the file name and position ("We stopped at log 'mysql-bin.000001' position 3427" in my example). The position should be 315098143 based on your show slave status, as that is when it the slave sql_thread stopped executing events (Exec_Master_Log_Pos ) but the io_thread kept reading in new ones (Read_Master_Log_Pos).

        Once you find the correct binlog file name and position, re-run your change master statement on your slave using the information you located in the error log. Note that your file name should be something like "newcrmdb1-bin.XXXXXX", not mysql-bin.XXXXXX (you can see this naming convention your show slave status above).

        Code:
        mysql> change master to MASTER_LOG_FILE='newcrmdb1-bin.XXXXXX', Master_Log_Pos=315098143;
        Once you get pointed back to the original replication location where the slave sql_thread failed, you need to then fix the error that it was complaining about to start with.

        The initial replication error appears to be telling you that the table `asteriskcdr`.`bpleadcf` does not exist on the slave, so the insert statement is failing when it attempts to select the data from that table. So the problem there is that your slave appears to be already out of sync with your master. If the table in question on the master is static or mostly static, you could likely solve this by exporting the data from just that table on the master using mysqldump and loading it into the slave. If that is not possible, or you do not care about that data, you could always just skip the replication statement with sql_slave_skip_counter, but then the slave would be further out of sync with the master.

        And if all else fails, you can always rebuild the slave from the master as a last resort as well. =)

        Comment


        • #5
          Hello Scott,

          Phewww.....My apologies for causing you such a pain , but none the less the above explanation was very very useful and i did learn a few things extra

          But, I could not salvage the data from its last position and hence i had to redo the whole replication stuff once again, which i have just completed

          All thanks to your guidance ...

          Keep it up !!!

          Gnite

          Comment


          • #6
            Glad you got it going again anyway!

            If you want some practice, I'd recommend downloading MySQL Sandbox and setting up a quick replication environment on a test server. You can easily test situations like this, and practice recovering from them.

            I.e. in your test environment:

            1. Drop a table on the slave (not the master)
            2. Do a "insert into ... select from" statement on the master that uses the table you dropped on the slave in the FROM part (like your actual issue above)
            3. Use mysqldump to dump the missing table from the master and insert it on the slave
            4. Restart replication, which should then work now that the correct table exists again on the slave

            MySQL Sandbox:
            https://launchpad.net/mysql-sandbox/...-3.0.42.tar.gz

            Guide for installation / setup:
            http://search.cpan.org/~gmax/MySQL-S...box/Recipes.pm
            Last edited by scott.nemes; 10-10-2013, 11:09 AM.

            Comment


            • #7
              Thank you so much for the head'sup Scott..

              You have been a life saver !!

              Comment

              Working...
              X