Emergency

Can't restart node after brutal shutdown in 3 nodes pxc 57 - systemd related ?

Collapse
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Can't restart node after brutal shutdown in 3 nodes pxc 57 - systemd related ?

    Hi,

    So let's start with my setup:
    - 3 nodes pxc cluster (up to date CentOS 7 and Percona-XtraDB-Cluster-57 5.7.19-29.22.1.el7)
    mysql-test-01 10.0.0.51
    mysql-test-02 10.0.0.52
    mysql-test-03 10.0.0.53
    - I have followed all the guidelines for CentOS (setenforce 0, .....)
    - I bootstraped the cluster from the node "mysql-test-01" 10.0.0.51 using "systemctl start mysql@bootstrap.service"
    - Other nodes joined the cluster without problem
    - The cluster was working

    I did some tests, I gracefully shutdown node "10.0.0.53", the cluster kept working, I restarted it with "systemctl start mysql". It rejoined the cluster and recovered.
    I did the same thing (graceful shutdown) with node "10.0.0.51". Everything OK

    The problem occured when I "brutally" shutdown the node "10.0.0.51". I then started again the virtual machine, and then I use "systemctl start mysql.service" to start the node. It should start and automatically recover but it's not working. I am pretty sure this is related to "systemd" but I am confused because I didn't touch the default configuration that come with the package. I was hoping a working systemd configuration.


    More informations:

    configuration of node "10.0.0.51"

    ​​​​​​​
    ---------- /etc/my.cnf ----------
    #
    # The Percona XtraDB Cluster 5.7 configuration file.
    #
    #
    # * IMPORTANT: Additional settings that can override those from this file!
    # The files must end with '.cnf', otherwise they'll be ignored.
    # Please make any edits and changes to the appropriate sectional files
    # included below.
    #
    !includedir /etc/my.cnf.d/
    !includedir /etc/percona-xtradb-cluster.conf.d/
    ----------------------------------------

    ---------- /etc/percona-xtradb-cluster.conf.d/wsrep.cnf ----------
    [mysqld]
    # Path to Galera library
    wsrep_provider=/usr/lib64/galera3/libgalera_smm.so

    # Cluster connection URL contains IPs of nodes
    #If no IP is found, this implies that a new cluster needs to be created,
    #in order to do that you need to bootstrap this node
    wsrep_cluster_address=gcomm://10.0.0.51,10.0.0.52,10.0.0.53

    # In order for Galera to work correctly binlog format should be ROW
    binlog_format=ROW

    # MyISAM storage engine has only experimental support
    default_storage_engine=InnoDB

    # Slave thread to use
    wsrep_slave_threads= 8

    wsrep_log_conflicts

    # This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
    innodb_autoinc_lock_mode=2

    # Node IP address
    wsrep_node_address=10.0.0.51
    # Cluster name
    wsrep_cluster_name=pxc-cluster-test

    #If wsrep_node_name is not specified, then system hostname will be used
    wsrep_node_name=mysql-test-01

    #pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
    pxc_strict_mode=ENFORCING

    # SST method
    wsrep_sst_method=xtrabackup-v2

    #Authentication for SST method
    wsrep_sst_auth="sstuserassword"
    ----------------------------------------

    ---------- /etc/percona-xtradb-cluster.conf.d/mysqld.cnf ----------
    # MYSQL-TEST-01
    # Template my.cnf for PXC
    # Edit to your requirements.
    [client]
    socket=/var/lib/mysql/mysql.sock

    [mysqld]
    server-id=51
    datadir=/var/lib/mysql
    socket=/var/lib/mysql/mysql.sock
    log-error=/var/log/mysqld.log
    pid-file=/var/run/mysqld/mysqld.pid
    log-bin
    log_slave_updates
    expire_logs_days=7

    # Disabling symbolic-links is recommended to prevent assorted security risks
    symbolic-links=0

    # Custom config
    max-allowed-packet=64M

    #
    # InnoDB settings
    #
    innodb_file_per_table
    innodb_buffer_pool_size=1024M
    innodb_log_file_size=256M
    innodb_flush_log_at_trx_commit=1
    innodb_flush_method=O_DIRECT
    --------------------------------------------------

    And finally an extract from systemd journalctl

    ----------------------------------------
    [root@mysql-test-01 ~]# systemctl start mysql.service
    Job for mysql.service failed because the control process exited with error code. See "systemctl status mysql.service" and "journalctl -xe" for details.
    [root@mysql-test-01 ~]# journalctl -u mysql.service
    ....
    Oct 02 17:21:39 mysql-test-01 systemd[1]: Starting Percona XtraDB Cluster...
    Oct 02 17:21:40 mysql-test-01 mysqld_safe[2761]: 2017-10-02T15:21:40.081340Z mysqld_safe Logging to '/var/log/mysqld.log'.
    Oct 02 17:21:40 mysql-test-01 mysqld_safe[2761]: 2017-10-02T15:21:40.085011Z mysqld_safe Logging to '/var/log/mysqld.log'.
    Oct 02 17:21:40 mysql-test-01 mysqld_safe[2761]: 2017-10-02T15:21:40.119964Z mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
    Oct 02 17:21:40 mysql-test-01 mysqld_safe[2761]: 2017-10-02T15:21:40.140406Z mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.IG3oHj' --pid-file='/var/lib/mysql/mysql-test-01-recover.pid'
    Oct 02 17:21:43 mysql-test-01 mysql-systemd[2762]: /usr/bin/mysql-systemd: line 140: kill: (2761) - No such process
    Oct 02 17:21:43 mysql-test-01 mysql-systemd[2762]: ERROR! mysqld_safe with PID 2761 has already exited: FAILURE
    Oct 02 17:21:43 mysql-test-01 systemd[1]: mysql.service: control process exited, code=exited status=1
    Oct 02 17:21:43 mysql-test-01 mysql-systemd[3332]: WARNING: mysql pid file /var/lib/mysql/mysqld.pid empty or not readable
    Oct 02 17:21:43 mysql-test-01 mysql-systemd[3332]: ERROR! mysql already dead
    Oct 02 17:21:43 mysql-test-01 systemd[1]: mysql.service: control process exited, code=exited status=2
    ----------------------------------------

    I have read a lot about systemd problems with mysql and pxc, but I can't find a clue about mine. I am stuck because I have simulated a failure and I can't restart my cluster, it's a dev environnement but I won't continue further until I find a solution.

    Any ideas ?

    ​​​​​​​Regards,

  • #2
    Just a complement with this article: https://www.percona.com/blog/2014/09...a-pxc-cluster/

    I have reproduced scenario 4 but the node won't restart.

    Comment


    • #3
      So I am trying to manually debug this problem and I have noticed two things.


      1) Always on CentOS 7 Percona-XtraDB-Cluster-57 5.7.19-29.22.1.el7, in mysqld_safe, line 296

      Code:
      eval_log_error "$mysqld_cmd --wsrep_recover $wr_options"

      Is it correct ? I am a bit confused about the options syntax so would it not be better like this
      Code:
      eval_log_error "$mysqld_cmd _wsrep_recover $wr_options"
      or this
      Code:
      eval_log_error "$mysqld_cmd --wsrep-recover $wr_options"

      2)
      I put a lot of debug print in mysqld_safe and mysql-systemd
      I can manually restart the cluster and it seems to recover well with this command line:
      Code:
      $mysqld_cmd line 296
      nohup /usr/sbin/mysqld  --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql  --wsrep-provider=/usr/lib64/galera3/libgalera_smm.so --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
      But when the "wsrep" options are added to the command line in the mysqld_safe script, it does not work after
      Code:
      $cmd line 218
      nohup /usr/sbin/mysqld  --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql  --wsrep-provider=/usr/lib64/galera3/libgalera_smm.so --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --wsrep-recover --log_error='/var/lib/mysql/wsrep_recovery.PUR9Wb' --pid-file='/var/lib/mysql/mysql-test-01-recover.pid' > /dev/null 2>&1
      And we can also notice here that the mysqld_safe script generate a command line with two options "--pid-file" and two options "--log-error".

      Regards,

      Comment


      • #4
        After several hours, I finally found out the problem. Spoiler alert: it's stupid


        There is a bad integer comparisons in the "mysqld_safe" script.

        Code:
        [root@mysql-test-01 ~]# diff /root/mysqld_safe-BAD /usr/bin/mysqld_safe
        220c220
        <   if [ $ret > 0 ]; then
        ---
        >   if [ "$ret" -gt "0" ]; then
        Now it's working fine after a brutal shutdown.

        Should I report a bug somewhere in order to correct the issue in futur releases ?

        Regards,

        Comment


        • #5
          Thanks vleclere.

          You can report the issue here https://bugs.launchpad.net/percona-xtradb-cluster/

          Comment


          • #6
            Well found 'vleclere'
            Bug not reported from what I can tell, but at least you have found a solution and documented it here, many thanks for doing that.

            To reproduce:
            • I Installed Virtualbox - (I was making a Dev server)
            • I Installed a minimal - CentOS Linux release 7.4.1708 (Core) - virtual machine - or just a regular server.
            • Install latest Percona repo following docs
            • Install Percona XtraDB Cluster 5.7
              • yum install Percona-XtraDB-Cluster-full-57
            • Do what the install requests (I missed this the first time)
              • rm -rf /etc/my.cnf
              • update-alternatives --install /etc/my.cnf my.cnf "/etc/percona-xtradb-cluster.cnf" 200
              • To see all the new my config files and folders optionally do the below
              • ls -la /etc/percona-xtradb-cluster.c*; ls -la /etc/my.cnf*
            • Start mysql now with systemctl / systemd (bootstrap or don't I guess)
              • systemctl start mysql.service
            • Get root password
              • grep 'temporary password' /var/log/mysqld.log
            • mysql -u root -p
              • enter password
            • Change mysql password
              • ALTER USER 'root'@'localhost' IDENTIFIED BY 'newPassWord';
              • exit
              • restart mysql (or stop start)
                • systemctl restart mysql.service
            • Maybe or maybe don't add a database and or insert some data. I did, and also when this bug hit did alot of my.cnf tweaks and /usr/lib/systemd/system/mysql.service / systemctl daemon-reload tweaks to no avail.
            • As vleclere put it so well CREATE a brutal shutdown. Kill the vm or server without a graceful shutdown.
              • Power off the virtual machine or server as if the power failed without a backup UPS. So on virtual box i think that is the power off option without CentOS cleanly shutting down all servers like Percona / mysql. Rather common on a Windows machine as the virtual box is killed if you get a bluescreen of death or a Windows 10 surprise update.
            • Try and start up Percona/mysql after boot and you should get the failure as further below, if not I have possibly left something out - break it.....

            Just to clarify the solution for others that have this problem. Big thanks to vleclere.

            vim /usr/bin/mysqld_safe
            then you edit +/- line 220 or search for "if [ $ret"

            change from
            if [ $ret > 0 ]; then

            change to
            if [ "$ret" -gt "0" ]; then


            Error message was:

            journalctl -xe OR systemctl statusmysql.service
            Oct 15 14:22:39 reg mysql-systemd[8330]: /usr/bin/mysql-systemd: line 140: kill: (8329) - No such process
            Oct 15 14:22:39 reg mysql-systemd[8330]: ERROR! mysqld_safe with PID 8329 has already exited: FAILURE
            Oct 15 14:22:39 reg systemd[1]: mysql.service: control process exited, code=exited status=1
            Oct 15 14:22:39 reg mysql-systemd[8505]: WARNING: mysql pid file /var/lib/mysql/reg.resafrica-reg.com.pid empty or not readable
            Oct 15 14:22:39 reg mysql-systemd[8505]: ERROR! mysql already dead
            Oct 15 14:22:39 reg systemd[1]: mysql.service: control process exited, code=exited status=2
            Oct 15 14:22:39 reg mysql-systemd[8547]: WARNING: mysql pid file /var/lib/mysql/reg.resafrica-reg.com.pid empty or not readable
            Oct 15 14:22:39 reg mysql-systemd[8547]: WARNING: mysql may be already dead
            Oct 15 14:22:39 reg systemd[1]: Failed to start Percona XtraDB Cluster.
            -- Subject: Unit mysql.service has failed



            Regards
            Rory

            Comment


            • #7
              I still have this issue and I have changed the /usr/bin/mysqld_safe at line 220

              I found a way to start the node:

              1. Run

              /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --wsrep-provider=/usr/lib64/galera3/libgalera_smm.so --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock

              2. ps aux | grep mysql and kill the proccess, wait to shutdown

              3. service mysql start


              Also, service mysql start as root and make the mysql.pid as root, to fix this you need to modify /usr/lib/systemd/system/mysql.service and put

              User=mysql
              Group=mysql

              like this:

              [Service]
              # Needed to create system tables etc.
              ExecStartPre=/usr/bin/mysql-systemd start-pre

              User=mysql
              Group=mysql


              When do you have in mind to fix this?

              Thanks.

              Comment


              • #8
                Originally posted by krunalbauskar View Post
                Thanks vleclere.

                You can report the issue here https://bugs.launchpad.net/percona-xtradb-cluster/
                Hello,

                I've reported this issue on launchpad.

                Full credit to vleclere

                Thanks
                Chris

                Comment

                Working...
                X