
proxysql loses server after restart of mysqld (Percona XtraDB Cluster)


  • proxysql loses server after restart of mysqld (Percona XtraDB Cluster)

    Hello
    I have a test setup of 4 VMs: a 3-node Percona XtraDB Cluster with ProxySQL running on each node, plus 1 VM as a client.
    RHEL 6.9
    Percona-XtraDB-Cluster-57-5.7.19-29.22.3.el6.x86_64
    proxysql-1.4.3-1.1.el6.x86_64
    all from the Percona repository
    ProxySQL is set up with two hostgroups, one for write and one for read traffic (a configuration sketch follows the table below), but the following problem occurs before any client connects:

    Code:
     (admin@localhost) [(none)]> select * from mysql_servers order by hostgroup_id,hostname;
    +--------------+--------------+------+--------------+------------+-------------+-----------------+---------------------+---------+----------------+---------+
    | hostgroup_id | hostname     | port | status       | weight     | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
    +--------------+--------------+------+--------------+------------+-------------+-----------------+---------------------+---------+----------------+---------+
    | 500          | 192.168.0.51 | 3306 | ONLINE       | 1000000000 | 0           | 1000            | 0                   | 0       | 0              |         |
    | 500          | 192.168.0.52 | 3306 | OFFLINE_SOFT | 1000000    | 0           | 1000            | 0                   | 0       | 0              |         |
    | 500          | 192.168.0.53 | 3306 | OFFLINE_SOFT | 100        | 0           | 1000            | 0                   | 0       | 0              |         |
    | 501          | 192.168.0.51 | 3306 | OFFLINE_SOFT | 100        | 0           | 1000            | 0                   | 0       | 0              |         |
    | 501          | 192.168.0.52 | 3306 | ONLINE       | 1000000000 | 0           | 1000            | 0                   | 0       | 0              |         |
    | 501          | 192.168.0.53 | 3306 | ONLINE       | 1000000000 | 0           | 1000            | 0                   | 0       | 0              |         |
    +--------------+--------------+------+--------------+------------+-------------+-----------------+---------------------+---------+----------------+---------+
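    For reference, a two-hostgroup layout like this would be created through the admin interface roughly as follows. This is only a sketch: the IPs, ports and hostgroup IDs are from this setup, while the base weights and comments are an assumption; the galera checker script rewrites status (and apparently weight) at runtime, which is presumably where the 1000000000 weights and OFFLINE_SOFT rows above come from.

    Code:
    -- ProxySQL admin interface (port 6032); hostgroup 500 = write, 501 = read
    -- base weights and comments are assumptions, not taken from the original config
    INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight, max_connections, comment) VALUES (500, '192.168.0.51', 3306, 1000000, 1000, 'WRITE');
    INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight, max_connections, comment) VALUES (501, '192.168.0.52', 3306, 1000, 1000, 'READ');
    INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight, max_connections, comment) VALUES (501, '192.168.0.53', 3306, 1000, 1000, 'READ');
    LOAD MYSQL SERVERS TO RUNTIME;  -- push the in-memory table to the runtime copy
    SAVE MYSQL SERVERS TO DISK;     -- persist the in-memory table to proxysql.db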
    I restart mysqld on one node (/etc/init.d/mysql restart), as we would do after a yum update run, for example.
    After MySQL is back up again:

    Code:
     (admin@localhost) [(none)]> select * from mysql_servers order by hostgroup_id,hostname;
    +--------------+--------------+------+--------+---------+-------------+-----------------+---------------------+---------+----------------+---------+
    | hostgroup_id | hostname     | port | status | weight  | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
    +--------------+--------------+------+--------+---------+-------------+-----------------+---------------------+---------+----------------+---------+
    | 500          | 192.168.0.51 | 3306 | ONLINE | 1000000 | 0           | 1000            | 0                   | 0       | 0              | WRITE   |
    | 501          | 192.168.0.52 | 3306 | ONLINE | 1000    | 0           | 1000            | 0                   | 0       | 0              | READ    |
    | 501          | 192.168.0.53 | 3306 | ONLINE | 1000    | 0           | 1000            | 0                   | 0       | 0              | READ    |
    +--------------+--------------+------+--------+---------+-------------+-----------------+---------------------+---------+----------------+---------+
    3 rows in set (0.00 sec)
    Note that this is the output of "select * from mysql_servers", but "select * from runtime_mysql_servers" shows the same. This also happens to the ProxySQL on the first node if I restart mysqld on the second node. It does not always get corrupted, but often enough. We have a customer using ProxySQL on his appserver with the same effect (1.3.9 installed there). It does not always lose all servers; sometimes it looks like this:
    Code:
    +--------------+--------------+------+--------------+------------+-------------+-----------------+---------------------+---------+----------------+---------+
    | hostgroup_id | hostname     | port | status       | weight     | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
    +--------------+--------------+------+--------------+------------+-------------+-----------------+---------------------+---------+----------------+---------+
    | 500          | 192.168.0.51 | 3306 | ONLINE       | 1000000000 | 0           | 1000            | 0                   | 0       | 0              |         |
    | 500          | 192.168.0.53 | 3306 | OFFLINE_SOFT | 100        | 0           | 1000            | 0                   | 0       | 0              |         |
    | 501          | 192.168.0.51 | 3306 | OFFLINE_SOFT | 100        | 0           | 1000            | 0                   | 0       | 0              |         |
    | 501          | 192.168.0.52 | 3306 | ONLINE       | 1000       | 0           | 1000            | 0                   | 0       | 0              | READ    |
    | 501          | 192.168.0.53 | 3306 | ONLINE       | 1000000000 | 0           | 1000            | 0                   | 0       | 0              |         |
    +--------------+--------------+------+--------------+------------+-------------+-----------------+---------------------+---------+----------------+---------+
    (This is the first node's ProxySQL, after a restart of MySQL on the second node.) Until we run "load mysql servers to memory;", ProxySQL keeps this broken state. Depending on the state, the next shutdown of a node could lead to downtime of the application, as no failover is possible. Is there anything more I could provide to get this fixed?
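    A minimal recovery sketch, assuming the on-disk proxysql.db still holds the full server list: "load mysql servers to memory" is the same disk-to-memory step as LOAD MYSQL SERVERS FROM DISK; the TO RUNTIME push and the verification query are additions here, not from the original report.

    Code:
    -- ProxySQL admin interface: reload servers disk -> memory, then memory -> runtime
    LOAD MYSQL SERVERS FROM DISK;
    LOAD MYSQL SERVERS TO RUNTIME;

    -- verify the in-memory and runtime copies agree again
    SELECT 'memory' AS src, hostgroup_id, hostname, status FROM mysql_servers
    UNION ALL
    SELECT 'runtime', hostgroup_id, hostname, status FROM runtime_mysql_servers
    ORDER BY hostgroup_id, hostname, src;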
    Hubertus
    Last edited by HubertusKrogmann; 11-13-2017, 10:51 AM.

  • #2
    /var/lib/proxysql/proxysql_galera_checker.log entries around a /etc/init.d/mysql stop && sleep 10 && /etc/init.d/mysql start
    (restart of node 1; ProxySQL on node 3)

    see the attached prox.galera.log.txt (too much text to paste inline)

    After restart
    Code:
    +--------------+--------------+------+--------------+------------+-------------+-----------------+---------------------+---------+----------------+---------+
    | hostgroup_id | hostname     | port | status       | weight     | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
    +--------------+--------------+------+--------------+------------+-------------+-----------------+---------------------+---------+----------------+---------+
    | 500          | 192.168.0.52 | 3306 | ONLINE       | 1000000    | 0           | 1000            | 0                   | 0       | 0              |         |
    | 500          | 192.168.0.53 | 3306 | OFFLINE_SOFT | 100        | 0           | 1000            | 0                   | 0       | 0              |         |
    | 501          | 192.168.0.51 | 3306 | ONLINE       | 1000       | 0           | 1000            | 0                   | 0       | 0              | READ    |
    | 501          | 192.168.0.52 | 3306 | OFFLINE_SOFT | 1000000000 | 0           | 1000            | 0                   | 0       | 0              |         |
    | 501          | 192.168.0.53 | 3306 | ONLINE       | 1000000000 | 0           | 1000            | 0                   | 0       | 0              |         |
    +--------------+--------------+------+--------------+------------+-------------+-----------------+---------------------+---------+----------------+---------+

    Why is the first node not coming back into the hostgroups? While the first node was down, it looked like this:

    Code:
    +--------------+--------------+------+--------------+------------+-------------+-----------------+---------------------+---------+----------------+---------+
    | hostgroup_id | hostname     | port | status       | weight     | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
    +--------------+--------------+------+--------------+------------+-------------+-----------------+---------------------+---------+----------------+---------+
    | 500          | 192.168.0.52 | 3306 | ONLINE       | 1000000    | 0           | 1000            | 0                   | 0       | 0              |         |
    | 500          | 192.168.0.53 | 3306 | OFFLINE_SOFT | 100        | 0           | 1000            | 0                   | 0       | 0              |         |
    | 501          | 192.168.0.52 | 3306 | OFFLINE_SOFT | 1000000000 | 0           | 1000            | 0                   | 0       | 0              |         |
    | 501          | 192.168.0.53 | 3306 | ONLINE       | 1000000000 | 0           | 1000            | 0                   | 0       | 0              |         |
    +--------------+--------------+------+--------------+------------+-------------+-----------------+---------------------+---------+----------------+---------+

    I would expect "OFFLINE" / "OFFLINE_HARD" entries to remain visible the whole time, not entries disappearing.
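    One way to see what ProxySQL's monitor module observed around the restart is to query the standard monitor log tables (a sketch; the LIMIT is arbitrary):

    Code:
    -- ProxySQL admin interface: most recent connect checks per backend
    SELECT hostname, port, time_start_us, connect_error
    FROM monitor.mysql_server_connect_log
    ORDER BY time_start_us DESC
    LIMIT 10;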

    Output of show global variables; is attached as prox.global.txt.
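    The monitor-related settings are probably the relevant subset for this; they can be filtered directly in the admin interface (sketch):

    Code:
    -- monitor settings only
    SELECT variable_name, variable_value
    FROM global_variables
    WHERE variable_name LIKE 'mysql-monitor%';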