Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Contact Us

MySQL 101: Basic MySQL Server Triage

April 7, 2021

Author

Share this Post:

So your MySQL server has crashed. What do you do now? When a server is down, in my opinion, there are two steps that are essential and both are extremely important and neither should be neglected:

1. Save diagnostic information for determining the root cause analysis (RCA).

1. Get the server back up and running.

Too many people rush to Step #2 and lose pertinent diagnostics from Step #1. Likewise, too many people will spend too much time on Step #1 and delay getting to Step #2 and restoring service. The goal is to collect diagnostics as quickly as possible for later review while getting service restored as fast as possible.

As a Technical Account Manager (TAM) and assisting on server restoration calls, I have seen both issues at play. Technical resources have a tendency to get so bogged down in trying to understand the cause of the server outage that they forget that the downtime is costing the business money. The desire to crawl through server logs, review metrics, pour-over system metrics, and so on, can be too tempting for some who are concerned that important diagnostic data will be lost when service is restored. This is a valid concern, but there must be a middle ground.

Conversely, many, especially those in management, will demand service is restored immediately to continue business functions. Of course, after the service is back up, the demand for an RCA will come. Sadly, many metrics, and some logs, are lost when a server is bounced. Below are basic guidelines on what metrics to collect for MySQL. The steps are in no particular order.

1. Save a copy of the MySQL Error Log.
  
  sudo cp /path/to/datadir/*.log /some/where/safe
  
  1
  
  sudo cp /path/to/datadir/*.log /some/where/safe

1. Make a copy of the MySQL configuration file.
  
  sudo cp /path/to/my.cnf /some/where/safe
  
  1
  
  sudo cp /path/to/my.cnf /some/where/safe

1. Make a copy of system logs and save them somewhere on persistent storage in a location that will not be overwritten. Consider doing something like the following on Linux:
  
  sudo cp /var/log/syslog /some/where/safe/syslog sudo cp /var/log/messages /some/where/safe/messages sudo journalctl -e > /some/where/safe/journalctl.txt
  
  1
  2
  3
  
  sudo cp /var/log/syslog /some/where/safe/syslog
  sudo cp /var/log/messages /some/where/safe/messages
  sudo journalctl -e > /some/where/safe/journalctl.txt

If MySQL is running still and you can log in, get some MySQL metrics. You will want to save the output into files somewhere.

sudo mysqladmin -i10 -c10 proc > /some/where/safe/mysql_procs.txt
mysql> SHOW GLOBAL VARIABLES;
sudo mysqladmin -i10 -c10 ext > /some/where/safe/mysql_ext.txt
mysql> SHOW ENGINE INNODB STATUSG

1

2

3

4

sudo mysqladmin -i10 -c10 proc > /some/where/safe/mysql_procs.txt

mysql> SHOW GLOBAL VARIABLES;

sudo mysqladmin -i10 -c10 ext > /some/where/safe/mysql_ext.txt

mysql> SHOW ENGINE INNODB STATUSG

1. If MySQL is running and you have Percona Toolkit, you should collect some pt-stalk output.
  
  sudo ./pt-stalk --no-stalk --iterations=2 --sleep=30 --dest=/some/where/safe -- --user=root --password=<mysql-root-pass>;
  
  1
  
  sudo ./pt-stalk --no-stalk --iterations=2 --sleep=30 --dest=/some/where/safe -- --user=root --password=<mysql-root-pass>;

1. If you have space and time, a copy of the database files (data directory in MySQL) could be helpful. Certainly, for many installations, getting all of the data files will be impossible. If it is a small database and space and time allow, it can be best to get all the files just in case.
  
  sudo cp -R /path/to/datadir /some/where/safe/datadir
  
  1
  
  sudo cp -R /path/to/datadir /some/where/safe/datadir

1. Copy database logs and save them somewhere safe for later review. Systems like Percona XtraDB Cluster (PXC) will create GRA files during an issue which can be really helpful to look at to determine the root cause. By combining the GRA header file with the contents of the GRA log files, you can use the mysqlbinlog command to get the records of transactions causing issues. More information can be found in one of our older blogs here
  Percona XtraDB Cluster (PXC): what about GRA_*.log files?.
  
  sudo cp /path/to/data/dir/GRA* /some/where/safe/datadir/
  
  1
  
  sudo cp /path/to/data/dir/GRA* /some/where/safe/datadir/

1. Save system metrics pertaining to CPU, I/O, and memory usage:
  
  sudo mpstat -a 1 60 > /some/where/safe/mpstat.txt sudo vmstat 1 60 > /some/where/safe/vmstat.txt sudo iostat -dmx 1 60 > /some/where/safe/iostat.txt
  
  1
  2
  3
  
  sudo mpstat -a 1 60 > /some/where/safe/mpstat.txt
  sudo vmstat 1 60 > /some/where/safe/vmstat.txt
  sudo iostat -dmx 1 60 > /some/where/safe/iostat.txt

1. Save system info.
  
  sudo cat /proc/cpuinfo > /some/where/safe/cpuinfo.txt
  
  1
  
  sudo cat /proc/cpuinfo > /some/where/safe/cpuinfo.txt

1. If you have Percona Toolkit, the following would be very helpful:
  
  sudo pt-summary > /some/where/safe/pt-summary.txt sudo pt-mysql-summary > /some/where/safe/pt-mysql-summary.txt
  
  1
  2
  
  sudo pt-summary > /some/where/safe/pt-summary.txt
  sudo pt-mysql-summary > /some/where/safe/pt-mysql-summary.txt

Get hardware diagnostics.

# disk info
sudo df -k > /some/where/safe/df_k.txt
sudo lsblk -o KNAME,SCHED,SIZE,TYPE,ROTA > /some/where/safe/lsblk.txt
sudo lsblk --all > $PTDEST/lsblk-all;

# lv/pv/vg only for systems with LVM
sudo lvdisplay --all --maps > /some/where/safe/lvdisplau-all-maps.txt
sudo pvdisplay --maps > /some/where/safe/pvdisplay-maps.txt
sudo pvs -v > /some/where/safe/pvs_v.txt
sudo vgdisplay > /some/where/safe/vgdisplay.txt

# nfsstat for systems with NFS mounts 
sudo nfsstat -m > /some/where/safe/nfsstat_m.txt
sudo nfsiostat 1 120 > /some/where/safe/nfsiostat.txt

# Collect hardware information 
sudo dmesg > /some/where/safe/dmesg.txt
sudo dmesg -T free -m > /some/where/safe/dmesg_free.txt 
sudo dmesg -T > /some/where/safe/dmesg_t.txt
sudo ulimit -a > /some/where/safe/ulimit_a.txt
sudo cat /proc/sys/vm/swappiness > /some/where/safe/swappiness 
sudo numactl --hardware > /some/where/safe/numactl-hardware.txt

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

# disk info

sudo df -k > /some/where/safe/df_k.txt

sudo lsblk -o KNAME,SCHED,SIZE,TYPE,ROTA > /some/where/safe/lsblk.txt

sudo lsblk --all > $PTDEST/lsblk-all;

# lv/pv/vg only for systems with LVM

sudo lvdisplay --all --maps > /some/where/safe/lvdisplau-all-maps.txt

sudo pvdisplay --maps > /some/where/safe/pvdisplay-maps.txt

sudo pvs -v > /some/where/safe/pvs_v.txt

sudo vgdisplay > /some/where/safe/vgdisplay.txt

# nfsstat for systems with NFS mounts

sudo nfsstat -m > /some/where/safe/nfsstat_m.txt

sudo nfsiostat 1 120 > /some/where/safe/nfsiostat.txt

# Collect hardware information

sudo dmesg > /some/where/safe/dmesg.txt

sudo dmesg -T free -m > /some/where/safe/dmesg_free.txt

sudo dmesg -T > /some/where/safe/dmesg_t.txt

sudo ulimit -a > /some/where/safe/ulimit_a.txt

sudo cat /proc/sys/vm/swappiness > /some/where/safe/swappiness

sudo numactl --hardware > /some/where/safe/numactl-hardware.txt

It goes without saying, it would be best to script the above into a useful bash script you can run when there is an issue. Just be sure to test the script in advance of an issue.

Again, the goal is to preserve useful diagnostic data that could be useful for determining the root cause of the issue at a later time after the service is restored. Just don’t get caught up in looking through the above diagnostics! Certainly, more data is better but the above is a great starting point. As time goes on, you may realize you wish you had other metrics and can add them to your script or Standard Operating Procedure (SOP).

Naturally, adding monitoring like Percona Monitoring and Management (PMM) would be a great option that can save you a lot of time and collect even more trends over time which can be extremely helpful.

With the above diagnostics, you would have a ton of information in the event of an issue to find the root cause. Now, you can sort through the diagnostics. Of course, if you need help with that, Percona can help you here as well.

Percona Distribution for MySQL is the most complete, stable, scalable, and secure, open-source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

Download Percona Distribution for MySQL Today