In this blog, I am going to share a real-world debugging case study where a routine Percona XtraDB Cluster node restart led to an unexpected failure. I will walk through what we observed, what we checked, and how we ultimately identified the root cause.
Let’s see how the maintenance goes. It was supposed to be a simple restart. The kind you’ve done a hundred times. You SSH in, run the maintenance, bring the node back up, and go grab a coffee. Except this time, the coffee went cold on the desk… because MySQL refused to start.
The error log of Percona XtraDB Cluster (8.0) had the following information:
```
2025-11-05T05:26:10.982984Z 0 [ERROR] [MY-000059] [Server] SSL error: Unable to get certificate from '/var/lib/mysql/server-cert.pem'.
2025-11-05T05:26:10.983030Z 0 [Warning] [MY-013595] [Server] Failed to initialize TLS for channel: mysql_main. See below for the description of exact issue.
2025-11-05T05:26:10.983045Z 0 [Warning] [MY-010069] [Server] Failed to set up SSL because of the following SSL library error: Unable to get certificate
2025-11-05T05:26:10.983052Z 0 [Note] [MY-000000] [WSREP] New joining cluster node configured to use specified SSL artifacts
2025-11-05T05:26:10.983083Z 0 [Note] [MY-000000] [Galera] Loading provider /usr/lib64/galera4/libgalera_smm.so initial position: 07c67757-0d18-11ef-b5a9-ee5d87b39aa8:4147053897
2025-11-05T05:26:10.983098Z 0 [Note] [MY-000000] [Galera] wsrep_load(): loading provider library '/usr/lib64/galera4/libgalera_smm.so'
2025-11-05T05:26:10.983742Z 0 [Note] [MY-000000] [Galera] wsrep_load(): Galera 4.22(f6c0465) by Codership Oy <[email protected]> (modified by Percona <https://percona.com/>) loaded successfully.
2025-11-05T05:26:10.983771Z 0 [Note] [MY-000000] [Galera] Resolved symbol 'wsrep_node_isolation_mode_set_v1'
2025-11-05T05:26:10.983784Z 0 [Note] [MY-000000] [Galera] Resolved symbol 'wsrep_certify_v1'
2025-11-05T05:26:10.983807Z 0 [Note] [MY-000000] [Galera] CRC-32C: using 64-bit x86 acceleration.
2025-11-05T05:26:10.983995Z 0 [Note] [MY-000000] [Galera] not using SSL compression
2025-11-05T05:26:10.984341Z 0 [ERROR] [MY-000000] [Galera] Bad value '/var/lib/mysql/server-cert.pem' for SSL parameter 'socket.ssl_cert': 336245135: 'error:140AB18F:SSL routines:SSL_CTX_use_certificate:ee key too small' at /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/rpmbuild/BUILD/Percona-XtraDB-Cluster-8.0.42/percona-xtradb-cluster-galera/galerautils/src/gu_asio.cpp:ssl_prepare_context():471
2025-11-05T05:26:10.984401Z 0 [ERROR] [MY-000000] [Galera] Failed to create a new provider '/usr/lib64/galera4/libgalera_smm.so' with options 'gcache.size=1G;gcache.recover=yes;socket.ssl=yes;socket.ssl_ca=/data00/mysqldata/ca.pem;socket.ssl_cert=/data00/mysqldata/server-cert.pem;socket.ssl_key=/data00/mysqldata/server-key.pem;socket.ssl_key=/var/lib/mysql/server-key.pem;socket.ssl_ca=/var/lib/mysql/ca.pem;socket.ssl_cert=/var/lib/mysql/server-cert.pem': Failed to initialize wsrep provider
2025-11-05T05:26:10.984434Z 0 [ERROR] [MY-000000] [WSREP] Failed to load provider
2025-11-05T05:26:10.984448Z 0 [ERROR] [MY-010119] [Server] Aborting
2025-11-05T05:26:10.984602Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.42-33.1) Percona XtraDB Cluster (GPL), Release rel33, Revision 6673f8e, WSREP version 26.1.4.3.
2025-11-05T05:26:10.985473Z 0 [ERROR] [MY-010065] [Server] Failed to shutdown components infrastructure.
```
MySQL was down, and the maintenance clock was running. The certificate file sitting at /var/lib/mysql/server-cert.pem was the same file that had been working perfectly fine before the restart!!
From the node’s history, we knew that the following commands had been executed successfully on this same cluster node:
```sql
SET GLOBAL ssl_ca = '/var/lib/mysql/ca.pem';
SET GLOBAL ssl_cert = '/var/lib/mysql/server-cert.pem';
SET GLOBAL ssl_key = '/var/lib/mysql/server-key.pem';
ALTER INSTANCE RELOAD TLS;
```
Clients connected over TLS. Galera nodes communicated securely. There were zero complaints in the error log.
As it turned out, that runtime SSL reload had run inside a process that kept the environment it inherited when MySQL originally booted. Everything was smooth, but after a restart? MySQL complained and refused to start. So what had changed?
File permissions
We checked the PEM files.
Ownership: mysql:mysql.
Permissions: 644 for the cert, 600 for the key.
We compared them against the other Galera nodes, and they were identical. This didn’t look like a permissions problem.
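To make that comparison repeatable, a small helper can print the ownership and mode of each PEM file in a form that is easy to diff between nodes. This is a sketch: the function names are hypothetical, and it assumes GNU `stat` as shipped on RHEL. It also flags a private key that group or others can read.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): compare PEM ownership/permissions across nodes.
# Assumes GNU coreutils `stat`.

# Print "owner:group mode path" for each file given.
pem_perms() {
  local f
  for f in "$@"; do
    stat -c '%U:%G %a %n' "$f"   # %U:%G owner:group, %a octal mode, %n name
  done
}

# Succeeds (returns 0) if group or others have any access to the key file.
key_too_open() {
  local mode
  mode=$(stat -c '%a' "$1") || return 2
  # The last two octal digits are the group and other permission bits.
  [ "${mode: -2}" != "00" ]
}

# Example, run on every node and diff the results:
#   pem_perms /var/lib/mysql/ca.pem /var/lib/mysql/server-cert.pem /var/lib/mysql/server-key.pem
#   key_too_open /var/lib/mysql/server-key.pem && echo "server-key.pem is readable by group/others"
```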
Is SELinux to blame here?
SELinux has ruined enough DBA hours to earn a top spot on such checklists, but on this node it was in permissive mode.
```
$ getenforce
Permissive
```
In permissive mode, SELinux logs policy violations but does not block them. And there were no AVC denials related to MySQL or the PEM files in /var/log/audit/audit.log or in dmesg!
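For reference, the audit-log check can be scripted roughly as below. A minimal sketch, assuming the standard auditd record format (`type=AVC ... avc:  denied ... comm="mysqld"`); `avc_denials` is a hypothetical helper name. On a live host, `ausearch -m AVC -c mysqld` does a similar job; the grep version also works on a copied log file.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): list SELinux AVC denials for one command name.
# Returns 0 if at least one denial is found, non-zero otherwise.
avc_denials() {
  local comm=$1 log=${2:-/var/log/audit/audit.log}
  # Keep only AVC denial records, then only those raised by the given command.
  grep -E 'type=AVC .*avc: +denied' "$log" | grep -F "comm=\"$comm\""
}

# Example:
#   avc_denials mysqld || echo "no AVC denials for mysqld"
```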
File corruption
Did the files get corrupted/replaced during or before the MySQL restart?
```
$ openssl x509 -in /var/lib/mysql/server-cert.pem -noout -text
# Output looked perfectly valid when compared to the output from other nodes

$ openssl rsa -in /var/lib/mysql/server-key.pem -check
RSA key ok
```
The files were fine. They parsed cleanly. OpenSSL could read them. So why couldn’t MySQL?
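Parsing cleanly is one check; for the “was anything replaced?” question, it also helps to confirm that the certificate and key still belong together. A sketch with a hypothetical helper name: it compares the public key embedded in the certificate with the public key derived from the private key, which works for RSA and EC keys alike.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): do this certificate and private key form a pair?
# Returns 0 on match, 1 on mismatch, 2 if either file cannot be read.
cert_matches_key() {
  local cert=$1 key=$2 cert_pub key_pub
  cert_pub=$(openssl x509 -in "$cert" -noout -pubkey) || return 2  # public key from the cert
  key_pub=$(openssl pkey -in "$key" -pubout) || return 2           # public key derived from the key
  [ "$cert_pub" = "$key_pub" ]
}

# Example:
#   cert_matches_key /var/lib/mysql/server-cert.pem /var/lib/mysql/server-key.pem \
#     && echo "certificate and key match"
```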
More log review
We scanned /var/log/messages and journalctl for anything unusual around the time of the restart. No disk errors. No OOM kills. No kernel panics. Nothing that screamed “I am the Dhurandhar that’s destroyed your node.” At this point, the usual suspects had all been cleared and were staring back at us, asking, “Who did it?”
It always pays to talk to stakeholders, so we asked the client: “Was there any recent change on your side?” They came back with the golden words: “Last week the crypto-policy was updated on all of the DB servers to comply with PCI.”
PCI > Crypto-policy: let’s go and check it!!
```
$ update-crypto-policies --show
FUTURE
```
The system was running RHEL’s FUTURE cryptographic policy.
For those unfamiliar (including me at the time), Red Hat Enterprise Linux (and its derivatives, such as Rocky, Alma, and Oracle Linux) ships with a system-wide cryptographic policy framework. It’s a centralized way to enforce minimum standards for TLS versions, cipher suites, key lengths, and signature algorithms across the system’s crypto libraries, including OpenSSL, and yes, that covers anything that links against those libraries… like MySQL.
Here’s a table that shows information about the crypto-policy levels:
| Policy | RSA Minimum | TLS Minimum | SHA-1 Signatures | Use Case |
| --- | --- | --- | --- | --- |
| LEGACY | 1024-bit | TLS 1.0 | Allowed | Old systems compatibility |
| DEFAULT | 2048-bit | TLS 1.2 | Allowed | Standard operations |
| FUTURE | 3072-bit | TLS 1.2 | Blocked | Forward-looking hardening |
| FIPS | 2048-bit | TLS 1.2 | Blocked | FIPS 140 compliance |
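The table can also be encoded directly, which makes the check scriptable. This is a sketch: the function names are hypothetical, the numbers are copied from the table above (RHEL 8-era policies, so verify them for your release), and custom policies or sub-policies such as `DEFAULT:SHA1` are deliberately reported as unknown rather than guessed.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): minimum RSA key size per crypto policy,
# using the values from the table above.
min_rsa_bits() {
  case "$1" in
    LEGACY)       echo 1024 ;;
    DEFAULT|FIPS) echo 2048 ;;
    FUTURE)       echo 3072 ;;
    *)            return 1 ;;   # custom policy or sub-policy: not handled here
  esac
}

# Succeeds if a key of $1 bits satisfies policy $2.
# Without $2, the active policy is read from update-crypto-policies.
rsa_key_allowed() {
  local bits=$1 policy=${2:-$(update-crypto-policies --show)} min
  min=$(min_rsa_bits "$policy") || return 2
  [ "$bits" -ge "$min" ]
}

# Example:
#   rsa_key_allowed 2048 FUTURE || echo "a 2048-bit key is rejected under FUTURE"
```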
So FUTURE requires RSA keys of at least 3072 bits; anything smaller is rejected. What do we have?
```
$ openssl rsa -in server-key.pem -text -noout | head -1
RSA Private Key: (2048 bit, 2 primes)
```
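To script this check, it is enough to pull the number out of OpenSSL’s text dump. A sketch with hypothetical helper names: it relies only on the `(<n> bit` fragment, because the label in front of it differs between OpenSSL versions. Reading the certificate rather than the key mirrors what OpenSSL complained about (`SSL_CTX_use_certificate`), and it does not need read access to the private key.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): extract the key size from `openssl ... -text` output.

# Read an OpenSSL text dump on stdin and print the first "<n>" found in "(<n> bit".
key_bits_from_text() {
  sed -n 's/.*(\([0-9][0-9]*\) bit.*/\1/p' | head -n 1
}

# Print the public key size of a certificate file.
cert_key_bits() {
  openssl x509 -in "$1" -noout -text | key_bits_from_text
}

# Example:
#   cert_key_bits /var/lib/mysql/server-cert.pem
```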
2048 bits! C’mon! Going back to the error log, the hint had been there all along:
```
error:140AB18F:SSL routines:SSL_CTX_use_certificate:ee key too small
```
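“ee” stands for end-entity, i.e. the server certificate itself, and the rejection comes from OpenSSL’s security level check rather than from MySQL or Galera. On RHEL, the crypto policy hands OpenSSL that level through a back-end configuration file. The sketch below (hypothetical helper names; the back-end path is the usual RHEL 8/9 location, an assumption to confirm on your host) reads the `@SECLEVEL` value and maps it to the minimum RSA size OpenSSL documents for each level. Under FUTURE we would expect level 3, which lines up with the 3072-bit minimum in the table.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): which OpenSSL security level does the system
# crypto policy set, and what RSA key size does that level require?

# Print the first "@SECLEVEL=<n>" value found in the OpenSSL back-end config.
openssl_seclevel() {
  local cnf=${1:-/etc/crypto-policies/back-ends/opensslcnf.config}
  grep -o '@SECLEVEL=[0-9]' "$cnf" | head -n 1 | cut -d= -f2
}

# Minimum RSA modulus size per OpenSSL security level
# (as documented for SSL_CTX_set_security_level).
seclevel_min_rsa_bits() {
  case "$1" in
    0) echo 0 ;;        # level 0: no restrictions
    1) echo 1024 ;;
    2) echo 2048 ;;
    3) echo 3072 ;;
    4) echo 7680 ;;
    5) echo 15360 ;;
    *) return 1 ;;
  esac
}

# Example:
#   lvl=$(openssl_seclevel)
#   echo "SECLEVEL=$lvl: RSA keys below $(seclevel_min_rsa_bits "$lvl") bits are rejected"
```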
Now we have our story straight.
On restart, our PXC cluster node started a new process linked against OpenSSL, which now enforced the FUTURE policy. OpenSSL looked at the 2048-bit RSA certificate and said: “Nope. Too small.”
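This is also why the earlier runtime `ALTER INSTANCE RELOAD TLS` worked: the old process was still running with the policy it had loaded at startup. On a node that is still up, you can check ahead of a planned restart whether the policy changed after `mysqld` started. A Linux-only sketch: the helper names are hypothetical, it reads the process start time from `/proc`, and it assumes `update-crypto-policies` records the active policy name in `/etc/crypto-policies/config` (as on RHEL 8/9).

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): was a file modified after a process started?
# Linux-only: uses /proc/<pid>/stat, /proc/stat, getconf and GNU stat.

# Print the start time of a process as epoch seconds.
proc_start_epoch() {
  local pid=$1 line rest ticks btime hz
  line=$(cat "/proc/$pid/stat" 2>/dev/null) || return 2
  # Drop "pid (comm) " first, because comm may itself contain spaces.
  rest=${line##*") "}
  # starttime is field 22 of /proc/<pid>/stat, i.e. field 20 after pid and comm.
  ticks=$(awk '{print $20}' <<<"$rest")
  btime=$(awk '/^btime/ {print $2}' /proc/stat)   # boot time, epoch seconds
  hz=$(getconf CLK_TCK)                            # clock ticks per second
  echo $(( btime + ticks / hz ))
}

# Succeeds if $1 was modified after process $2 started.
changed_after_start() {
  local file=$1 pid=$2 started modified
  started=$(proc_start_epoch "$pid") || return 2
  modified=$(stat -c %Y "$file") || return 2       # mtime, epoch seconds
  [ "$modified" -gt "$started" ]
}

# Example:
#   changed_after_start /etc/crypto-policies/config "$(pidof -s mysqld)" \
#     && echo "crypto policy changed since mysqld started; it will apply on restart"
```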
The quick fix here would be to adjust the policy to DEFAULT.
```
sudo update-crypto-policies --set DEFAULT
```
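This makes OpenSSL accept the existing 2048-bit certificates again, at the cost of relaxing the hardening that was applied for PCI. Because the policy is picked up when a process starts, starting mysqld after the change is enough, and the node joins the cluster readily.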
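Alternatively, to remain compliant and keep the stricter policy in place, the proper fix is to regenerate the SSL certificates on all nodes with RSA keys of at least 3072 bits, which the FUTURE policy accepts.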
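The reissue can be sketched with plain `openssl` commands. This is a minimal sketch under assumptions: a self-signed CA is acceptable in your environment, and the helper name, subjects, and 3650-day validity are placeholders. It uses 3072-bit RSA keys and SHA-256 signatures, since FUTURE also blocks SHA-1 signatures.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): issue a CA and a server certificate that
# satisfy the FUTURE policy. Subjects and validity are placeholders.
make_pxc_certs() {
  local dir=$1 bits=${2:-3072}
  mkdir -p "$dir" || return 1
  # 1. Self-signed CA (placeholder subject).
  openssl req -x509 -newkey "rsa:$bits" -sha256 -nodes -days 3650 \
    -subj "/CN=pxc-ca" \
    -keyout "$dir/ca-key.pem" -out "$dir/ca.pem" || return 1
  # 2. Server key plus certificate signing request (placeholder subject).
  openssl req -newkey "rsa:$bits" -sha256 -nodes \
    -subj "/CN=pxc-node" \
    -keyout "$dir/server-key.pem" -out "$dir/server-req.pem" || return 1
  # 3. Sign the request with the CA.
  openssl x509 -req -sha256 -days 3650 -in "$dir/server-req.pem" \
    -CA "$dir/ca.pem" -CAkey "$dir/ca-key.pem" -CAcreateserial \
    -out "$dir/server-cert.pem" || return 1
  # Keep private keys readable by the owner only.
  chmod 600 "$dir/ca-key.pem" "$dir/server-key.pem"
}

# Example:
#   make_pxc_certs /root/pxc-certs
```

Run it once, then copy `ca.pem`, `server-cert.pem`, and `server-key.pem` to every node (owned by `mysql:mysql`, key at mode 600, as before). Galera peers verify each other against the CA in `socket.ssl_ca`, so all nodes must trust the same CA. `ca-key.pem` is only needed for signing and does not have to live on the database servers.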
This was a classic case of a problem hiding at the boundary between two domains: database administration and operating system security. The DBA saw valid certificates and correct MySQL configuration. The sysadmin saw a properly hardened system with a strong crypto policy. Neither was wrong. But the intersection of their two correct configurations produced a failure.
This incident reinforces the importance of cross-domain awareness, where resolving database issues sometimes requires understanding and challenging system-level security decisions.