Curious case of PXC node that refused to start due to SSL

May 4, 2026
Author
Kedar Vaijanapurkar

In this blog, I am going to share a real-world debugging case study where a routine Percona XtraDB Cluster node restart led to an unexpected failure. I will walk through what we observed, what we checked, and how we ultimately identified the root cause.

It was supposed to be a simple restart – the kind you’ve done a hundred times. You SSH in, run the maintenance, bring the node back up, and go grab a coffee. Except this time, the coffee went cold on the desk… because MySQL refused to start.

The Problem

The error log of Percona XtraDB Cluster (8.0) had the following information:

 

MySQL was down, and the maintenance clock was running. The certificate file sitting at /var/lib/mysql/server-cert.pem was the same file that had been working perfectly fine before the restart!!
From the node’s history, we knew that the following commands had been executed successfully on this very cluster node:

Clients connected over TLS. Galera nodes communicated securely. There were zero complaints from the error log.
In other words, the runtime SSL reload had inherited the process environment that existed when MySQL originally booted. Everything was smooth – but after a restart? MySQL complained and declined to start. So what had changed?

Checking Usual Suspects

File permissions

We checked the PEM files. 

Ownership: mysql:mysql.
Permissions: 644 for the cert, 600 for the key. 

We compared them against the other Galera nodes, and they were identical. This didn’t look like a permissions problem.
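For reference, the permission checks looked roughly like this (the /var/lib/mysql paths are from this environment; adjust them to your datadir):

```shell
# Verify ownership and modes of the TLS files in the datadir
ls -l /var/lib/mysql/*.pem
# Expect mysql:mysql, 644 on the cert, 600 on the key
stat -c '%U:%G %a %n' /var/lib/mysql/server-cert.pem /var/lib/mysql/server-key.pem
```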

Is SELinux to blame here?

SELinux has consumed enough DBA hours to earn a top spot on any such checklist – but here it was in permissive mode.

That means it was logging any security issues, but not blocking. And there were no AVC denials related to MySQL or the PEM files in /var/log/audit/audit.log or dmesg!
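A quick way to run these checks (a sketch; the commands require root, and the grep patterns are illustrative rather than this incident’s exact output):

```shell
# Confirm SELinux mode (Permissive logs AVC denials but does not block)
getenforce
# Look for recent AVC denials involving mysqld or the PEM files
ausearch -m AVC -ts recent 2>/dev/null | grep -i mysql
grep 'denied' /var/log/audit/audit.log | grep -i mysql
```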

File corruption

Did the files get corrupted/replaced during or before the MySQL restart?

The files were fine. They parsed cleanly. OpenSSL could read them. So why couldn’t MySQL?
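The OpenSSL checks were along these lines (paths assumed from this environment): parse both files, and confirm the key actually matches the certificate.

```shell
# Does the certificate parse, and is it within its validity window?
openssl x509 -in /var/lib/mysql/server-cert.pem -noout -subject -dates
# Is the private key internally consistent?
openssl rsa -in /var/lib/mysql/server-key.pem -check -noout
# Do cert and key carry the same public key? The two hashes must match.
openssl x509 -in /var/lib/mysql/server-cert.pem -noout -pubkey | openssl md5
openssl pkey -in /var/lib/mysql/server-key.pem -pubout | openssl md5
```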

Reviewing More Logs

We scanned /var/log/messages and journalctl for anything unusual around the time of the restart. No disk errors. No OOM kills. No kernel panics. Nothing that screamed “I am the Dhurandhar that’s destroyed your node.” At this point, most of the usual suspects were guilt-free, staring at us, asking, “Who did it?”

The Clue

It is good to communicate with stakeholders, and we did. We asked the client, “Was there any recent change on your side?” – and then came the golden words: “Last week the crypto-policy was updated on all of the DB servers to comply with PCI.”

PCI > Crypto-policy – Let’s go and check it !!

The system was running RHEL’s FUTURE cryptographic policy.

For those unfamiliar (including me at the time), Red Hat Enterprise Linux (and its derivatives, such as Rocky, Alma, and Oracle Linux) ships with a system-wide cryptographic policy framework. It’s a centralized way to enforce minimum standards for TLS versions, cipher suites, key lengths, and signature algorithms across all applications on the system that use the system crypto libraries such as OpenSSL – and yes, anything that links against those libraries… like MySQL.
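On a RHEL-family box, the active policy is one command away; on this system it reported FUTURE:

```shell
# Show the active system-wide crypto policy
update-crypto-policies --show
# The same value is persisted on disk
cat /etc/crypto-policies/state/current
```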

Here’s a table that shows information about the crypto-policy levels:

Policy  | RSA Minimum | TLS Minimum | SHA-1 Signatures | Use Case
--------|-------------|-------------|------------------|--------------------------
LEGACY  | 1024-bit    | TLS 1.0     | Allowed          | Old systems compatibility
DEFAULT | 2048-bit    | TLS 1.2     | Allowed          | Standard operations
FUTURE  | 3072-bit    | TLS 1.2     | Blocked          | Forward-looking hardening
FIPS    | 2048-bit    | TLS 1.2     | Blocked          | FIPS 140 compliance

So FUTURE demands at least a 3072-bit RSA key; anything smaller is blocked. What did we have?

2048 bits! C’mon! And now I recall the error log again… The hint was there:

Now we have our story straight.
On restart, our PXC cluster node started a new process linked against OpenSSL, which now enforced the FUTURE policy. OpenSSL looked at the 2048-bit RSA certificate and said: “Nope. Too small.”
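The mismatch is easy to confirm from the certificate itself (path assumed from this environment):

```shell
# FUTURE requires >= 3072-bit RSA; the cert reports its key size here
openssl x509 -in /var/lib/mysql/server-cert.pem -noout -text | grep 'Public-Key'
```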

The Fix

The quick fix here is to set the policy back to DEFAULT.

DEFAULT accepts the existing 2048-bit certificates, and the node will join the cluster readily.
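Relaxing the policy is one command plus a service restart (run as root; the service name may differ by distribution and package):

```shell
# Revert the system-wide policy so 2048-bit RSA is accepted again
update-crypto-policies --set DEFAULT
systemctl restart mysql
```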

Alternatively, to stay strictly compliant with the security policy, the proper fix is to:

  • Generate new certificates
  • Deploy the keys/certs to all Galera nodes
  • Perform a rolling restart
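A minimal sketch of step one with OpenSSL – generating a FUTURE-compliant (3072-bit) CA and server certificate. The subject names are placeholders; production certificates should also carry proper SANs and an expiry policy suited to your environment.

```shell
# Self-signed CA with a 3072-bit key
openssl genrsa -out ca-key.pem 3072
openssl req -new -x509 -nodes -days 3650 -key ca-key.pem \
        -out ca.pem -subj "/CN=PXC-CA"
# 3072-bit server key + CSR, then sign with the CA
openssl req -newkey rsa:3072 -nodes -keyout server-key.pem \
        -out server-req.pem -subj "/CN=pxc-node"
openssl x509 -req -in server-req.pem -days 3650 -CA ca.pem \
        -CAkey ca-key.pem -set_serial 01 -out server-cert.pem
# Sanity check before deploying to the nodes
openssl verify -CAfile ca.pem server-cert.pem
```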

 

Conclusion

This was a classic case of a problem hiding at the boundary between two domains: database administration and operating system security. The DBA saw valid certificates and a correct MySQL configuration. The sysadmin saw a properly hardened system with a strong crypto policy. Neither was wrong. But the intersection of their two correct configurations produced a failure.

This incident reinforces the importance of cross-domain awareness, where resolving database issues sometimes requires understanding and challenging system-level security decisions.
