There is a less-talked-about improvement in PostgreSQL 12 which can greatly reduce the benign log entries. This patch is probably one of the smallest in PostgreSQL 12.
The commit message says:
|
1 |
Don't log incomplete startup packet if it's empty<br><br>This will stop logging cases where, for example, a monitor opens a<br>connection and immediately closes it. If the packet contains any data an<br>incomplete packet will still be logged.<br><br>Author: Tom Lane |
This patch is going to improve the experience of many enterprise users by reducing unwanted log entries. It is very common to see the PostgreSQL log file running into several GBs due mainly to such unwanted benign entries.
You can read the full discussion thread at postgresql.org.
In PostgreSQL, for each client connection request to Postmaster (listens on port 5432 by default), a backend process will be created. It then processes the startup packet from the client. Refer to src/backend/postmaster/postmaster.c for the source code. Each client connection request is expected to send a startup message to the PostgreSQL server, and this information in the startup packet is used for setting up the backend process. But there are many more things happening when we deploy PostgreSQL in a datacenter. There could be different monitoring solutions, security scanners, port scanners, HA Solutions, etc hitting on PostgreSQL Port 5432. PostgreSQL starts processing these incoming connections for establishing a client-server communication channel. But many of these tools may have a different intention and won’t be participating in a good client-server protocol. Historically, PostgreSQL generates a log entry for each of these suspected/bad hits. This can result in log files growing to a huge size and can cause unwanted log-related IO.
Even though it looks silly, this was so annoying that many tool vendors started documenting it for their customers, advising them to just ignore such messages, as we can see here. HA Solutions like Stolon reported a similar problem. Monitoring plugins for Nagios, Cacti, and Zabbix also caused the same, and it appeared in the PostgreSQL mailing list multiple times over several years. For example:
Any port scanner or TCP Port checker can cause the log entries. The ncat/nc utility has Zero-I/O mode and reports connection status only (-z option).
|
1 |
for i in {1..100}; do<br> nc -zv localhost 5432 ;<br>done |
This produces LOG entries like the following for PostgreSQL version up to 11:
|
1 |
2019-11-28 13:24:26.501 UTC [15168] LOG: incomplete startup packet<br>2019-11-28 13:24:26.517 UTC [15170] LOG: incomplete startup packet<br>2019-11-28 13:24:26.532 UTC [15172] LOG: incomplete startup packet<br>2019-11-28 13:24:26.548 UTC [15174] LOG: incomplete startup packet<br>2019-11-28 13:24:26.564 UTC [15176] LOG: incomplete startup packet<br>2019-11-28 13:24:26.580 UTC [15178] LOG: incomplete startup packet<br>2019-11-28 13:24:26.595 UTC [15180] LOG: incomplete startup packet<br>2019-11-28 13:24:26.611 UTC [15182] LOG: incomplete startup packet<br>2019-11-28 13:24:26.627 UTC [15184] LOG: incomplete startup packet<br>2019-11-28 13:24:26.645 UTC [15186] LOG: incomplete startup packet<br>2019-11-28 13:24:26.666 UTC [15188] LOG: incomplete startup packet<br>2019-11-28 13:24:26.687 UTC [15190] LOG: incomplete startup packet<br>2019-11-28 13:24:26.710 UTC [15192] LOG: incomplete startup packet<br>2019-11-28 13:24:26.729 UTC [15194] LOG: incomplete startup packet<br>2019-11-28 13:24:26.748 UTC [15196] LOG: incomplete startup packet<br>... |
But in PostgreSQL 12, it detects that it is zero size packets and just ignores it. There won’t be any entry in the log file.
Unfortunately, some of the tools are not gentle enough to write a zero size packet. As per consciences in the community, this needs to be logged.
|
1 |
2019-11-28 14:27:49.728 UTC [17982] LOG: invalid length of startup packet<br>2019-11-28 14:28:14.907 UTC [17983] LOG: invalid length of startup packet<br>2019-11-28 14:28:18.236 UTC [17984] LOG: invalid length of startup packet |
Tom Lane explained in the mailing list:
|
1 |
" The agreed-to behavior change was to not log anything if the connection is closed without any data having been sent. If the<br>client *does* send something, and it doesn't look like a valid connection request, I think we absolutely should log that." |
So we should expect to see such messages in PostgreSQL 12 also. We can simulate the problem by using telnet instead of nc command to check the open port. Some other tools abruptly end the connections which cause errors in libpq (library which implements PostgreSQL network protocol).
|
1 |
2019-11-28 14:11:45.273 UTC [17951] LOG: could not receive data from client: Connection reset by peer<br>2019-11-28 14:11:47.328 UTC [17953] LOG: could not receive data from client: Connection reset by peer<br>2019-11-28 14:11:48.425 UTC [17955] LOG: could not receive data from client: Connection reset by peer<br>2019-11-28 14:27:11.870 UTC [17978] LOG: could not receive data from client: Connection reset by peer |
Such entries are also not going to go away. This happens when the server process tries to read packets (Refer: pq_recvbuf function in src/backend/libpq/pqcomm.c) sent from its client-side and then realizes that client-side is already lost. Which means that the client ended communication without a good handshake.
However, there will be a level of savings. Some tools like nmap used to produce both libpq errors and zero size packet error in PostgreSQL 11 as below:
|
1 |
2019-11-28 13:53:31.721 UTC [15446] LOG: could not receive data from client: Connection reset by peer<br>2019-11-28 13:53:31.721 UTC [15446] LOG: incomplete startup packet<br>2019-11-28 13:54:04.014 UTC [15450] LOG: could not receive data from client: Connection reset by peer<br>2019-11-28 13:54:04.014 UTC [15450] LOG: incomplete startup packet<br>2019-11-28 14:01:55.514 UTC [15479] LOG: could not receive data from client: Connection reset by peer<br>2019-11-28 14:01:55.514 UTC [15479] LOG: incomplete startup packet |
In PostgreSQL12 this pair of errors will reduce to a single one like: could not receive data from client: Connection reset by peer
Many tools used across data centers are undergoing improvements for better compatibility with PostgreSQL. As I mentioned in my previous blog post, Configure HAProxy with PostgreSQL Using Built-in pgsql-check, recent improvements to HAProxy like this commit improves the disconnection of pgsql-check. So messages like LOG: could not receive data from client: Connection reset by peer may not appear in the logs anymore.
When someone wants to transmit a very large log file with a lot of these benign entries for any purpose, including external support, it may be worth removing these entries from the log file before transmitting/sharing them. For example, a simple sed command as follows could remove all startup packet related log entries:
|
1 |
sed -i '/startup packet/d' postgresql-Thu.log |
A less bloated PostgreSQL log file could be an added benefit when you upgrade to PostgreSQL 12, which is one more reason for doing so.
Resources
RELATED POSTS