Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Removing PostgreSQL Bottlenecks Caused by High Traffic

May 29, 2020

Author

Robert Bernier

Insight for DBAs

PostgreSQL

Share this Post:

PostgreSQL Bottlenecks High Traffic Taking our cue from Peter Zaitsev’s article on MySQL performance bottlenecks, I am going to talk about the PostgreSQL perspective of removing bottlenecks caused by high traffic.

Many stacks today are implemented by trusting the Object Relational Mapper, or ORM, to do the right thing with PostgreSQL while the critical business logic is created on the application server side. For the most part, this works well. Over time, however, the database server must be revisited as it scales. As reliable as PostgreSQL is, it can experience slowdowns as traffic increases.

There are many ways to address performance bottlenecks. For this discussion, we will look at the following areas:

Tuning performance parameters
Session connections
Bloat
Autovacuum: basic
Autovacuum: advanced
Data hotspots
Competing application processes
Replication latency
Server environment

About Monitoring and Statistics

Before we get too deep into the weeds, a word about monitoring. Contrary to what some people may think, metrics, values, and observations are not pulled out of thin air. A rigorous approach to metrics monitoring is essential for identifying bottlenecks and their remedies. Good advice means little without context. To help with this process, we have introduced a new tool called pg_stat_monitor.

Think of it as pg_stat_statements on steroids. It offers more granularity and, most significantly, introduces the concept of buckets, otherwise known as moving averages.

A more detailed blog dedicated exclusively to this technology is forthcoming. In the meantime, here is an abridged set of instructions for installing this extension into your database.

Method 1: The Percona Distribution for PostgreSQL

apt update
apt upgrade -y

apt install -y wget gnupg2 lsb-release curl
wget https://repo.percona.com/apt/percona-release_latest.generic_all.deb
dpkg -i percona-release_latest.generic_all.deb

percona-release setup ppg-12
apt install -y percona-postgresql-12 percona-pg-stat-monitor12

apt update

apt upgrade -y

apt install -y wget gnupg2 lsb-release curl

wget https://repo.percona.com/apt/percona-release_latest.generic_all.deb

dpkg -i percona-release_latest.generic_all.deb

percona-release setup ppg-12

apt install -y percona-postgresql-12 percona-pg-stat-monitor12

Method 2: Compile and Install (Community Postgres Repository)

# The following assumes the community version of Postgres is already installed.

apt install -y postgresql-server-dev-12 git make gcc
git clone https://github.com/percona/pg_stat_monitor

cd pg_stat_monitor
make USE_PGXS=1
make USE_PGXS=1 install

# The following assumes the community version of Postgres is already installed.

apt install -y postgresql-server-dev-12 git make gcc

git clone https://github.com/percona/pg_stat_monitor

cd pg_stat_monitor

make USE_PGXS=1

make USE_PGXS=1 install

Creating the Extension

ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_monitor';
CREATE EXTENSION pg_stat_monitor;

1 2	ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_monitor'; CREATE EXTENSION pg_stat_monitor;

Restart the Server

systemctl restart postgresql

1	systemctl restart postgresql

About Categories and Potential Impact

Complexity refers to the level of difficulty involved in implementing a particular solution. Potential impact gives you an idea of the mitigation’s ideal impact on your system’s performance. Sometimes, because of a system’s age, type, technical debt, or other factors, accurately describing complexity and potential impact can be difficult. In highly complex environments, your judgment makes the final call.

Categories:

Complexity
- Low
- Medium
- High
- Low-Medium-High
Potential Impact
- Low
- Medium
- High
- Low-Medium-High

Tuning Performance Parameters

Complexity: Low
Potential Impact: High

There was a time, not too long ago, when a modern version of Postgres could still run on an i386. Although the default settings have since been updated, the parameters are still set to use the least amount of resources upon installation.

These settings are the easiest to update and are normally adjusted when the service is first installed. Not adjusting these values can result in high CPU and I/O:

effective_cache_size: approximately 50% to 75%
shared_buffers: approximately 1/4 to 1/3 of total system RAM
work_mem: approximately 10MB

Let’s talk about these variables.

The recommended value for effective_cache_size, although typical, can be set more precisely by referring to top, specifically free + cached RAM.

Setting shared_buffers is an interesting puzzle. There are two ways to look at this setting. If you have a small database, you can set shared buffers high enough that you essentially have a RAM-resident database system. Otherwise, you configure the tables and indexes used most often by the service to remain in RAM, following the old 80/20 rule. Setting shared buffers to 1/3 of system RAM used to be the recommended value, but over time it was reduced to 1/4 as machines acquired more RAM. There is such a thing as assigning too much to shared buffers. Too little RAM means more CPU work and higher I/O. You will know the shared buffer setting is too high when CPU load and I/O performance reach a plateau.

Postgres traffic

Another factor to consider is the OS cache. Given enough RAM, Linux will cache tables and indexes in RAM and can, depending on various settings, make PostgreSQL believe it is reading from disk rather than RAM. Performance improves at the expense of increased redundancy, often by copying the same pages found in shared buffers into the OS cache. This is another reason to avoid an overly large shared buffer cache. For the perfectionists among us, take a look at the pg_buffercache extension, which reads cache usage in real time. Tip: look at this table.

Setting work_mem too low guarantees poor performance because sorts are processed as temporary files on disk. On the other hand, setting it too high risks starving the server of RAM if too many connections are active at one time. A chunk of RAM equal to work_mem is used for each sort operation. You will need to do some arithmetic by counting the instances of RAM used by each query and session. Tip: use EXPLAIN ANALYZE to see where sort operations are carried out. By varying the value in a given session, you can see when it spills to disk.

You can learn quite a bit by benchmarking the system at discrete variable settings.

Session Connections: Managing

Complexity: Low
Potential Impact: Low-Medium-High

High traffic is often characterized by large numbers of connections made over short intervals. Too many connections block processes, delay query responses, and can even cause session errors. Without exception handling, root cause analysis may require digging into the Postgres logs.

An easy fix is increasing the number of connections:

# postgresql.conf: default is set to 100.
max_connections

1 2	# postgresql.conf: default is set to 100. max_connections

PostgreSQL instances

Alternatively, a more sophisticated approach is connection pooling. There are numerous solutions, but the most commonly used technology is PgBouncer. Among its many capabilities, PgBouncer can regulate connection sessions using one of three modes:

Session pooling: The most polite method. When a client connects, a server connection is assigned to it for the entire duration of the client connection. When the client disconnects, the server connection is returned to the pool. This is the default method.
Transaction pooling: A server connection is assigned to a client only during a transaction. When PgBouncer notices that the transaction is over, the server connection is returned to the pool.
Statement pooling: The most aggressive method. The server connection is returned to the pool immediately after a query completes. Multi-statement transactions are disallowed in this mode because they would break.

Secure Socket Layer, or SSL, is another consideration. When configured to use SSL certificates, PostgreSQL’s default behavior encourages all connecting sessions to use SSL, which consumes more CPU processing power than an unencrypted session. You can configure host-based authentication rules in pg_hba.conf to force ordinary client sessions not to use SSL and reserve SSL for administrative tasks by the superuser or for streaming replication.

Autovacuum: Basic

Complexity: Medium
Potential Impact: Low-Medium

Multi-Version Concurrency Control is one of the foundational principles that makes PostgreSQL a popular DBMS solution. However, one downside is that every updated or deleted record creates a dead tuple that must eventually be purged. An improperly tuned autovacuum process, which is the mechanism that handles dead tuples, reduces performance. The busier the server, the more significant the effect.

You manage the autovacuum daemon using these three parameters:

autovacuum_max_workers: Increasing the number of autovacuum workers from the default of three means more processes are available to vacuum the data cluster. This is especially useful when dealing with many very large tables. Ideally, create one worker per CPU. Workers should never exceed the number of CPUs, but too many can cause CPU usage spikes. Typically, the correct value is somewhere between these limits. This is a balancing act between maximizing autovacuum efficiency and preserving overall system performance.
maintenance_work_mem: The greater the value, the more efficient the vacuuming. Keep in mind that there is a law of diminishing returns. Too large a value is, at best, a waste of RAM and, at worst, can exhaust available RAM for the entire database system.
autovacuum_freeze_max_age: This parameter mitigates TXID wraparound. The older the age, the less often it runs, which reduces system load. But as with all autovacuum parameters mentioned here, there is a caveat. Delay the value too long and you risk running out of TXID numbers before the process completes, causing the server to force a shutdown in order to protect data integrity. Determining the correct value requires trending the largest or oldest TXID against the autovacuum process when querying pg_stat_activity for wraparound activity.

Beware of over-committing RAM and CPU. The higher the initial value, the greater the risk that consumed resources will be exhausted as system load increases. If values are set too high, you can experience a sudden drop in performance once load exceeds a certain point. Tip: similar to RAM allocation for work_mem, you can either perform some arithmetic or benchmark the environment to set the values optimally.

Autovacuum: Advanced

Complexity: High
Potential Impact: High

Because of the effort involved, consider this method when the database system risks pushing the host to its physical limits and excessive bloat has been identified as an issue.

Editing the autovacuum runtime parameters in postgresql.conf is the most common method used to control behavior for a data cluster. Unfortunately, this one-size-fits-all approach may not work well over the long term, especially as a system scales.

Table storage parameters: Often, some tables represent a significant amount of the total data cluster churn. Adjusting autovacuum parameters on a table-by-table basis is an excellent way to mitigate hyperactive relations without resorting to a manual invocation of VACUUM, which can significantly affect the system.

Tune individual tables using this SQL command.

ALTER TABLE ... SET STORAGE_PARAMETER

1	ALTER TABLE ... SET STORAGE_PARAMETER

Bloat

Complexity: Low
Potential Impact: Medium-High

Over time, even with the best intentions, performance can degrade due to inadequate vacuuming policies that cause excessive bloat. In some cases, even tuning the autovacuum daemon and manually invoking VACUUM will not easily resolve the issue. For these cases, the pg_repack extension comes to the rescue.

pg_repack: rebuilds and reorganizes tables and indexes under production conditions.

Data Hotspots

Complexity: High
Potential Impact: Low-Medium-High

Similar to MySQL hotspots, the PostgreSQL experience, and the resolution of hotspots, relies on extensive knowledge of data flow. At its most extreme, mitigation can require refactoring the system’s architecture.

Here are a few popular mitigation techniques:

Indexes: Confirming that criteria columns have indexes assigned to them is a major opportunity for improving query performance. Another technique is to query the various catalogs and monitoring views and confirm that SQL commands are querying columns with indexes. Tip: use tools such as the pg_stat_statements extension and pgBadger to determine query performance.
Heap-Only Tuples (HOT): There is such a thing as too many indexes. Reduce potential bloat and table size by removing unused indexes attached to columns that are not used as part of the WHERE clause in a SELECT query.
Table partitioning: Nothing affects performance quite like a table that is several times larger than the average table size. Splitting a large table into smaller partitions can increase query performance, for example when querying data partitioned by date. And because only one autovacuum worker is allowed to process a single table, breaking it into many smaller tables allows more than one worker to autovacuum it. Another advantage of partitioning is that data purges become much more efficient and faster by truncating a single partition rather than deleting many rows from one oversized table.
Parallel querying: Introduced in recent versions of Postgres, parallel querying allows multiple CPUs to process a single query where previously there was one processor per query.
Denormalization: Depending on the specifics, you can improve performance by merging columns from several tables into a larger, single table. Performance gains are made by reducing query planning, but at the expense of increased data redundancy. Consider this option carefully before using it.

PMM query analytics

Competing Application Processes

Complexity: Low
Potential Impact: High

PHP, Java, and Python applications: Avoid running applications and Postgres on the same host. In the past, a web service and RDBMS could often run on the same machine because their resource usage was complementary. Be careful with applications based on these languages because they can consume large amounts of RAM, especially during garbage collection. This competes with database systems and reduces overall performance.

Replication Latency

Complexity: Low
Potential Impact: High

Async vs. sync: The most recent versions of Postgres support logical and streaming replication in both synchronous and asynchronous modes. Although the default replication mode is asynchronous, you must consider the implications of synchronous replication, especially over network connections with less-than-ideal latency.

Server Environment

Last but not least are the most basic considerations: make the host bigger and better. Let’s review what each of the following resources brings to PostgreSQL performance.

RAM: The more, the better. More RAM allows us to assign more memory to queries and increase the number of individual sessions. More RAM also means more of the database is cached, which optimizes I/O.
CPU: More CPU means more forked processes, such as vacuuming, session connections, and related workloads.
HDD: Size and speed optimizations increase the permitted size of the database. Overall query performance improves because of faster I/O, especially when operations such as merge sorts spill onto disk.
Disk partitioning:
- Breaking the data cluster across multiple partitions increases the number of channels and isolates distinct operations that Postgres carries out at the same time. For example, you can put indexes and tables on separate partitions with different performance characteristics.
- Temporary session tables and operations such as merge sorts can be dedicated to a single high-speed partition or rotated across multiple partitions to improve I/O.
- Logging can be isolated onto a partition, so running out of space does not affect the RDBMS.
- WAL logs, similar to regular logging, can have their own partition for this write-only operation. If it runs out of space, as can happen when log shipping breaks and the connection to the standby is interrupted, the database’s integrity is assured because the tables are located elsewhere.

PostgreSQL tuple activity

PostgreSQL high traffic

Our white paper, “Why Choose PostgreSQL?”, looks at the features and benefits of PostgreSQL and presents practical usage examples. We also examine how PostgreSQL can be useful for companies looking to migrate from Oracle.

Download PDF