Multi-tenancy/co-hosting is always challenging. Running multiple PostgreSQL instances can help reduce the internal contention points (scalability issues) within PostgreSQL. However, the load caused by one tenant can affect other tenants, which is generally referred to as the “Noisy Neighbor” effect. Luckily, Linux allows users to control the resources consumed by each program using cgroups (Control Groups). Vadim covered cgroups in an earlier blog post explaining how to use them to limit MySQL and MongoDB memory usage. However, the landscape has changed a lot in this area in recent years: cgroup2 arrived as a replacement for cgroup v1, addressing almost all the architectural limitations of the first version.
We should be able to rely on cgroup2 if the Linux kernel version is 5.2.0 or later. More practically, if you are running a Linux distribution from 2022 or later, your host machine will most probably be ready for cgroup2.
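A minimal sketch for checking this from a shell (assuming a systemd-based distribution, as used throughout this post):

# Print the running kernel version; we want 5.2.0 or later
uname -r

# An additional check: "cgroup2fs" here means the unified (v2) hierarchy
# is already mounted, while "tmpfs" indicates cgroup v1
stat -fc %T /sys/fs/cgroup/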
An easy way to check whether Linux is using cgroup v1 or v2 is to count the cgroup mounts:
$ grep -c cgroup /proc/mounts
1
If the count is one, we have cgroup2: cgroup2 uses a single, unified hierarchy, whereas we may see multiple mounts if cgroup v1 is in effect.
If the kernel is recent but cgroup v1 is still in effect, you may have to set the boot parameter systemd.unified_cgroup_hierarchy=1. On Red Hat/OEL systems, we can add this parameter by executing the following:
sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1"
Basically, this adds the setting to the kernel parameters as a bootloader option:
$ cat /etc/default/grub
…
GRUB_CMDLINE_LINUX="xxxxxx systemd.unified_cgroup_hierarchy=1"
…
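Note that grubby is specific to Red Hat-family distributions. On Debian/Ubuntu hosts that still default to cgroup v1 (an assumption; recent releases already use cgroup2), the rough equivalent is to append the parameter to GRUB_CMDLINE_LINUX by hand and then regenerate the bootloader configuration:

# After adding systemd.unified_cgroup_hierarchy=1 to GRUB_CMDLINE_LINUX
# in /etc/default/grub with an editor:
sudo update-grub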
This change requires a restart of the machine. After restarting, you can verify it:
$ sudo mount -l | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate)
Please make sure that the filesystem type is reported as “cgroup2”.
Now we can inspect this virtual filesystem for a better understanding:
[jobinaugustine@localhost ~]$ ls -l /sys/fs/cgroup/
total 0
-r--r--r--. 1 root root 0 May 27 02:10 cgroup.controllers
-rw-r--r--. 1 root root 0 May 27 02:10 cgroup.max.depth
-rw-r--r--. 1 root root 0 May 27 02:10 cgroup.max.descendants
-rw-r--r--. 1 root root 0 May 27 02:10 cgroup.procs
-r--r--r--. 1 root root 0 May 27 02:10 cgroup.stat
-rw-r--r--. 1 root root 0 May 27 02:10 cgroup.subtree_control
-rw-r--r--. 1 root root 0 May 27 02:10 cgroup.threads
-rw-r--r--. 1 root root 0 May 27 02:10 cpu.pressure
-r--r--r--. 1 root root 0 May 27 02:10 cpuset.cpus.effective
-r--r--r--. 1 root root 0 May 27 02:10 cpuset.mems.effective
-r--r--r--. 1 root root 0 May 27 02:10 cpu.stat
drwxr-xr-x. 2 root root 0 May 27 02:10 init.scope
-rw-r--r--. 1 root root 0 May 27 02:10 io.pressure
-r--r--r--. 1 root root 0 May 27 02:10 io.stat
drwxr-xr-x. 2 root root 0 May 27 02:10 machine.slice
-r--r--r--. 1 root root 0 May 27 02:10 memory.numa_stat
-rw-r--r--. 1 root root 0 May 27 02:10 memory.pressure
-r--r--r--. 1 root root 0 May 27 02:10 memory.stat
-r--r--r--. 1 root root 0 May 27 02:10 misc.capacity
drwxr-xr-x. 107 root root 0 May 27 02:10 system.slice
drwxr-xr-x. 3 root root 0 May 27 02:16 user.slice
This is the root control group; all slices come under it. We can see “system.slice” and “user.slice”, which appear as directories because they form the next level of the hierarchy.
We can check which cgroup controllers are available on the machine as follows:
[jobinaugustine@localhost ~]$ cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma misc
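The file at the root lists the controllers the kernel offers; which of them are actually distributed to the child cgroups (slices) is governed by the cgroup.subtree_control file at each level. Comparing the two is a quick sanity check (paths as on the host above):

# Controllers available at the root of the hierarchy
cat /sys/fs/cgroup/cgroup.controllers

# Controllers enabled for (delegated to) the next level down
cat /sys/fs/cgroup/cgroup.subtree_control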
Putting cgroup2 into practice
Creating a slice
Creating a separate slice for the PostgreSQL instances is a good idea when there are multiple instances on the host. This allows us to control their overall resource consumption from one level higher.
Let’s assume that we want to restrict all PostgreSQL services from exceeding 25% of the machine’s CPU. The first step is to create a slice:
sudo systemctl edit --force postgres.slice
For the demonstration, I am adding the following unit configuration:
[Unit]
Description=PostgreSQL Slice
Before=slices.target

[Slice]
MemoryAccounting=true
MemoryMax=2048M
CPUAccounting=true
CPUQuota=25%
TasksMax=4096
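One detail worth keeping in mind from the systemd.resource-control documentation: CPUQuota is expressed relative to a single CPU, so values above 100% are valid on multi-core machines. A hypothetical slice allowed to consume up to two full cores would look like:

[Slice]
CPUAccounting=true
# 200% of one CPU = up to two full cores on a multi-core machine
CPUQuota=200%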
Save and quit the editor, and then reload systemd:
sudo systemctl daemon-reload
At any time, we can check the status of the slice with sudo systemctl status postgres.slice.
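We can also confirm that systemd parsed the limits as intended by querying the slice’s properties; the property names below are from the systemd.resource-control man page:

# CPUQuota=25% is exposed as 250ms of CPU time per 1s of wall-clock time
systemctl show postgres.slice -p CPUQuotaPerSecUSec -p MemoryMax -p TasksMax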
Modifying the PostgreSQL service
We can now use the slice we created in the PostgreSQL service, for which we need to edit the service unit:
$ sudo systemctl edit --full postgresql-16
Add the slice specification, Slice=postgres.slice, under the [Service] section of the unit file:
...
[Service]
Type=notify
User=postgres
Group=postgres
Slice=postgres.slice
...
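As an aside, if you would rather not copy the whole unit file (which is what --full does, and which can mask future package updates to the unit), a drop-in override achieves the same result: run sudo systemctl edit postgresql-16 without --full and add only the changed directive:

# Stored as /etc/systemd/system/postgresql-16.service.d/override.conf
[Service]
Slice=postgres.slice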
Save and exit the editor. This change requires a restart of the PostgreSQL service.
Once the PostgreSQL service is restarted, it runs under the new slice:
$ systemd-cgls | grep post
├─postgres.slice
│ └─postgresql-16.service
│   ├─3760 /usr/pgsql-16/bin/postgres -D /var/lib/pgsql/16/data/
│   ├─3761 postgres: logger
│   ├─3762 postgres: checkpointer
│   ├─3763 postgres: background writer
│   ├─3765 postgres: walwriter
│   ├─3766 postgres: autovacuum launcher
│   └─3767 postgres: logical replication launcher
│     └─3770 grep --color=auto post
The same hierarchy is visible in the service status output.
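Individual backends can be checked as well, since every process exposes its cgroup membership under /proc. Using the postmaster PID from the listing above (your PID will differ), we should see something like:

$ cat /proc/3760/cgroup
0::/postgres.slice/postgresql-16.service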
Verification
I tried to create a heavy load on the system by running a benchmark suite with many sessions in parallel on a single-CPU machine. No matter what I tried, Linux kept PostgreSQL from exceeding the limits specified by the slice.
If we add up the CPU utilization of all the PostgreSQL processes, we get 2.3*4 + 2*7 + 1.7 = 24.9%! (To make the counting easier, I used a machine with a single CPU core.)
The same workload without any cgroup restrictions can bring the server to 100% utilization (0% idle).
*cgroup slice restrictions have reduced the throughput, which is expected.
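The kernel-side view confirms the same limits. systemd translates CPUQuota=25% into the cgroup2 cpu.max interface file (allowed microseconds of CPU time per period, followed by the period), and the memory limit into memory.max in bytes; with the slice configuration above, we should expect something like:

$ cat /sys/fs/cgroup/postgres.slice/cpu.max
25000 100000
$ cat /sys/fs/cgroup/postgres.slice/memory.max
2147483648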
We can have multiple services in a slice, which then form the next level in the hierarchy. systemd-cgtop can show us per-slice and per-service utilization.
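For example, to get a live, top-like view limited to our slice and the services beneath it:

# Show only the postgres.slice subtree
systemd-cgtop postgres.slice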
Super cool, isn’t it? The quick demo concludes here.
Service-level control
cgroup2 is very versatile, and many more options exist. For example, you may not want to create separate slices for PostgreSQL services as demonstrated above, especially when there is only one PostgreSQL instance on the host machine. By default, PostgreSQL, like all services, will be part of “system.slice”. The easier method in this case is to specify the cgroup restrictions at the service level rather than at the slice level.
For example:
sudo systemctl edit --full postgresql-16
And specify the resource-control configuration directly in the service unit, under the [Service] section:
...
[Service]
User=postgres
Group=postgres
CPUAccounting=true
CPUQuota=25%
...
* Changes will be in effect on the next restart.
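Alternatively, systemctl set-property can apply resource-control directives to a running service immediately, without editing unit files or restarting; it persists the change as a drop-in unless --runtime is given:

# Takes effect immediately and is stored as a drop-in for future boots
sudo systemctl set-property postgresql-16 CPUQuota=25%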
Summary
Control groups are widely (and silently) used these days by other programs like Docker and Kubernetes. They are a well-proven method of restricting resource consumption on a machine, and the new cgroup2 makes them much simpler to use.
Clear control of resource usage on a host machine opens up many possibilities. Some that come to my mind are:
- Better multi-tenant environments
We can prevent the “Noisy Neighbor” effect in a multi-tenant environment by preventing tenants from competing for the same set of resources.
- Co-hosted application server + database server on the same machine
The vast majority of applications are CPU-intensive, while DB servers remain memory- and I/O-intensive. So, there are cases where putting them together on the same machine makes sense, especially for small and simple applications. A big advantage of co-hosted applications and databases is that they can communicate over local sockets rather than TCP/IP. Practically, we see many cases in which the network is the silent performance destroyer (see How To Measure the Network Impact on PostgreSQL Performance). Yet another advantage is that we don’t have to expose the database service (port) to the network.
- Protecting the system from abuse and denial-of-service situations, especially unwanted failovers
When the system becomes overloaded, it may become unresponsive for all the programs running on the machine, not just the database. Such situations often result in unwanted failovers by HA frameworks. Good control of resource usage can prevent this from happening.
Our PostgreSQL Performance Tuning eBook condenses years of database expertise into a practical guide for optimizing your PostgreSQL databases. Inside, you’ll discover our most effective PostgreSQL performance strategies derived from real-world experience.
Download now: Elevate your PostgreSQL Performance
References:
https://www.freedesktop.org/software/systemd/man/latest/systemd.resource-control.html
https://www.redhat.com/en/blog/world-domination-cgroups-rhel-8-welcome-cgroups-v2
https://docs.oracle.com/en/learn/ol-cgroup-v2/#mount-cgroups-v2
https://www.youtube.com/watch?v=kcnFQgg9ToY
https://www.scylladb.com/2019/09/25/isolating-workloads-with-systemd-slices/
https://www.suse.com/support/kb/doc/?id=000019590