Multi-tenancy/co-hosting is always challenging. Running multiple PostgreSQL instances can help reduce the internal contention points (scalability issues) within PostgreSQL. However, the load caused by one tenant can affect other tenants, which is generally referred to as the “Noisy Neighbor” effect. Luckily, Linux allows users to control the resources consumed by each program using cgroups (Control Groups). Vadim covered cgroups in an earlier blog post explaining how to use them to limit MySQL and MongoDB memory usage. However, the landscape has changed a lot in this area in recent years: cgroup2 arrived as a replacement for cgroup v1, addressing almost all the architectural limitations of the first version.
We should be able to rely on cgroup2 if the Linux kernel version is 5.2.0 or later. More practically, if you are running a Linux distribution from 2022 or later, your host machine will most probably be ready for cgroup2.
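A minimal sketch for checking this from a shell (assuming a systemd-based distribution, as used throughout this post):

# Print the running kernel version; we want 5.2.0 or later
uname -r

# An additional check: "cgroup2fs" here means the unified (v2) hierarchy
# is already mounted, while "tmpfs" indicates cgroup v1
stat -fc %T /sys/fs/cgroup/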
An easy way to check whether Linux is using cgroup v1 or v2 is to count the cgroup mounts:
$ grep -c cgroup /proc/mounts
1
If the count is one, we have cgroup2: cgroup2 uses a single, unified hierarchy, whereas we may see multiple mounts if cgroup v1 is in effect.
If the kernel is recent but cgroup v1 is still in effect, you may have to set the boot parameter systemd.unified_cgroup_hierarchy=1. On Red Hat/OEL systems, we can add this parameter by executing the following:
sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1"
Basically, this adds the setting to the kernel parameters as a bootloader option:
$ cat /etc/default/grub
…
GRUB_CMDLINE_LINUX="xxxxxx systemd.unified_cgroup_hierarchy=1"
…
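Note that grubby is specific to Red Hat-family distributions. On Debian/Ubuntu hosts that still default to cgroup v1 (an assumption; recent releases already use cgroup2), the rough equivalent is to append the parameter to GRUB_CMDLINE_LINUX by hand and then regenerate the bootloader configuration:

# After adding systemd.unified_cgroup_hierarchy=1 to GRUB_CMDLINE_LINUX
# in /etc/default/grub with an editor:
sudo update-grub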
This change requires a restart of the machine. After restarting, you can verify it:
$ sudo mount -l | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate)
Please make sure that the filesystem type is reported as “cgroup2”.
Now we can inspect this virtual filesystem for a better understanding:
[jobinaugustine@localhost ~]$ ls -l /sys/fs/cgroup/
total 0
-r--r--r--. 1 root root 0 May 27 02:10 cgroup.controllers
-rw-r--r--. 1 root root 0 May 27 02:10 cgroup.max.depth
-rw-r--r--. 1 root root 0 May 27 02:10 cgroup.max.descendants
-rw-r--r--. 1 root root 0 May 27 02:10 cgroup.procs
-r--r--r--. 1 root root 0 May 27 02:10 cgroup.stat
-rw-r--r--. 1 root root 0 May 27 02:10 cgroup.subtree_control
-rw-r--r--. 1 root root 0 May 27 02:10 cgroup.threads
-rw-r--r--. 1 root root 0 May 27 02:10 cpu.pressure
-r--r--r--. 1 root root 0 May 27 02:10 cpuset.cpus.effective
-r--r--r--. 1 root root 0 May 27 02:10 cpuset.mems.effective
-r--r--r--. 1 root root 0 May 27 02:10 cpu.stat
drwxr-xr-x. 2 root root 0 May 27 02:10 init.scope
-rw-r--r--. 1 root root 0 May 27 02:10 io.pressure
-r--r--r--. 1 root root 0 May 27 02:10 io.stat
drwxr-xr-x. 2 root root 0 May 27 02:10 machine.slice
-r--r--r--. 1 root root 0 May 27 02:10 memory.numa_stat
-rw-r--r--. 1 root root 0 May 27 02:10 memory.pressure
-r--r--r--. 1 root root 0 May 27 02:10 memory.stat
-r--r--r--. 1 root root 0 May 27 02:10 misc.capacity
drwxr-xr-x. 107 root root 0 May 27 02:10 system.slice
drwxr-xr-x. 3 root root 0 May 27 02:16 user.slice
This is the root control group; all slices come under it. We can see “system.slice” and “user.slice”, which appear as directories because they form the next level of the hierarchy.
We can check which cgroup controllers are available on the machine as follows:
[jobinaugustine@localhost ~]$ cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma misc
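The file at the root lists the controllers the kernel offers; which of them are actually distributed to the child cgroups (slices) is governed by the cgroup.subtree_control file at each level. Comparing the two is a quick sanity check (paths as on the host above):

# Controllers available at the root of the hierarchy
cat /sys/fs/cgroup/cgroup.controllers

# Controllers enabled for (delegated to) the next level down
cat /sys/fs/cgroup/cgroup.subtree_control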
Putting cgroup2 into practice
Creating a slice
Creating a separate slice for the PostgreSQL instances is a good idea when there are multiple instances on the host. This allows us to control their overall resource consumption from one level higher.
Let’s assume that we want to restrict all PostgreSQL services from exceeding 25% of the machine’s CPU. The first step is to create a slice:
sudo systemctl edit --force postgres.slice
For the demonstration, I am adding the following unit configuration:
[Unit]
Description=PostgreSQL Slice
Before=slices.target

[Slice]
MemoryAccounting=true
MemoryMax=2048M
CPUAccounting=true
CPUQuota=25%
TasksMax=4096
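One detail worth keeping in mind from the systemd.resource-control documentation: CPUQuota is expressed relative to a single CPU, so values above 100% are valid on multi-core machines. A hypothetical slice allowed to consume up to two full cores would look like:

[Slice]
CPUAccounting=true
# 200% of one CPU = up to two full cores on a multi-core machine
CPUQuota=200%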
Save and quit the editor, and then reload systemd:
sudo systemctl daemon-reload
At any time, we can check the status of the slice with sudo systemctl status postgres.slice.
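We can also confirm that systemd parsed the limits as intended by querying the slice’s properties; the property names below are from the systemd.resource-control man page:

# CPUQuota=25% is exposed as 250ms of CPU time per 1s of wall-clock time
systemctl show postgres.slice -p CPUQuotaPerSecUSec -p MemoryMax -p TasksMax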
Modifying the PostgreSQL service
We can now use the slice we created in the PostgreSQL service, for which we need to edit the service unit:
$ sudo systemctl edit --full postgresql-16
Add the slice specification, Slice=postgres.slice, under the [Service] section of the unit file:
...
[Service]
Type=notify
User=postgres
Group=postgres
Slice=postgres.slice
...
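As an aside, if you would rather not copy the whole unit file (which is what --full does, and which can mask future package updates to the unit), a drop-in override achieves the same result: run sudo systemctl edit postgresql-16 without --full and add only the changed directive:

# Stored as /etc/systemd/system/postgresql-16.service.d/override.conf
[Service]
Slice=postgres.slice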
Save and exit the editor. This change requires a restart of the PostgreSQL service.
Once the PostgreSQL service is restarted, it runs under the new slice:
$ systemd-cgls | grep post
├─postgres.slice
│ └─postgresql-16.service
│   ├─3760 /usr/pgsql-16/bin/postgres -D /var/lib/pgsql/16/data/
│   ├─3761 postgres: logger
│   ├─3762 postgres: checkpointer
│   ├─3763 postgres: background writer
│   ├─3765 postgres: walwriter
│   ├─3766 postgres: autovacuum launcher
│   └─3767 postgres: logical replication launcher
│     └─3770 grep --color=auto post
The same hierarchy is visible in the service status output.
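Individual backends can be checked as well, since every process exposes its cgroup membership under /proc. Using the postmaster PID from the listing above (your PID will differ), we should see something like:

$ cat /proc/3760/cgroup
0::/postgres.slice/postgresql-16.service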
Verification
I tried to create a heavy load on the system by running a benchmark suite with many sessions in parallel on a single-CPU machine. No matter what I tried, Linux kept PostgreSQL from exceeding the limits specified by the slice.
If we add up the CPU utilization of all the PostgreSQL processes, we get 2.3*4 + 2*7 + 1.7 = 24.9%! (To make the counting easier, I used a machine with a single CPU core.)
The same workload without any cgroup restrictions can bring the server to 100% utilization (0% idle).
*cgroup slice restrictions have reduced the throughput, which is expected.
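The kernel-side view confirms the same limits. systemd translates CPUQuota=25% into the cgroup2 cpu.max interface file (allowed microseconds of CPU time per period, followed by the period), and the memory limit into memory.max in bytes; with the slice configuration above, we should expect something like:

$ cat /sys/fs/cgroup/postgres.slice/cpu.max
25000 100000
$ cat /sys/fs/cgroup/postgres.slice/memory.max
2147483648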
We can have multiple services in a slice, which then form the next level in the hierarchy. systemd-cgtop can show us per-slice and per-service utilization.
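For example, to get a live, top-like view limited to our slice and the services beneath it:

# Show only the postgres.slice subtree
systemd-cgtop postgres.slice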
Super cool, isn’t it? The quick demo concludes here.
Service-level control
cgroup2 is very versatile, and many more options exist. For example, you may not want to create separate slices for PostgreSQL services as demonstrated above, especially when there is only one PostgreSQL instance on the host machine. By default, PostgreSQL, like all services, will be part of “system.slice”. The easier method in this case is to specify the cgroup restrictions at the service level rather than at the slice level.
For example:
sudo systemctl edit --full postgresql-16
And specify the resource-control configuration directly in the service unit, under the [Service] section:
...
[Service]
User=postgres
Group=postgres
CPUAccounting=true
CPUQuota=25%
...
* Changes will be in effect on the next restart.
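Alternatively, systemctl set-property can apply resource-control directives to a running service immediately, without editing unit files or restarting; it persists the change as a drop-in unless --runtime is given:

# Takes effect immediately and is stored as a drop-in for future boots
sudo systemctl set-property postgresql-16 CPUQuota=25%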
Summary
Control groups are widely (and silently) used these days by other programs like Docker and Kubernetes. They are a well-proven method of restricting resource consumption on a machine, and the new cgroup2 makes them much simpler to use.
Clear control of resource usage on a host machine opens up many possibilities. Some that come to my mind are:
- Better multi-tenant environments
We can prevent the “Noisy Neighbor” effect in a multi-tenant environment by preventing tenants from competing for the same set of resources.
- Co-hosted application server + database server on the same machine
The vast majority of applications are CPU-intensive, while DB servers remain memory- and I/O-intensive. So, there are cases where putting them together on the same machine makes sense, especially for small and simple applications. A big advantage of co-hosted applications and databases is that they can communicate over local sockets rather than TCP/IP. Practically, we see many cases in which the network is the silent performance destroyer (see How To Measure the Network Impact on PostgreSQL Performance). Yet another advantage is that we don’t have to expose the database service (port) to the network.
- Protecting the system from abuse and denial-of-service situations, especially unwanted failovers
When the system becomes overloaded, it may become unresponsive for all the programs running on the machine, not just the database. Such situations often result in unwanted failovers by HA frameworks. Good control of resource usage can prevent this from happening.
Our PostgreSQL Performance Tuning eBook condenses years of database expertise into a practical guide for optimizing your PostgreSQL databases. Inside, you’ll discover our most effective PostgreSQL performance strategies derived from real-world experience.
Download now: Elevate your PostgreSQL Performance
References:
https://www.freedesktop.org/software/systemd/man/latest/systemd.resource-control.html
https://www.redhat.com/en/blog/world-domination-cgroups-rhel-8-welcome-cgroups-v2
https://docs.oracle.com/en/learn/ol-cgroup-v2/#mount-cgroups-v2
https://www.youtube.com/watch?v=kcnFQgg9ToY
https://www.scylladb.com/2019/09/25/isolating-workloads-with-systemd-slices/
https://www.suse.com/support/kb/doc/?id=000019590