One of the questions you will often face when operating a Linux-based system is managing the memory budget. If a program uses more memory than is available, the system may start swapping, often with a terrible performance impact, or the Out of Memory (OOM) killer may activate, killing the process altogether.
Before adjusting memory usage, either by configuration, optimization, or just managing the load, it helps to know how much memory a given program really uses.
If your system runs essentially a single user program (there is always a handful of system processes), it is easy. For example, if I run a dedicated MySQL server on a system with 128GB of RAM, I can use "used" as a good proxy for what is consumed and "available" for what can still be used.
root@rocky:/mnt/data2/mysql# free -h
               total        used        free      shared  buff/cache   available
Mem:           125Gi        88Gi       5.2Gi       2.0Mi        32Gi        36Gi
Swap:           63Gi        33Mi        63Gi
There is just swap to keep in mind, but if the system is not swapping heavily, even if some swap space is used, it usually holds "unneeded junk" which does not need to factor into the calculation.
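The numbers "free" reports come from /proc/meminfo, so if you need them in a script you can read them directly. A minimal sketch in Python (the keys are the standard /proc/meminfo field names; values are in kB):

```python
# Read system-wide memory figures straight from /proc/meminfo,
# the same source the "free" command uses (values are in kB).
def meminfo():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0])  # strip the "kB" suffix
    return info

m = meminfo()
print(f"used   ~= {(m['MemTotal'] - m['MemAvailable']) / 1024:.0f} MiB")
print(f"available = {m['MemAvailable'] / 1024:.0f} MiB")
```

Note that "MemAvailable" is the kernel's own estimate of memory available for new workloads without swapping, which is why it is a better guide than "free" alone.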
If you’re using Percona Monitoring and Management you will see it in Memory Utilization:
And in the Swap Activity graphs in the “Node Summary” dashboard:
If you’re running multiple processes that share system resources, things get complicated because there is no one-to-one mapping between “used” memory and a process.
Let’s list just some of those complexities:
- Copy-on-Write semantics for “fork” of the process – After a fork(), parent and child share the same “used” memory until one of them modifies data; only then does the writer get its own copy of the affected pages.
- Shared memory – As the name says, shared memory is a memory that is shared across different processes.
- Shared Libraries – Libraries are mapped into every process that uses them and count toward its memory usage, though the same library is shared among all processes that use it.
- Memory Mapped Files and Anonymous mmap() – There are a lot of complicated details here; check out Memory Mapped Files for more details. This discussion on Stack Exchange also has some interesting bits.
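The copy-on-write point above is easy to observe directly. This sketch forks a process; the child writes to a shared buffer and only at that moment receives its own private copy, so the parent's view stays untouched:

```python
import os

buf = bytearray(b"original")

pid = os.fork()
if pid == 0:
    # Child: this write triggers copy-on-write, so the child
    # gets its own private copy of the modified page.
    buf[0:8] = b"modified"
    os._exit(0)

os.waitpid(pid, 0)
# Parent still sees its own unmodified copy.
print(buf.decode())  # prints "original"
```

Until the child's write, both processes counted the same physical pages in their resident sizes, which is one reason summing RES across processes overstates real usage.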
With that complexity in mind, let’s look at the output of “top”, one of the most common programs for checking the current load on Linux. By default, “top” sorts processes by CPU usage, so we’ll press “Shift-M” to sort by (resident) memory usage instead.
The first thing you will notice is that this system, which has only 1GB of physical memory, has a number of processes whose virtual memory (VIRT) is in excess of 1GB.
For various reasons, modern memory allocators and language runtimes (e.g., Go’s) can allocate a lot of virtual memory which they do not really use, so virtual memory usage has little value for understanding how much real memory a process needs to operate.
Then there is resident memory (RES), which shows us how much physical memory the process really uses. This is better… but there is a problem: memory can be non-resident either because it was never really “used” and exists as virtual memory only, or because it was swapped out.
If we look at the stats the kernel actually provides for the process, we’ll see there is more data available:
root@PMM2Server:~# cat /proc/3767/status
State: S (sleeping)
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
NStgid: 3767 17
NSpid: 3767 17
NSpgid: 3767 17
NSsid: 3698 1
VmPeak: 3111416 kB
VmSize: 3111416 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 608596 kB
VmRSS: 291356 kB
RssAnon: 287336 kB
RssFile: 4020 kB
RssShmem: 0 kB
VmData: 1759440 kB
VmStk: 132 kB
VmExe: 26112 kB
VmLib: 8 kB
VmPTE: 3884 kB
VmSwap: 743116 kB
HugetlbPages: 0 kB
VmSwap is a particularly interesting data point, as it shows the amount of memory used by this process which was swapped out.
VmRSS+VmSwap is a much better indication of the “physical” memory the process needs. In the case above, it is 1010MB, a lot higher than the 284MB resident set size but also a lot less than the 3038MB “virtual memory” size for this process.
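Computing this figure for any process is straightforward. A sketch that sums VmRSS and VmSwap from /proc/[pid]/status, shown here against our own process (where VmSwap will typically be 0):

```python
import os

def real_memory_kb(pid):
    # "Real" memory need: resident pages plus pages swapped out.
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith(("VmRSS:", "VmSwap:")):
                key, value = line.split(":", 1)
                fields[key] = int(value.split()[0])  # value is in kB
    return fields.get("VmRSS", 0) + fields.get("VmSwap", 0)

print(f"{real_memory_kb(os.getpid())} kB")
```

Running this against the process above would give 291356 + 743116 = 1034472 kB, the ~1010MB quoted.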
The problem with the swapped-out portion, though, is that we do not know whether it was swapped out “for good” as dead weight (for example, code or data which is not used in your particular workload), or whether it was swapped out due to memory pressure and we really need it in RAM for optimal performance, but do not have enough memory available.
The helpful data point to look at in this case is major page faults. It is not in the output above but is available in another file, /proc/[pid]/stat. There is some helpful information on Stack Overflow.
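The counter in /proc/[pid]/stat is cumulative since process start, so you sample it twice to get a rate. Parsing needs a little care because the comm field (field 2) may itself contain spaces, so we split after its closing parenthesis; majflt is field 12 overall:

```python
import os, time

def major_faults(pid):
    # Read the cumulative major page fault count from /proc/[pid]/stat.
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # comm (field 2) may contain spaces, so split after its closing ')':
    rest = data.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); majflt is field 12 overall, i.e. rest[9]
    return int(rest[9])

pid = os.getpid()
before = major_faults(pid)
time.sleep(1)
print(f"major faults/sec: {major_faults(pid) - before}")
```

Tools like top and pidstat do the same sampling for you; this is just what is underneath.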
A high number of major page faults indicates that something the program needs is not in physical memory. This includes swap activity, but also references to code in a shared library that is not currently mapped in, or to data in memory-mapped files that is not currently in RAM. In any case, a high rate of major page faults often indicates RAM pressure.
Let’s go back to our friend “top”, though, and see if we can get more helpful information displayed in it. You can use the “F” keyboard shortcut to select the fields you want displayed.
You can add SWAP, Major Faults Delta, and USED columns to display all the items we spoke about!
Looking at this picture, we can see a large portion of the “prometheus” process is swapped out and it has around 2K major page faults per second, suggesting it is likely suffering.
The “clickhouse-serv” process is another interesting example: it has over 4G of “resident size” but relatively little memory used and far fewer major page faults.
Finally, let’s look at the “percona-qan-api” process, which has only a small portion swapped out but shows 2K major page faults as well. I’m honestly not quite sure what causes that, but it does not seem to be swap-IO related.
Want to see how much memory a process is using? Do not look at virtual memory size or resident memory size alone; look at “used” memory, defined as resident memory size plus swap usage. Want to see if there is actual memory pressure? Check system-wide swap input/output statistics, as well as the major page fault rate for the process you’re investigating.