If you run a container with runc and try to change the hostname from inside it, you get an interesting result.
Even as the root user with a UID of 0, we do not have the privilege to change the hostname.
The idea that the root user holds the highest privileges is a thing of the past: the Linux kernel introduced a new privilege checking mechanism, capabilities, in version 2.2.
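As a rough illustration of this failure, the sketch below attempts the same hostname change from Python. The helper name try_set_hostname and its setter parameter are hypothetical, introduced only so the behaviour can be exercised without touching a real hostname; socket.sethostname is the standard-library call that wraps the sethostname system call.

```python
import socket


def try_set_hostname(name, setter=None):
    """Try to change the hostname; return True on success, False on EPERM.

    `setter` is a hypothetical hook for experimenting without touching the
    real hostname; by default the standard socket.sethostname is used.
    """
    if setter is None:
        setter = socket.sethostname
    try:
        setter(name)
        return True
    except PermissionError:
        # The kernel refused the request: being UID 0 is not enough when
        # the required capability is missing from the process.
        return False
```

Called as try_set_hostname("demo") by root inside a container that was started with the default capability whitelist, this would be expected to return False.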
Finer-grained permissions control than superuser
The traditional Linux privilege checking model is simple: the kernel distinguishes only two types of processes when checking privileges.
- Privileged processes, whose effective user ID is 0; this user is often referred to as the superuser or root.
- Unprivileged processes, whose effective user ID is nonzero.
Privileged processes bypass all kernel permission checks, while unprivileged processes are checked against their credentials, such as the process's effective user ID and effective group ID.
To support finer-grained privileges, Linux kernels from version 2.2 onwards break superuser privileges down into distinct units called capabilities. For example, the capability CAP_CHOWN allows a process to make arbitrary changes to the UID and GID of a file, as the chown command does. Almost all superuser-related privileges have been broken down into separate capabilities.
The introduction of capabilities has the following benefits.
- Removing some capabilities from the superuser’s privileges to weaken them and improve system security.
- The ability to grant some special privileges to ordinary users very precisely on demand.
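To make the idea of separate privilege units concrete: the kernel numbers each capability and reports a process's sets as hexadecimal bitmasks, for example in the CapEff field of /proc/<pid>/status. The sketch below decodes such a mask. The table lists only a handful of capabilities, with numbers taken from linux/capability.h; the function name decode_caps is an illustrative choice, not part of any library.

```python
# A deliberately partial table: capability number -> name.
CAP_NAMES = {
    0: "CAP_CHOWN",
    5: "CAP_KILL",
    7: "CAP_SETUID",
    10: "CAP_NET_BIND_SERVICE",
    21: "CAP_SYS_ADMIN",
    25: "CAP_SYS_TIME",
}


def decode_caps(mask):
    """Return the names of the capabilities whose bits are set in `mask`."""
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            # Bits missing from the partial table are reported by number.
            names.append(CAP_NAMES.get(bit, f"bit_{bit}"))
        bit += 1
    return names
```

Granting or removing a single privilege then amounts to setting or clearing one bit, which is what makes per-capability control possible.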
Security risks of privileged containers
Containers isolate processes and resources with namespaces, but not all resources can be namespaced, so containers and hosts are not completely isolated; for example, time is shared between containers and the host. If a process in a container had all privileges, it could run (malicious) programs that access the hardware directly, or even modify the host's file system. It is therefore necessary to restrict what can be done inside the container; otherwise the stability of the host is affected, and serious security risks can follow.
For these reasons, a container by default runs with only a whitelist of capabilities, added when the container is created, so even the superuser inside the container lacks permission to perform certain operations.
Let’s deepen our understanding of capabilities in containers with an example.
We will use an additional tool library, libcap, in the container to interact with capabilities. It needs to be installed in a filesystem bundle, as described in the previous article, in the following way.
Then you can use runc run to run a base container with the library installed from that bundle.
Add capabilities when creating containers
In the opening example, we were unable to set the hostname in the container as the root user because the capability CAP_SYS_ADMIN was missing; it is not included in the whitelist of capabilities added to the container by default.
In a previous article, we described how the container runtime sets the runtime parameters and execution environment of the container it creates based on the config.json in the bundle, a process that also includes setting the capabilities of the processes in the container.
By modifying config.json and adding "CAP_SYS_ADMIN" to the bounding, permitted, and effective lists of the process.capabilities object in the JSON, this capability is added to the corresponding capability sets of the container's init process.
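The edit to config.json can also be scripted. The following is a minimal sketch, assuming the bundle's config.json follows the OCI runtime layout with a process.capabilities object. The helper names add_capability and patch_bundle are made up for this example, and the choice of the bounding, permitted, and effective lists mirrors the sets named later in this article.

```python
import json


def add_capability(config, cap, lists=("bounding", "permitted", "effective")):
    """Add `cap` to the given process.capabilities lists of a parsed config."""
    caps = config.setdefault("process", {}).setdefault("capabilities", {})
    for name in lists:
        entries = caps.setdefault(name, [])
        if cap not in entries:  # keep the lists free of duplicates
            entries.append(cap)
    return config


def patch_bundle(path, cap):
    """Rewrite the config.json at `path` in place (hypothetical helper)."""
    with open(path) as f:
        config = json.load(f)
    add_capability(config, cap)
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
```

For instance, patch_bundle("config.json", "CAP_SYS_ADMIN") run inside the bundle directory would update the file before the next runc run.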
Technical details of capabilities
Capabilities can be applied to both files and processes (more precisely threads, since the Linux kernel tracks capabilities per thread). The capabilities of a file are stored in the file's extended attributes, which are cleaned up when the image is built, so we basically do not need to consider file capabilities in the container.
The capabilities of a process are controlled by five capability sets maintained separately for each process, each of which contains zero or more capabilities.
- Permitted: a limiting superset of the capabilities that the process can use
- Inheritable: capabilities that can be passed on when the process executes the execve() system call
- Effective: the set used by the kernel to perform permission checks on the process
- Bounding: a superset of the Inheritable set; a capability must be in the Bounding set to be added to Inheritable
- Ambient: capabilities that are retained by unprivileged programs when executing the execve() system call
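The five sets of any process can be inspected through /proc/<pid>/status, where the kernel reports them as the hexadecimal fields CapInh, CapPrm, CapEff, CapBnd, and CapAmb. A small sketch that parses those fields follows; the function names are illustrative, and 21 is the number of CAP_SYS_ADMIN in linux/capability.h.

```python
# Field names used in /proc/<pid>/status -> the set each one describes.
FIELDS = {
    "CapInh": "Inheritable",
    "CapPrm": "Permitted",
    "CapEff": "Effective",
    "CapBnd": "Bounding",
    "CapAmb": "Ambient",
}

CAP_SYS_ADMIN = 21  # capability number from linux/capability.h


def parse_cap_sets(status_text):
    """Extract the capability sets from the text of a status file."""
    sets = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in FIELDS:
            sets[FIELDS[key]] = int(value.strip(), 16)
    return sets


def is_effective(sets, cap_number):
    """True if the capability's bit is set in the Effective set."""
    return bool((sets.get("Effective", 0) >> cap_number) & 1)


def read_own_sets():
    """Read the sets of the calling process (Linux only)."""
    with open("/proc/self/status") as f:
        return parse_cap_sets(f.read())
```

Only the Effective set decides whether a permission check passes, which is why is_effective looks at that set alone.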
As described above, we have added CAP_SYS_ADMIN to the Permitted, Bounding, and Effective sets of the init process, so the init process will pass the kernel's check for CAP_SYS_ADMIN.
Next we run a new container based on the new config.json, and now we can change the hostname.
When we do the above, we are in the sh process, which is the container's init process. If we continue to create new processes in the container, will they also have the newly added capability? Let's try this by executing the following command in a new window.
The hostname change was successful because the newly created process exactly replicates the capabilities of the init process.
Adding capabilities at container runtime
In addition to modifying config.json to add capabilities, we can also add capabilities during the container runtime phase.
First revert the change to config.json, then run a new container mybox3 and make sure that the new container no longer has CAP_SYS_ADMIN.
Then create a new process in that container with runc exec and add CAP_SYS_ADMIN to that process with the --cap option.
The idea is that since runc can set the capability sets for the init process based on config.json, it can do the same for other processes running in the container.
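As a sketch of what such an invocation might look like when driven from a script, the helper below assembles a runc exec command line. It assumes the --cap option of runc exec, which may be repeated; the function name runc_exec_argv, the container name, and the command are placeholders.

```python
def runc_exec_argv(container_id, command, extra_caps=()):
    """Build the argument list for `runc exec` with extra capabilities."""
    argv = ["runc", "exec"]
    for cap in extra_caps:
        # One --cap option per capability granted to the new process.
        argv += ["--cap", cap]
    argv.append(container_id)
    argv += list(command)
    return argv


# Placeholder container name and command, for illustration only.
argv = runc_exec_argv("mybox3", ["hostname", "demo"], ["CAP_SYS_ADMIN"])
```

The resulting list can be passed to subprocess.run(argv, check=True) on a host where runc is installed and the container is running.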
Check the capabilities of the process
Run capsh --print from within the container to get more information about capabilities. This command prints the capabilities of the current process.
The cap_sys_admin we added via config.json is included in the Bounding set. The +eip at the end of the capability list means that those capabilities exist in the Effective, Inheritable, and Permitted sets.
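This textual form can be taken apart mechanically. The sketch below handles only a single clause of the shape names+flags or names=flags, which is a simplification of the full syntax accepted by libcap; parse_clause is an illustrative name.

```python
# Flag letters used in the textual form -> the set each one refers to.
FLAG_TO_SET = {"e": "Effective", "i": "Inheritable", "p": "Permitted"}


def parse_clause(clause):
    """Map each set named by the flags to the capabilities in the clause."""
    # The operator is taken to be the last '+' or '=' in the clause.
    idx = max(clause.rfind("+"), clause.rfind("="))
    if idx < 0:
        raise ValueError(f"no '+' or '=' operator in {clause!r}")
    names = [n.strip() for n in clause[:idx].split(",") if n.strip()]
    return {FLAG_TO_SET[flag]: list(names) for flag in clause[idx + 1:]}
```

A clause ending in +eip therefore yields the same capability list under all three of Effective, Inheritable, and Permitted.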
First, on the host, get the PID of the process running in the container.
Then make sure the pscap program is installed on the host computer.
Based on the obtained PID, view the capabilities of the processes in the container.