A namespace is a Linux kernel feature that wraps certain system resources in an abstraction, so that processes inside the namespace see their own isolated instance of those resources as if it were the only one on the system. Namespaces isolate processes and resources from the host system and from other containers.
There are several types of namespace, depending on the system resource they isolate, such as the cgroup namespace and the mount namespace. Here we take the pid namespace as an example and use runC as the container runtime to demonstrate how namespaces behave when we operate on a container.
As described in the previous article, most container systems use runC as their underlying runtime. If you already use Docker on a Linux distribution, the runc command is usually available without a separate installation.
Preparation
Filesystem bundle
runC can only run containers from a filesystem bundle, which, as the name implies, is a directory with a specific layout. We can use Docker to prepare a usable bundle.
At this point, the entire bundle directory structure is as follows.
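Per the OCI runtime specification, a filesystem bundle is a directory holding a config.json file plus the container's root filesystem, whose location is given by root.path in config.json (runc spec writes "rootfs" there by default). As a minimal sketch, assuming only those two layout rules matter, the hypothetical check_bundle function below verifies that a directory such as /mycontainer has that shape; it does not validate the rest of config.json.

```python
import json
import os


def check_bundle(bundle_dir: str) -> list[str]:
    """Return the layout problems found; an empty list means the layout looks valid.

    Hypothetical helper: checks only that config.json exists and that the
    directory named by its root.path field exists.
    """
    config_path = os.path.join(bundle_dir, "config.json")
    if not os.path.isfile(config_path):
        return [f"missing {config_path}"]
    with open(config_path) as f:
        config = json.load(f)
    root_path = config.get("root", {}).get("path")
    if not root_path:
        return ["config.json has no root.path"]
    # The spec allows root.path to be absolute or relative to the bundle directory.
    if not os.path.isabs(root_path):
        root_path = os.path.join(bundle_dir, root_path)
    if not os.path.isdir(root_path):
        return [f"root filesystem {root_path} is not a directory"]
    return []
```

Calling check_bundle("/mycontainer") should return an empty list for a bundle prepared as above; any string in the result names the missing piece.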
System monitoring tools
To complete the demo, we need some third-party system monitoring tools as an aid.
- A tool to monitor process startup, so that we can get the PIDs of processes running in the container. On Ubuntu, forkstat fits: it reports process events such as fork(), exec() and exit() in real time. Install it as follows.
  $ apt install forkstat
- A tool to view namespace information, such as cinf, a command-line tool that can list all namespaces on the system or show details about a single namespace. It is installed as follows.
Running containers with runc
First, run forkstat in one terminal window.
Then open a new terminal window, switch to the /mycontainer directory, and use runC to run the container.
Once the command runs, we are dropped straight into a shell in the newly created container, where we run the ps command.
The forkstat window will have the following output.
Comparing the two outputs side by side, the sh and ps processes reported by ps and by forkstat are the same processes. However, because the processes in the container live in a separate pid namespace, they get separate PIDs inside the container and see themselves as the only processes there, so their PIDs start at 1.
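The host can also see both PIDs of a container process at once. On Linux 4.1 and later, the NSpid line in /proc/<pid>/status lists the process's PID in each pid namespace it belongs to, outermost first. The sketch below parses that line; parse_nspid and pid_views are hypothetical helper names, not part of runC or forkstat.

```python
def parse_nspid(status_text: str) -> list[int]:
    """Return the PIDs from the NSpid line, outermost pid namespace first."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            # Fields after the label are whitespace-separated PIDs.
            return [int(field) for field in line.split()[1:]]
    raise ValueError("no NSpid line (kernel older than 4.1?)")


def pid_views(host_pid: int) -> list[int]:
    """Read /proc/<host_pid>/status and return the process's PID in each namespace."""
    with open(f"/proc/{host_pid}/status") as f:
        return parse_nspid(f.read())
```

With the numbers from this walkthrough, pid_views(33052) run on the host would be expected to return [33052, 1]: 33052 in the host namespace and 1 inside the container.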
Find the namespace the process belongs to
To find the pid namespace used by the container, adjust the output format of the ps command on the host so that it includes the namespace.
The PIDNS column is the pid namespace: the output shows that the sh process with PID 33052 belongs to pid namespace 4026532395. Since we already have the host PID of the container process, we can also get all of its namespaces through the host's /proc file system.
The output lists the namespaces a process belongs to:
- Each entry is a symbolic link whose name indicates the namespace type, e.g. cgroup for the cgroup namespace and pid for the pid namespace.
- Each link points to the actual namespace object the process belongs to, which is identified by an inode number; each inode number is unique on the host system.
- If two processes have links of the same namespace type pointing to the same inode, they belong to the same namespace.
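The same inspection can be done programmatically. This sketch, assuming a Linux host and Python 3.9 or later, reads each symbolic link under /proc/<pid>/ns and splits its target, such as pid:[4026532395], into the namespace type and inode number; the helper names are hypothetical. It compares inode numbers only, mirroring the rule above; namespaces(7) notes that a strict check should also compare the device ID returned by stat.

```python
import os
import re

# A namespace link target looks like "pid:[4026532395]": type, then inode number.
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target: str) -> tuple[str, int]:
    """Split a link target such as 'pid:[4026532395]' into ('pid', 4026532395)."""
    m = _NS_LINK.match(target)
    if m is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return m.group(1), int(m.group(2))


def namespaces_of(pid="self") -> dict[str, int]:
    """Map each entry name in /proc/<pid>/ns to the inode number it points to."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except OSError:
            continue  # the process exited, or access was denied
        result[name] = inode
    return result


def same_namespace(pid_a, pid_b, ns_type: str = "pid") -> bool:
    """Two processes share a namespace if their links of that type hit the same inode."""
    return namespaces_of(pid_a)[ns_type] == namespaces_of(pid_b)[ns_type]
```

For example, running same_namespace(33052, 1, "pid") as root on the host compares the container's sh with the host's init process and should return False.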
In fact, every process belongs to exactly one namespace of each type: at boot, the Linux kernel creates an initial namespace of every type, and processes stay in those unless they are moved into new ones.
We can also look up the namespaces of sh from inside the container, where we have to use its in-container PID, 1.
Watching processes in a namespace
Now let's take the namespace's point of view and list all the processes in the pid namespace. Linux provides no built-in command for this, so we use the cinf tool installed earlier.
Currently there is only one process in this namespace, and it is the init process of the container we just created. When a container is created, a set of new namespaces is created and the container's init process is placed in them.
For the pid namespace, processes running in the container can only see other processes in the same pid namespace, pid:[4026532395]. Inside the container, sh is the first process on the system and has PID 1, but on the host it is an ordinary process with PID 33052. The same process has different PIDs in different namespaces, and this is exactly what the pid namespace provides. In a sense, a container is a new set of namespaces.
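Listing the members of a namespace amounts to scanning every process under /proc and keeping those whose link of the given type points at the same namespace. The sketch below takes that approach; it only approximates what cinf reports and is not cinf's implementation. pids_in_namespace is a hypothetical name, and the function needs root to read other users' processes.

```python
import os


def pids_in_namespace(ns_type: str, inode: int) -> list[int]:
    """Return host PIDs whose /proc/<pid>/ns/<ns_type> link is '<ns_type>:[inode]'."""
    target = f"{ns_type}:[{inode}]"
    members = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo or /proc/self
        try:
            if os.readlink(f"/proc/{entry}/ns/{ns_type}") == target:
                members.append(int(entry))
        except OSError:
            continue  # the process exited, or permission was denied
    return sorted(members)
```

At this stage of the walkthrough, pids_in_namespace("pid", 4026532395) on the host would be expected to return [33052].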
Create a new process in a container
Open a new terminal window and run a new process in the already running container.
From the forkstat window, we can see the PID of the newly created process.
There is a more direct way to see the processes running in the container from the host: the ps subcommand provided by runC.
Next, we again use cinf to find out which namespaces the newly created process belongs to.
The result shows that no new namespace was created: process 32608 is in exactly the same namespaces as sh, the init process of the mybox container. In other words, creating a new process in the container simply adds that process to the namespaces of the container's init process.
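To confirm that an exec'd process joined every namespace of the container, not just the pid namespace, one can compare all of its /proc/<pid>/ns links with those of the init process. A minimal sketch with hypothetical helper names, assuming a Linux host:

```python
import os


def ns_links(pid) -> dict[str, str]:
    """Map namespace entry name -> link target (e.g. 'pid:[4026532395]') for a PID."""
    ns_dir = f"/proc/{pid}/ns"
    links = {}
    for name in os.listdir(ns_dir):
        try:
            links[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # the process exited, or access was denied
    return links


def ns_diff(links_a: dict[str, str], links_b: dict[str, str]) -> dict[str, tuple]:
    """Return {type: (target_a, target_b)} for every namespace type that differs."""
    return {
        ns: (links_a.get(ns), links_b.get(ns))
        for ns in sorted(set(links_a) | set(links_b))
        if links_a.get(ns) != links_b.get(ns)
    }
```

After runc exec, ns_diff(ns_links(32608), ns_links(33052)) would be expected to return an empty dict; a non-empty result names each namespace type in which the two processes differ.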
Here is a list of all the processes in namespace 4026532395.
If we run ps -ef inside the container, we see the same processes, but with different PIDs because of the pid namespace.
So docker exec and runc exec actually start a new process inside the namespaces of the existing container.
Summary
When you run a container, new namespaces are created and the init process is added to them; when you run a new process in a container, that process is added to the namespaces that were created along with the container.
This behavior can be changed: instead of creating new namespaces, we can configure a new container to join existing ones.
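With runC, this is controlled by the linux.namespaces list in the bundle's config.json. Per the OCI runtime specification, each entry has a type (such as pid, network or mount) and an optional path; when path is set, the runtime places the container process in the namespace at that path instead of creating a new one. The sketch below, using hypothetical helper names, edits a config so that one namespace type points at an existing process's /proc/<pid>/ns entry. Note that the OCI type names network and mount correspond to the net and mnt files under /proc.

```python
import json


def join_existing_namespace(config: dict, ns_type: str, host_pid: int) -> dict:
    """Make the container join host_pid's namespace of type ns_type (OCI type name).

    Edits config["linux"]["namespaces"] in place: an entry carrying a "path"
    tells the runtime to join that namespace rather than create a new one.
    """
    # OCI uses "network"/"mount"; /proc/<pid>/ns uses "net"/"mnt".
    proc_name = {"network": "net", "mount": "mnt"}.get(ns_type, ns_type)
    ns_path = f"/proc/{host_pid}/ns/{proc_name}"
    namespaces = config.setdefault("linux", {}).setdefault("namespaces", [])
    for entry in namespaces:
        if entry.get("type") == ns_type:
            entry["path"] = ns_path
            break
    else:
        namespaces.append({"type": ns_type, "path": ns_path})
    return config


def update_config_file(path: str, ns_type: str, host_pid: int) -> None:
    """Load a bundle's config.json, point one namespace at host_pid, and save it."""
    with open(path) as f:
        config = json.load(f)
    join_existing_namespace(config, ns_type, host_pid)
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
```

For example, update_config_file("/mycontainer2/config.json", "pid", 33052), where /mycontainer2 is a hypothetical second bundle, should make a container started from it join the pid namespace of mybox instead of getting a new one.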