Principles of Container Technology (3): Using cgroups to Implement Resource Limits

cgroups (control groups) is a feature provided by the Linux kernel that limits, accounts for, and isolates the system resources (such as CPU, memory, disk I/O, network, etc.) used by a group of processes.

The previous article looked at the role namespaces play in container technology. If namespaces control what the processes in a container can see, cgroups control how much of the system's resources those processes can use. Namespaces provide process isolation and cgroups provide resource limits; together they form the basis for building containers.

In this article, following on from the namespace article, we will create a container and watch how the cgroups on the host change, to see how cgroups work. Then we will learn how to configure cgroups ourselves.

When a cgroup is created

The Linux kernel exposes an interface for managing cgroups through a pseudo-filesystem called cgroupfs. We can list the existing cgroups on the system with the lscgroup command, which in effect walks the directories under /sys/fs/cgroup/.
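For illustration, that walk can be approximated in a few lines of bash. This is a sketch, not lscgroup's actual implementation: it assumes the cgroup v1 layout shown throughout this article (one directory per controller hierarchy, plus alias symlinks such as cpu pointing to cpu,cpuacct), and the function name list_cgroups is hypothetical. Named hierarchies such as name=systemd appear under their directory name rather than in lscgroup's notation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: approximate lscgroup by walking a cgroup v1 mount.
# Prints one "controller-list:/path" line per cgroup directory.
list_cgroups() {
  local root="${1:-/sys/fs/cgroup}"
  local hier ctrl dir rel
  for hier in "$root"/*; do
    # Only real directories count; skip plain files and alias symlinks
    # such as cpu -> cpu,cpuacct so each hierarchy is listed once.
    [ -d "$hier" ] && [ ! -L "$hier" ] || continue
    ctrl="${hier##*/}"
    while IFS= read -r dir; do
      rel="${dir#"$hier"}"              # path relative to the hierarchy root
      printf '%s:%s\n' "$ctrl" "${rel:-/}"
    done < <(find "$hier" -type d)
  done
}
```

Running list_cgroups | sort on a cgroup v1 host should give output close to lscgroup's, one line per cgroup.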

$ lscgroup | tee group.a

If your Linux distribution does not ship the lscgroup command, you can install it with the command that command-not-found.com gives for your distribution.

This saves the output to the file group.a. Next, in another terminal, start a container following the steps in the namespace article:

$ cd /mycontainer
$ runc run mybox

Switch back to the original terminal and run lscgroup again:

$ lscgroup | tee group.b

Now compare the two lscgroup outputs:

$ diff group.a group.b

> perf_event:/mybox
> freezer:/mybox
> net_cls,net_prio:/mybox
> cpu,cpuacct:/user.slice/mybox
> blkio:/user.slice/mybox
> cpuset:/mybox
> hugetlb:/mybox
> pids:/user.slice/user-0.slice/session-5.scope/mybox
> memory:/user.slice/user-0.slice/session-5.scope/mybox
> devices:/user.slice/mybox

The results show that once the mybox container is created, the system creates a dedicated mybox cgroup for it in almost every controller hierarchy.
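diff marks the added lines with "> "; comm can print just the new entries instead. A small bash sketch, with a hypothetical function name, that takes the two lscgroup snapshots as arguments:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print only the lines that appear in the second
# lscgroup snapshot but not in the first (i.e. the cgroups that were created).
new_cgroups() {
  local before="$1" after="$2"
  # comm needs sorted input; -13 suppresses lines unique to "before"
  # and lines common to both, leaving only lines unique to "after".
  comm -13 <(sort "$before") <(sort "$after")
}
```

For example, new_cgroups group.a group.b would list the mybox cgroups above in sorted order, without the "> " markers.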

How cgroups control the resources of a container

A cgroup applies to processes: it determines how much memory, CPU, network bandwidth, and so on a process or group of processes can use. Each cgroup has a tasks list containing the PIDs of the processes it controls, and this list is simply a file named tasks in the cgroup's cgroupfs directory.
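Because tasks is a text file with one PID per line, checking membership needs nothing more than grep. A minimal bash sketch, assuming the cgroup v1 layout used in this article (cgroup v2 has no tasks file and uses cgroup.procs instead); the function name is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if PID appears in the cgroup's tasks file.
# $1 is a cgroup directory, e.g. /sys/fs/cgroup/blkio/user.slice/mybox
cgroup_has_pid() {
  local cgroup_dir="$1" pid="$2"
  # tasks holds one PID per line; -x requires a whole-line match,
  # so PID 25 does not accidentally match 2250.
  grep -qx -- "$pid" "$cgroup_dir/tasks"
}
```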

init process

First, from the host, list the processes running in the container to find the container's init process:

$ runc ps mybox

UID          PID    PPID  C STIME TTY          TIME CMD
root        2250    2240  0 15:28 pts/0    00:00:00 sh

Then print the tasks lists of a few of the container's cgroups, chosen at random:

$ cat /sys/fs/cgroup/memory/user.slice/user-0.slice/session-5.scope/mybox/tasks
2250
$ cat /sys/fs/cgroup/blkio/user.slice/mybox/tasks
2250

The mechanism is straightforward: after the container is created, its init process is added to the cgroups created for that container. /proc/$PID/cgroup gives a more definitive answer, listing every cgroup the process belongs to:

$ cat /proc/2250/cgroup
12:devices:/user.slice/mybox
11:memory:/user.slice/user-0.slice/session-5.scope/mybox
10:pids:/user.slice/user-0.slice/session-5.scope/mybox
9:hugetlb:/mybox
8:cpuset:/mybox
7:rdma:/
6:blkio:/user.slice/mybox
5:cpu,cpuacct:/user.slice/mybox
4:net_cls,net_prio:/mybox
3:freezer:/mybox
2:perf_event:/mybox
1:name=systemd:/user.slice/user-0.slice/session-5.scope/mybox
0::/user.slice/user-0.slice/session-5.scope
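Each line above has the form hierarchy-ID:controller-list:cgroup-path, and with cgroup v1 the hierarchy's directory under /sys/fs/cgroup is named after its controller list (for example cpu,cpuacct). That is enough to map a PID to the cgroupfs directories used in this section. A bash/awk sketch; the function name cgroup_dir_for is hypothetical, and the name=systemd and 0:: (cgroup v2) lines are not handled specially:

```bash
#!/usr/bin/env bash
# Hypothetical helper: given a /proc/$PID/cgroup file and a controller
# name, print the cgroup's directory relative to /sys/fs/cgroup (v1 layout).
cgroup_dir_for() {
  local cgroup_file="$1" controller="$2"
  awk -F: -v want="$controller" '{
    # $1 = hierarchy ID, $2 = comma-separated controllers, $3.. = path
    n = split($2, ctrls, ",")
    for (i = 1; i <= n; i++) {
      if (ctrls[i] == want) {
        path = $3
        for (j = 4; j <= NF; j++) path = path ":" $j  # rejoin a path containing ":"
        print $2 path
        found = 1
        exit
      }
    }
  } END { exit !found }' "$cgroup_file"   # non-zero status if no match
}
```

On the host, cat /sys/fs/cgroup/$(cgroup_dir_for /proc/2250/cgroup blkio)/tasks should print the same tasks list as the blkio example above.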

Other processes in the container

Next we run a new process in the mybox container.

# Run inside the mybox container
$ top -b

Then check whether a new cgroup is created:

$ lscgroup | tee group.c
$ diff group.b group.c

The diff prints nothing, so no new cgroup has been created. Since a cgroup can control a group of processes, we can guess that any new process created in the running container is added to the cgroups its init process belongs to.

To verify this, first find the PID of the newly created process.

$ runc ps mybox
UID          PID    PPID  C STIME TTY          TIME CMD
root        2250    2240  0 15:28 pts/0    00:00:00 sh
root        2576    2250  0 15:59 pts/0    00:00:00 top -b

The new process has PID 2576. Print its cgroup membership:

$ cat /proc/2576/cgroup

The output is identical to that of process 2250. We can also print the tasks list of one of the cgroups:

$ cat /sys/fs/cgroup/blkio/user.slice/mybox/tasks
2250
2576

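This is exactly as expected. top -b was started from the container's shell (its PPID is 2250), and a child process starts out in the same cgroups as its parent, so it joined the container's cgroups automatically. A process can also be added to a cgroup explicitly, simply by writing its PID to that cgroup's tasks file. In short, when a container is created, a new cgroup is created for each type of resource, and every process running in the container is added to these cgroups.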
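Writing a PID to tasks, as described above, can be wrapped in a helper. A minimal sketch for cgroup v1; the function name is hypothetical, it needs root against a real cgroup, and strictly speaking tasks works on individual threads (writing a PID to cgroup.procs moves every thread of that process at once):

```bash
#!/usr/bin/env bash
# Hypothetical helper: move a process into a cgroup (cgroup v1) by
# writing its PID to the cgroup's tasks file.
join_cgroup() {
  local cgroup_dir="$1" pid="$2"
  # Refuse anything that is not a plain decimal PID.
  [[ "$pid" =~ ^[0-9]+$ ]] || { echo "invalid PID: $pid" >&2; return 1; }
  # On cgroupfs each write attaches one task; existing members stay listed.
  echo "$pid" > "$cgroup_dir/tasks"
}
```

For example, join_cgroup /sys/fs/cgroup/blkio/user.slice/mybox 2576 would do by hand what happened automatically for top.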

By controlling every process running in the container, cgroups enforce resource limits on the container as a whole.

How to configure cgroups

Here we use the memory cgroup as an example to show how to configure a cgroup so that it limits the memory available to the mybox container.

There are two ways to configure a cgroup: modify the relevant files in cgroupfs directly, or use a higher-level tool such as runc or docker.

File system method

Through cgroupfs, you view or set a cgroup's limits by reading or writing specific files in that cgroup's directory. For example, to read the container's current memory limit:

$ cat /sys/fs/cgroup/memory/user.slice/user-0.slice/session-5.scope/mybox/memory.limit_in_bytes
9223372036854771712

The memory.limit_in_bytes file holds the maximum amount of memory the cgroup may use. We have not set a limit for this container yet, so the file contains a huge placeholder value: 9223372036854771712 is LONG_MAX rounded down to a multiple of the 4 KiB page size, which the kernel treats as "no limit". To set a limit, write the new value directly to this file:

$ echo "100000000" > /sys/fs/cgroup/memory/user.slice/user-0.slice/session-5.scope/mybox/memory.limit_in_bytes

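This sets a new memory limit: from now on, the processes in the container can use no more than 100,000,000 bytes (about 100 MB) of memory in total. When they hit the limit and the kernel cannot reclaim enough memory, the policy in the memory.oom_control file decides what happens: by default the OOM killer kills a process in the container, while with oom_kill_disable set to 1 the processes that need more memory are put to sleep until memory is freed.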
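The read and write steps above can be wrapped in two small bash helpers. This is a sketch under the same cgroup v1 assumptions; the function names are hypothetical, and they take the cgroup directory (such as /sys/fs/cgroup/memory/user.slice/user-0.slice/session-5.scope/mybox) as an argument, so writing to a real cgroup requires root. The helper accepts plain byte counts only, although the kernel itself also accepts suffixes such as 100M.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the cgroup v1 memory controller.

# Print the cgroup's memory limit in bytes, or "unlimited".
read_memory_limit() {
  local limit
  # Value the kernel reports when no limit is set (64-bit, 4 KiB pages).
  local max=9223372036854771712
  limit=$(<"$1/memory.limit_in_bytes") || return 1
  if (( limit >= max )); then
    echo unlimited
  else
    echo "$limit"
  fi
}

# Write a new limit (in bytes), then print what the cgroup reports back.
set_memory_limit() {
  local cgroup_dir="$1" bytes="$2"
  [[ "$bytes" =~ ^[0-9]+$ ]] || { echo "invalid size: $bytes" >&2; return 1; }
  echo "$bytes" > "$cgroup_dir/memory.limit_in_bytes" || return 1
  read_memory_limit "$cgroup_dir"
}
```

On a real system the value read back may be slightly lower than the one written (with 4 KiB pages, 99999744 rather than 100000000), because the kernel counts memory in whole pages.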

High-level tools approach

Configuring cgroups through higher-level tools is friendlier, although under the hood these tools also modify cgroupfs as described above.

With runc, cgroups are configured by editing the config.json file in the filesystem bundle. To set a memory limit, modify the linux.resources field of the JSON object as follows (limit is the hard limit and reservation the soft limit, both in bytes):

"resources": {
    "memory": {
        "limit": 100000,
        "reservation": 200000
    },
    ...
}
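To see how a runtime turns these fields into the cgroupfs writes from the previous section, here is a rough bash sketch. runc itself is written in Go and does considerably more; the function name is hypothetical, and the mapping assumed here is the cgroup v1 one (limit to memory.limit_in_bytes, reservation to memory.soft_limit_in_bytes):

```bash
#!/usr/bin/env bash
# Hypothetical sketch: apply linux.resources.memory values to a v1 memory cgroup.
# An empty argument means "field not set in config.json" and is skipped.
apply_memory_resources() {
  local cgroup_dir="$1" limit="$2" reservation="$3"
  if [[ -n "$limit" ]]; then
    echo "$limit" > "$cgroup_dir/memory.limit_in_bytes" || return 1
  fi
  if [[ -n "$reservation" ]]; then
    echo "$reservation" > "$cgroup_dir/memory.soft_limit_in_bytes" || return 1
  fi
}
```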

Docker is simpler still: as a user-oriented wrapper, it lets you set the memory limit with the --memory (or -m) option of docker run. This value is written into config.json and used by the runtime, runc, which in turn changes cgroupfs.
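Docker accepts human-readable sizes such as 100m for --memory and converts them to bytes before they reach config.json. To our understanding, the Docker CLI parses this value with binary units (k = 1024, m = 1024², and so on), so 100m means 104857600 bytes rather than 100000000. A simplified bash sketch of that conversion; the function name is hypothetical, and unlike Docker's parser it accepts whole numbers only:

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a Docker-style memory size (e.g. 100m, 1g)
# to bytes using binary multipliers. Whole numbers only.
ram_in_bytes() {
  local s="${1,,}"                      # case-insensitive: 1G == 1g
  local re='^([0-9]+) ?([kmgtp]?)i?b?$'
  local num unit mult
  if [[ "$s" =~ $re ]]; then
    num="${BASH_REMATCH[1]}"
    unit="${BASH_REMATCH[2]}"
  else
    echo "invalid size: $1" >&2
    return 1
  fi
  case "$unit" in
    "") mult=1 ;;
    k)  mult=1024 ;;
    m)  mult=$((1024 ** 2)) ;;
    g)  mult=$((1024 ** 3)) ;;
    t)  mult=$((1024 ** 4)) ;;
    p)  mult=$((1024 ** 5)) ;;
  esac
  # 10# forces base 10 so a leading zero is not read as octal.
  echo $(( 10#$num * mult ))
}
```

By this reckoning, docker run -m 100m would put a limit of 104857600 bytes into config.json.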
