The cgroup material in this article is based on cgroup v1.

systemd

For an operating system, getting the kernel up and running is not enough: an init system must then bring the system into an operational state. In most Linux distributions today, that init system is the familiar systemd.

systemd is the most recent mainstream Linux init system. Its main design goals are to overcome the inherent shortcomings of its predecessor, sysvinit, and to speed up system boot. In daily work we mostly encounter systemd through commands such as systemctl start/stop, but systemd is actually powerful and complex. It provides many basic functions such as cgroup management, automounting, and logging.


Unit

During system initialization, systemd has to handle many tasks, such as mounting file systems and starting the ssh service. To make these steps manageable, systemd introduces the concept of a configuration unit (unit).

systemd supports the following 12 unit types.

  • service: a background service process managed by systemd, such as docker. This is the most commonly used unit type.
  • target: a logical grouping of other units that lets the user control a group of units at once. systemd predefines many targets, such as multi-user.target.
  • scope: a set of externally created processes, for example started by systemd-run or registered through the systemd API.
  • slice: a group of scope and service units. Each slice maps to a node in the cgroup Hierarchy.
  • socket: encapsulates a socket. Each socket unit has a corresponding service unit that it activates.
  • device: encapsulates a Linux device.
  • mount: encapsulates a file system mount point, which systemd monitors and manages.
  • automount: similar to the mount unit, except that systemd performs the mount automatically when the mount point is accessed.
  • swap: similar to the mount unit, but manages swap space.
  • timer: configures timed tasks and can replace crond.
  • snapshot: saves the current runtime state of a set of units so that it can be restored later (no longer supported by recent systemd versions).
  • path: monitors a file or directory on the system and activates a unit when it changes.

Each unit has a corresponding configuration file named <name>.<unit-type> (for example, docker.service). These files usually live in /etc/systemd/system or /usr/lib/systemd/system.
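The naming convention alone is enough to tell a unit's type. The bash sketch below extracts the type from a unit file name and rejects unknown suffixes. The function name unit_type is hypothetical, and the list of accepted types mirrors the twelve listed above.

```bash
# unit_type NAME: print the unit type encoded in a unit file name such as
# "docker.service"; return 1 if there is no suffix or it is not a known type.
# (Hypothetical helper for illustration only.)
unit_type() {
    local name="$1"
    local suffix="${name##*.}"   # text after the last dot
    # A name without any dot has no type suffix at all.
    if [ "$suffix" = "$name" ]; then
        return 1
    fi
    case "$suffix" in
        service|target|scope|slice|socket|device|mount|automount|swap|timer|snapshot|path)
            printf '%s\n' "$suffix"
            ;;
        *)
            return 1
            ;;
    esac
}

unit_type docker.service      # prints: service
unit_type multi-user.target   # prints: target
```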

Each unit type is documented in the systemd man pages. In a Linux environment, you can open the relevant page with the man command. For example, man systemd.timer describes the configuration options of the timer unit.

CGroup

To explain the relationship between cgroups and systemd, let’s first briefly review a few cgroup concepts.

  • Task: a process in the system. cgroups limit a task's CPU, memory, and other resources.
  • Subsystem: a resource controller, such as cpu or memory.
  • Control Group: a group of tasks that share limits on one or more resources. It is the basic unit of resource control. A task can join a control group or migrate from one control group to another.
  • Hierarchy: a tree structure made up of control groups.

Hierarchy is the most confusing of these abstractions. The Red Hat cgroup handbook compares cgroups to the Linux process model, which gives a clearer sense of how cgroups are organized.

In the Linux process model, every process descends from the init process and may create child processes of its own. These children inherit environment variables and other attributes from their parent. All processes form a “process tree”, with the init process at the root and every other process as a node.

cgroups are organized in a similar way. Each node of a “cgroup tree” is a control group, and child control groups inherit the attributes of their parent. Such a cgroup tree is called a Hierarchy.

In cgroup v1, a system can have multiple Hierarchies, each bound to one or more subsystems. Users create control groups in these Hierarchies and add tasks to them to apply resource limits.
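In cgroup v1, the kernel reports a process's control group in each Hierarchy through /proc/<pid>/cgroup. The file has one line per Hierarchy, in the form hierarchy-ID:controller-list:cgroup-path. The bash sketch below looks up the path for a given controller. The function name and the sample lines are hypothetical; on a real v1 system you would feed it /proc/self/cgroup instead.

```bash
# cgroup_path_for CONTROLLER: read cgroup v1 lines of the form
#   hierarchy-ID:controller-list:cgroup-path
# from stdin and print the cgroup path of the Hierarchy whose
# comma-separated controller list contains CONTROLLER.
cgroup_path_for() {
    local want="$1" id controllers path
    # The last variable (path) receives the rest of the line.
    while IFS=: read -r id controllers path; do
        # Wrap the list in commas so "cpu" does not match "cpuacct" or "cpuset".
        case ",$controllers," in
            *",$want,"*)
                printf '%s\n' "$path"
                return 0
                ;;
        esac
    done
    return 1
}

# Hypothetical sample: the same task sits in a different control group
# in each Hierarchy.
sample='4:memory:/user.slice
3:cpu,cpuacct:/user.slice
1:name=systemd:/user.slice/user-1000.slice/session-1.scope'

printf '%s\n' "$sample" | cgroup_path_for memory        # prints: /user.slice
printf '%s\n' "$sample" | cgroup_path_for name=systemd  # prints the session scope path
```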

Because cgroup v1 allows multiple Hierarchies, it imposes the following constraints to simplify the implementation (ref):

  • A Hierarchy can be associated with multiple subsystems.
  • A single subsystem can be associated with multiple Hierarchies only if it is the only subsystem associated with each of them.
  • A task can belong to several Hierarchies at the same time, but not to two control groups of the same Hierarchy. When a new Hierarchy is created, all tasks start in its root (default) control group.
  • A child process starts in the same control groups as its parent, but the user can move it to another control group at any time.
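The third constraint is visible in /proc/<pid>/cgroup, where each Hierarchy ID appears on exactly one line. As a result, a task has exactly one control group per Hierarchy. The hypothetical check below reads lines in that hierarchy-ID:controllers:path format from stdin and fails if any Hierarchy ID repeats. It illustrates the rule; it is not a tool the kernel or systemd provides.

```bash
# one_group_per_hierarchy: read cgroup v1 lines (hierarchy-ID:controllers:path)
# from stdin; succeed only if no hierarchy ID appears on more than one line.
one_group_per_hierarchy() {
    local id rest seen=","
    while IFS=: read -r id rest; do
        [ -z "$id" ] && continue          # skip blank lines
        case "$seen" in
            *",$id,"*) return 1 ;;        # same Hierarchy listed twice
        esac
        seen="$seen$id,"
    done
    return 0
}

printf '4:memory:/a\n3:cpu,cpuacct:/b\n1:name=systemd:/c\n' | one_group_per_hierarchy && echo ok
```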

systemd and cgroup

As the Linux init system evolved into systemd, cgroups became tightly integrated with it. systemd uses cgroups to implement resource management for service and scope units.

During boot, systemd creates the following default Hierarchies:

(Figure: default Hierarchy)

All default Hierarchies are mounted under /sys/fs/cgroup. Every Hierarchy except systemd is associated with one or more subsystems. systemd is a special Hierarchy with no subsystem attached.
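To see which subsystems share a Hierarchy, cgroup v1 lists every subsystem in /proc/cgroups along with the ID of the Hierarchy it is attached to. The ID is 0 if the subsystem is not attached to any v1 Hierarchy. The bash sketch below groups subsystems by that ID. The helper name and the sample input are hypothetical. The systemd Hierarchy never shows up in /proc/cgroups, because no subsystem is attached to it.

```bash
# group_by_hierarchy: read /proc/cgroups-formatted text from stdin and print
# one line per hierarchy ID listing the subsystems attached to it.
group_by_hierarchy() {
    awk '
        /^#/    { next }      # skip the header line
        $2 == 0 { next }      # 0 = not attached to any v1 Hierarchy
        NF >= 2 {
            id = $2; name = $1
            if (id in subsys) subsys[id] = subsys[id] "," name
            else              subsys[id] = name
        }
        END { for (id in subsys) print id ": " subsys[id] }
    ' | sort -n
}

# Hypothetical sample; on a real v1 system use: group_by_hierarchy < /proc/cgroups
printf '#subsys_name\thierarchy\tnum_cgroups\tenabled\ncpuset\t2\t1\t1\ncpu\t3\t64\t1\ncpuacct\t3\t64\t1\nmemory\t4\t80\t1\npids\t0\t1\t1\n' \
    | group_by_hierarchy
# prints:
# 2: cpuset
# 3: cpu,cpuacct
# 4: memory
```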

Because cgroups are organized as trees, systemd introduces the slice unit to fit into this model. Each slice unit has a corresponding node in /sys/fs/cgroup/systemd, and any number of service and scope units can be added to it. Each service or scope unit is a leaf node in /sys/fs/cgroup/systemd.

Through slices, all units form a tree. When the user restricts the resources of a unit, systemd reproduces the relevant part of the /sys/fs/cgroup/systemd tree in the default Hierarchy where the corresponding subsystem is mounted, and sets the corresponding parameters there.
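Reproducing a unit's node in a subsystem Hierarchy comes down to keeping the same relative path under a different mount point. The hypothetical helper below shows only this path arithmetic. It does not create directories or write any cgroup parameter; systemd does that itself.

```bash
# mirror_path PATH SUBSYSTEM_DIR: given a node under /sys/fs/cgroup/systemd,
# print the matching node under /sys/fs/cgroup/SUBSYSTEM_DIR
# (e.g. "memory" or "cpu,cpuacct"). Returns 1 for paths outside the
# systemd Hierarchy.
mirror_path() {
    local path="$1" subsys="$2"
    local base=/sys/fs/cgroup/systemd
    case "$path" in
        "$base")   printf '%s\n' "/sys/fs/cgroup/$subsys" ;;
        "$base"/*) printf '%s\n' "/sys/fs/cgroup/$subsys${path#"$base"}" ;;
        *)         return 1 ;;
    esac
}

mirror_path /sys/fs/cgroup/systemd/system.slice/docker.service memory
# prints: /sys/fs/cgroup/memory/system.slice/docker.service
```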

By default, systemd creates the following slices. -.slice is the root node; all other slices descend from -.slice and manage different types of processes.

  • -.slice: the root slice
  • user.slice: user session processes
  • system.slice: system service and scope processes
  • machine.slice: virtual machine processes

Users can create a custom slice by placing a slice unit file in one of the following directories:

  • /etc/systemd/system
  • /run/systemd/system
  • /usr/lib/systemd/system
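If a unit file with the same name exists in more than one of these directories, systemd uses the one that comes first in its search path. /etc/systemd/system takes precedence over /run/systemd/system, which takes precedence over /usr/lib/systemd/system. The bash sketch below resolves a unit name against just these three directories. The ROOT prefix is a hypothetical parameter that lets you try it on a scratch directory. The real search path (see systemd.unit(5)) contains more entries, and on a live system `systemctl show -p FragmentPath <unit>` reports the file actually loaded.

```bash
# find_unit_file ROOT NAME: print the first NAME found under ROOT, checking
# /etc/systemd/system, /run/systemd/system, /usr/lib/systemd/system in order.
# ROOT is a hypothetical prefix for testing; pass "" for the real system.
find_unit_file() {
    local root="$1" name="$2" dir
    for dir in /etc/systemd/system /run/systemd/system /usr/lib/systemd/system; do
        if [ -e "$root$dir/$name" ]; then
            printf '%s\n' "$root$dir/$name"
            return 0
        fi
    done
    return 1   # not found in any of the three directories
}
```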

For example, the systemd-run command can quickly run a top -b process inside a specified slice, creating the following structure:

(Figure: systemd-run)

# Run top -b as the transient service parent.service inside parent.slice
systemd-run --unit=parent --slice=parent top -b
# Run top -b as children.service inside parent-children.slice, a child node of parent.slice
systemd-run --unit=children --slice=parent-children top -b
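The slice name encodes its position in the tree. Each dash-separated prefix names an ancestor slice, so parent-children.slice sits inside parent.slice, which in turn sits under the root -.slice. The bash sketch below derives a slice's cgroup path from its name alone. The function is hypothetical and ignores systemd's escaping rules for special characters.

```bash
# slice_to_path NAME: print the cgroup path of a slice unit derived from its
# name, e.g. parent-children.slice -> /parent.slice/parent-children.slice
slice_to_path() {
    local name="$1" base path="" prefix="" rest part
    base="${name%.slice}"
    [ "$base" != "$name" ] || return 1     # not a .slice name
    if [ "$base" = "-" ]; then             # the root slice -.slice
        printf '/\n'
        return 0
    fi
    rest="$base"
    while [ -n "$rest" ]; do
        part="${rest%%-*}"                 # first dash-separated component
        if [ "$part" = "$rest" ]; then rest=""; else rest="${rest#*-}"; fi
        prefix="${prefix:+$prefix-}$part"  # ancestor name so far, e.g. parent-children
        path="$path/$prefix.slice"
    done
    printf '%s\n' "$path"
}

slice_to_path parent-children.slice   # prints: /parent.slice/parent-children.slice
```

On a v1 system, children.service from the commands above would therefore appear under /sys/fs/cgroup/systemd/parent.slice/parent-children.slice/.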

Slices can also be created with a configuration file. For reference, the default system.slice is configured as follows.

# /usr/lib/systemd/system/system.slice

[Unit]
Description=System Slice
Documentation=man:systemd.special(7)
DefaultDependencies=no
Before=slices.target
Wants=-.slice
After=-.slice
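Following the same pattern, a custom slice could be defined in a file such as /etc/systemd/system/parent.slice. The bash sketch below writes a minimal slice unit into a given directory. The helper name, the description, and the memory value are illustrative assumptions. MemoryLimit= is the cgroup v1-era memory directive; newer systemd versions prefer MemoryMax=. After writing a real unit file, run systemctl daemon-reload so that systemd picks it up.

```bash
# write_slice_unit DIR NAME DESCRIPTION MEMORY_LIMIT
# Write a minimal slice unit file DIR/NAME.slice (hypothetical helper).
# On a real system DIR would be /etc/systemd/system, followed by
# "systemctl daemon-reload".
write_slice_unit() {
    local dir="$1" name="$2" desc="$3" mem="$4"
    mkdir -p "$dir" || return 1
    cat > "$dir/$name.slice" <<EOF
[Unit]
Description=$desc

[Slice]
# cgroup v1-era memory limit; newer systemd versions use MemoryMax=.
MemoryLimit=$mem
EOF
}
```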