Problem description

When containerizing nginx, a common question comes up: how do we automatically set the number of nginx worker processes?

The nginx.conf that ships with the official nginx container image contains the following worker process setting:

worker_processes  1;

This configures nginx to start only one worker process, which works well when the nginx container is given a single core.

When we give nginx a larger cpu allocation, for example 4 or 16 cores, we need to make sure that nginx also starts a corresponding number of worker processes. There are two ways to do this:

  1. Modify nginx.conf to set worker_processes to the corresponding number of cpu cores.
  2. Modify nginx.conf to change worker_processes to auto (a one-line edit, sketched below).
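
Either way, the change itself is just one line in nginx.conf. As a rough sketch, method 2 could be applied inside a running container (or baked into a derived image); the sed pattern assumes the stock worker_processes line shown above:

# sed -i 's/^worker_processes .*/worker_processes  auto;/' /etc/nginx/nginx.conf
# nginx -s reload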

The first method is feasible, but it means modifying the configuration file and reloading nginx for every different cpu size, and in a real deployment nginx.conf has to be maintained and mounted into the container, which is a heavy mental burden for people who are not familiar with nginx.

The second method runs into problems on Kubernetes. Observing inside the container, we find that the number of worker processes nginx starts does not follow the cpu limit we set for the Pod, but instead matches the number of cpu cores of the node the Pod is scheduled on.
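
This is easy to verify. On a 32-core node, a Pod limited to 2 cpus and running with worker_processes auto still shows one worker per host core. The pod name and counts below are illustrative, and ps is assumed to be available in the image:

# kubectl exec nginx-test -- ps aux | grep -c 'nginx: worker process'
32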

When the host has many cpu cores but the Pod has a small cpu limit, this causes noticeably slower responses, because the cpu time allowed by the limit is spread across far more workers and each worker gets fewer time slices.

Cause of the problem

We know that when Kubernetes sets a container's cpu limit to 2, the container is not really “allocated” 2 cpus; it is only throttled by cgroups.

        resources:
          limits:
            cpu: "2"
            memory: 256Mi
          requests:
            cpu: 500m
            memory: 256Mi

Let’s go to the host where this Pod is located and check the relevant information.

# docker inspect 17f5f35c3500|grep -i cgroup
            "Cgroup": "",
            "CgroupParent": "/kubepods/burstable/podb008ccda-9396-11ea-bc20-ecf4bbd63ee8",
            "DeviceCgroupRules": null,
# cd /sys/fs/cgroup/cpu/kubepods/burstable/podb008ccda-9396-11ea-bc20-ecf4bbd63ee8
# cat cpu.cfs_quota_us
200000
# cat cpu.cfs_period_us
100000

As you can see, the actual amount of cpu the Pod can use is capped at cpu.cfs_quota_us / cpu.cfs_period_us.
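
In other words, the effective cpu count for this Pod can be computed directly from those two files. A quick check, run in the same cgroup directory as above:

# echo $(( $(cat cpu.cfs_quota_us) / $(cat cpu.cfs_period_us) ))
2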

But with worker_processes set to auto, nginx determines the worker count via sysconf(_SC_NPROCESSORS_ONLN), i.e. the number of online cpus (the same value getconf _NPROCESSORS_ONLN reports), so let's trace that lookup with strace.

# strace getconf _NPROCESSORS_ONLN
execve("/bin/getconf", ["getconf", "_NPROCESSORS_ONLN"], [/* 23 vars */]) = 0
brk(0)                                  = 0x606000
...
open("/sys/devices/system/cpu/online", O_RDONLY|O_CLOEXEC) = 3
read(3, "0-31\n", 8192)                 = 5
close(3)                                = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 5), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6a922a0000
write(1, "32\n", 332

As you can see, getconf _NPROCESSORS_ONLN actually gets the number of cpus by reading the file /sys/devices/system/cpu/online.

By default on Kubernetes, the /sys/devices/system/cpu/online a container sees is simply the host's file, so it is not surprising that nginx starts as many worker processes as the host has cpu cores.
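
This is easy to confirm from inside a cpu-limited Pod, where the file still reports the host's full cpu range (pod name illustrative):

# kubectl exec nginx-test -- cat /sys/devices/system/cpu/online
0-31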

Solution

The solution is actually not hard to come up with: just make /sys/devices/system/cpu/online inside the container reflect the container's cpu limit.

The community’s lxcfs has solved this problem.

lxcfs

LXCFS is a small FUSE filesystem designed to make a Linux container feel more like a virtual machine. It provides container-aware versions of the following key files in procfs and sysfs:

/proc/cpuinfo
/proc/diskstats
/proc/meminfo
/proc/stat
/proc/swaps
/proc/uptime
/sys/devices/system/cpu/online

As you can see, the /sys/devices/system/cpu/online file we need is on the list of files lxcfs handles.

Using lxcfs is also relatively simple: just bind-mount /var/lib/lxc/lxcfs/proc/online from the host onto /sys/devices/system/cpu/online in the container.

      containers:
      - args:
        - infinity
        command:
        - sleep
        volumeMounts:
        - mountPath: /sys/devices/system/cpu/online
          name: lxcfs-2
          readOnly: true
      volumes:
      - hostPath:
          path: /var/lib/lxc/lxcfs/proc/online
          type: ""
        name: lxcfs-2

When /sys/devices/system/cpu/online is read inside the container, the read request is handed to the lxcfs daemon, because the kubelet has bind-mounted that path to /var/lib/lxc/lxcfs/proc/online on the host.
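
With the mount in place, the same read now reflects the Pod's cpu limit instead of the host. For the 2-cpu Pod from earlier, the expected result is (pod name illustrative):

# kubectl exec nginx-test -- cat /sys/devices/system/cpu/online
0-1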

The function in lxcfs that computes this value is max_cpu_count:

int max_cpu_count(const char *cg)
{
    __do_free char *cpuset = NULL;
    int rv, nprocs;
    int64_t cfs_quota, cfs_period;
    int nr_cpus_in_cpuset = 0;

    read_cpu_cfs_param(cg, "quota", &cfs_quota);
    read_cpu_cfs_param(cg, "period", &cfs_period);

    cpuset = get_cpuset(cg);
    if (cpuset)
        nr_cpus_in_cpuset = cpu_number_in_cpuset(cpuset);

    if (cfs_quota <= 0 || cfs_period <= 0){
        if (nr_cpus_in_cpuset > 0)
            return nr_cpus_in_cpuset;

        return 0;
    }

    rv = cfs_quota / cfs_period;

    /* In case quota/period does not yield a whole number, add one CPU for
     * the remainder.
     */
    if ((cfs_quota % cfs_period) > 0)
        rv += 1;

    nprocs = get_nprocs();
    if (rv > nprocs)
        rv = nprocs;

    /* use min value in cpu quota and cpuset */
    if (nr_cpus_in_cpuset > 0 && nr_cpus_in_cpuset < rv)
        rv = nr_cpus_in_cpuset;

    return rv;
}

Based on the cgroup values we read earlier, the value returned for our Pod is 200000 / 100000 = 2.

Conclusion

Therefore, with lxcfs in place, nginx can safely set worker_processes to auto without worrying about starting too many worker processes.
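
As a final check, redeploying the nginx Pod with worker_processes auto and the lxcfs mount should yield one worker per allowed cpu. For the 2-cpu Pod used throughout this article, the expected result is (pod name illustrative, ps assumed available in the image):

# kubectl exec nginx-test -- ps aux | grep -c 'nginx: worker process'
2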