The problem

Our Kubernetes ingress controller uses ingress-nginx from the Kubernetes project, and we recently ran into a "Too many open files" problem.

2019/09/19 09:47:56 [warn] 26281#26281: *97238945 a client request body is buffered to a temporary file /var/lib/nginx/body/0000269456, client: 1.1.1.1, server: xxx.ieevee.com, request: "POST /api/v1/xxx HTTP/1.1", host: "xxx.ieevee.com"
2019/09/19 09:47:56 [crit] 26281#26281: accept4() failed (24: Too many open files)
2019/09/19 09:47:56 [crit] 26281#26281: *97238948 open() "/var/lib/nginx/body/0000269457" failed (24: Too many open files), client: 1.1.1.1, server: xxx.ieevee.com, request: "POST /api/v1/xxx HTTP/1.1", host: "xxx.ieevee.com"

Preliminary analysis

Looking at the messages, nginx has hit its open-file limit: it fails while buffering client request bodies to temporary files and even while accepting new connections. New connections are handled by the nginx workers, so let's check how many files a worker has open.

# ps aux|grep worker
...
nobody   26281  0.0  0.0 422196 71596 ?        Sl   13:05   0:00 nginx: worker process
nobody   26282  0.0  0.0 422196 71664 ?        Sl   13:05   0:00 nginx: worker process
...
# cat /proc/26281/limits |grep open
Max open files            1024                1024                files
# lsof -p 26281|wc -l
910
# ulimit -n
65535

As you can see, the nginx worker may open at most 1024 files, and the process we sampled already has 910 open; under heavier traffic it can easily blow past the limit. Also note the ulimit value of 65535, which we will come back to later.

Linux basics: fs.file-max vs ulimit

fs.file-max

First of all, Linux has a system-wide limit on how many files can be open.

# cat /proc/sys/fs/file-max
13179954

A few things to note:

  • This value depends on the OS and on the hardware, so it differs between systems. For example, the value above comes from a physical server, while on one of our virtual machines it is only 808539; it can also differ between operating systems on the same hardware.
  • This value is the system-wide maximum for open files; it has nothing to do with any particular user or session.
  • This value can be changed. If you run a database or a web server, the default file-max may well be insufficient because such workloads open a large number of files; it can be raised via sysctl:
# sysctl -w fs.file-max=500000

Or make it persistent by adding fs.file-max=500000 to /etc/sysctl.conf and running sysctl -p for it to take effect.
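Concretely, that looks like this (same value as above):

# echo "fs.file-max = 500000" >> /etc/sysctl.conf
# sysctl -p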

fs.file-nr

The fs sysctls also include a sibling value, file-nr, which reports how many file handles are currently in use. In the output below, 55296 file handles are open; as long as this number stays below file-max, the system can still open new files.

# cat /proc/sys/fs/file-nr
55296   0   13179954

ulimit

What about ulimit? Is it system-level too? No, that is a classic misconception: ulimit limits the resources that a single process can use.

On the host, you can change ulimit by editing /etc/security/limits.conf.

// before any changes, the soft limit on open files is 1024
$ ulimit -Sn
1024
$ ulimit -Hn
1048576
// edit limits.conf
$ cat /etc/security/limits.conf
bottle           soft    nofile          10000
bottle           hard    nofile          100000
// log in again over ssh
$ ulimit -Sn
10000
$ ulimit -Hn
100000

In-depth analysis

Back to the original question.

Our nginx ingress controller is running in a container, so the number of files that the nginx worker can open depends on the ulimit in the container.

From the earlier output we know that ulimit inside the container allows 65535 open files. ingress-nginx divides this budget among the nginx workers, and the number of workers defaults to the number of CPU cores, so each worker is allotted roughly 65535 / (number of CPU cores) file descriptors. On our 56-core machine that is 65535 / 56 ≈ 1170.

So where does the per-worker limit of 1024 that we observed come from?

Let’s look at how it is calculated in ingress-nginx.

    // the limit of open files is per worker process
    // and we leave some room to avoid consuming all the FDs available
    wp, err := strconv.Atoi(cfg.WorkerProcesses)
    glog.V(3).Infof("number of worker processes: %v", wp)
    if err != nil {
        wp = 1
    }
    maxOpenFiles := (sysctlFSFileMax() / wp) - 1024
    glog.V(2).Infof("maximum number of open file descriptors : %v", maxOpenFiles)
    if maxOpenFiles < 1024 {
        // this means the value of RLIMIT_NOFILE is too low.
        maxOpenFiles = 1024
    }

This means that 1024 descriptors are deducted for the worker process itself (after all, it has to open .so libraries and other files) and the remainder is left for serving requests; if that remainder is below 1024, it is rounded up to 1024. In our case (65535 / 56) - 1024 = 146, which is less than 1024, so the limit becomes exactly the 1024 we observed earlier.
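If you want to see where this number ends up, the controller renders it into nginx.conf; a quick check from inside the controller pod (the pod name is a placeholder, and the directive names assume the stock ingress-nginx template):

$ kubectl -n ingress-nginx exec <controller-pod> -- grep -E 'worker_processes|worker_rlimit_nofile' /etc/nginx/nginx.conf

With the numbers above you would expect something like worker_processes 56; and worker_rlimit_nofile 1024;.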

Solution

Now the problem is clear: the limit on how many files an nginx worker may open is too low. How do we solve it?

Can ulimit do it?

The first solution that comes to mind is to modify ulimit.

However, since the ingress controller's nginx runs in a container, we need to change the ulimit inside the container.

A docker container is essentially just a process, so ulimit can be applied to it too. Taking docker run as an example, you can pass the --ulimit flag in the form <type>=<soft limit>[:<hard limit>]; for the maximum number of open files, the type is nofile.

$ docker run -it --ulimit nofile=1024:65535 ubuntu bash
root@917bd8850581:/# ulimit -Sn
1024
root@917bd8850581:/# ulimit -Hn
65535

As mentioned before, ulimit limits a “process”; what about that process's children? Let's start another bash inside the container and see whether the limits stay the same.

root@917bd8850581:/# bash
root@917bd8850581:/# ulimit -Sn
1024
root@917bd8850581:/# ulimit -Hn
65535

As you can see, the child process (the new bash) has the same ulimit as its parent (the bash started by docker run): the limits are inherited.

So if we can find a way to pass this setting down to the nginx worker, we can control its maximum number of open files.

However, Kubernetes does not support user-defined ulimits. Issue 3595 was opened by thockin back when docker first introduced ulimit settings, but years later it is still not closed, and the community is divided on the option.

This method does not work.

Changing the ulimit default for docker daemon

The ulimit of processes in a container is inherited from the docker daemon, so we can set a default ulimit in the daemon's configuration and thereby control the ulimit of every container running on the ingress node (including the ingress controller itself). Edit /etc/docker/daemon.json:

{
    "default-ulimits": {
        "nofile": {
            "Name": "nofile",
            "Hard": 64000,
            "Soft": 64000
        }
    }
}
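Note that daemon.json is only read when the docker daemon starts, so dockerd has to be restarted for the new defaults to apply; afterwards a quick check (assuming systemd) would look like this:

# systemctl restart docker
# docker run --rm ubuntu bash -c 'ulimit -Sn; ulimit -Hn'
64000
64000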

This approach works, but it is a blunt instrument: it changes the default ulimit for every container on the node, so we set it aside for now.

/etc/security/limits.conf

Since you can modify ulimit on the host through /etc/security/limits.conf, can’t you do the same in the container?

Let’s make a new image and overwrite the /etc/security/limits.conf in the original image.

FROM ubuntu
COPY limits.conf /etc/security/limits.conf

The contents of limits.conf are as follows.

root           soft    nofile          10000
root           hard    nofile          100000

Build the new image, name it xxx, start a container without setting --ulimit, and check:

$ docker run -it --rm ubuntu bash
root@fee1ea85ca56:/# ulimit -Sn
1048576
root@fee1ea85ca56:/# ulimit -Hn
1048576
root@fee1ea85ca56:/# exit
$ docker run -it --rm xxx bash
root@d535db9287b0:/# ulimit -Sn
1048576
root@d535db9287b0:/# ulimit -Hn
1048576

Unfortunately, docker does not read /etc/security/limits.conf at all (that file is applied by PAM during login), so this does not work either.

Modifying the calculation

Someone actually hit this problem back in 2018 and submitted PR 2050.

The author's idea: since ulimit cannot be set, why not set fs.file-max in the container instead (for example from an init container) and change the way ingress-nginx does the calculation? Wouldn't that solve the problem?

In the init container, run:

sysctl -w fs.file-max=xxx
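For completeness, a minimal sketch of such an init container, assuming the controller is deployed as a Deployment and that privileged init containers are allowed on the cluster; the image, names and target value are illustrative, not taken from the original deployment:

spec:
  template:
    spec:
      initContainers:
      - name: set-file-max
        image: busybox
        securityContext:
          privileged: true   # fs.file-max is not namespaced, so writing it requires privileges
        command: ["sysctl", "-w", "fs.file-max=1048576"]   # illustrative target value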

The change to ingress-nginx is also very simple: the calculation itself stays the same, only the implementation of sysctlFSFileMax switches from reading ulimit to reading fs/file-max (clearly the original code comment was wrong, since the function actually returned the process-level ulimit, not fs.file-max).

// sysctlFSFileMax returns the value of fs.file-max, i.e.
// maximum number of open file descriptors
func sysctlFSFileMax() int {
    fileMax, err := sysctl.New().GetSysctl("fs/file-max")
    if err != nil {
        glog.Errorf("unexpected error reading system maximum number of open file descriptors (fs.file-max): %v", err)
        // returning 0 means don't render the value
        return 0
    }
    glog.V(3).Infof("system fs.file-max=%v", fileMax)
    return fileMax
}

Even though the ingress runs in a container, we dedicate a whole physical machine to it, so we do not even need the init container; the calculation above works fine for us as-is.

The way is clear!

The change is rolled back

However, this PR predates the ingress-nginx version I am using, and yet in my version ulimit is still used (and of course the comment is still wrong).

// sysctlFSFileMax returns the value of fs.file-max, i.e.
// maximum number of open file descriptors
func sysctlFSFileMax() int {
    var rLimit syscall.Rlimit
    err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rLimit)
    if err != nil {
        glog.Errorf("unexpected error reading system maximum number of open file descriptors (RLIMIT_NOFILE): %v", err)
        // returning 0 means don't render the value
        return 0
    }
    glog.V(2).Infof("rlimit.max=%v", rLimit.Max)
    return int(rLimit.Max)
}

What happened?

It turns out that three months later another user filed an issue arguing that PR 2050 only fits a special case: you cannot assume that ingress-nginx is entitled to all the resources of the host it runs on. A follow-up PR then reverted the change.

To be fair, using ulimit is reasonable: nginx does run in a container, and handing all of the host's resources to that container is not.

However, docker's isolation is leaky: by default the ingress controller derives the number of nginx workers from runtime.NumCPU(), so even if the nginx container is limited to 4 CPU cores, the worker count it computes is still that of the host!

But you also cannot say that the current calculation is wrong and should not divide by the number of CPUs: ulimit is indeed a per-process limit, and ingress-nginx treats it as a budget to be split among the nginx workers.

OK, by this logic everything would be fine if we could just set the ulimit of the nginx container; but as we saw, Kubernetes does not support that.

Enough guessing: a configuration option

As the analysis above shows, ingress-nginx's way of calculating the maximum number of open files per nginx worker is rather convoluted, so a user later submitted another PR that skips the guessing entirely and exposes the value as a configuration item.
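In versions that include this change, the value is set through the controller's ConfigMap. A minimal sketch, assuming the key is named max-worker-open-files and the ConfigMap is called nginx-configuration in the ingress-nginx namespace (adjust names to your deployment):

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
data:
  # per-worker open-file limit handed straight to nginx, bypassing the ulimit-based guess
  max-worker-open-files: "65535"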

Everything is finally clear.