Recently, a colleague ran into a strange problem on a production server: every command he ran failed with the error “fork: Cannot allocate memory”. The problem had appeared only recently. Rebooting cleared it the first few times, but it came back every 2-3 days.
When you see this message, your first reaction is to suspect that memory really has run out. But checking memory usage showed that this was not the case at all: plenty of memory was still free. (Retrying a command several times would occasionally let it succeed.)
After some discussion, three ideas came to mind.
- Under a NUMA architecture, could the process have been bound to a node at startup, so that only the memory of that one node is usable?
- Under a NUMA architecture, if all the memory modules are installed in the slots of one node, the other nodes will have no memory.
- Check how many processes (threads) currently exist and whether that number exceeds the maximum limit.
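The third idea can be checked by hand from /proc. The following is a minimal sketch, assuming a Linux host; the helper names read_int, count_threads and thread_usage are made up for illustration, and the proc_root parameter exists only so the functions can be pointed at a test directory.

```python
import os


def read_int(path):
    """Return the integer stored in a one-line /proc file."""
    with open(path) as f:
        return int(f.read().split()[0])


def count_threads(proc_root="/proc"):
    """Sum the 'Threads:' field of every <proc_root>/<pid>/status file."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "sys" or "self"
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                for line in f:
                    if line.startswith("Threads:"):
                        total += int(line.split()[1])
                        break
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue
    return total


def thread_usage(proc_root="/proc"):
    """Compare the current thread count with the two kernel limits."""
    kernel_dir = os.path.join(proc_root, "sys", "kernel")
    return {
        "threads": count_threads(proc_root),
        "pid_max": read_int(os.path.join(kernel_dir, "pid_max")),
        "threads_max": read_int(os.path.join(kernel_dir, "threads-max")),
    }
```

On a live server, calling thread_usage() and seeing "threads" close to "pid_max" would point at PID exhaustion rather than at a real memory shortage.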
After a period of troubleshooting, the cause was finally found and the problem was solved. I will give you the conclusion directly: the earlier guesses about a NUMA memory shortage were wrong. The real cause was the third idea above. Some Java processes on this server had created too many threads, which triggered this error; memory was not actually short.
I. The underlying process analysis
In this case the Linux error message was misleading, so nobody thought of the number of processes first, and the troubleshooting ended up being long and tortuous.
So I want to go into the kernel to see how this error is actually produced, and why such an inappropriate message gets reported. Along the way, we will also get to understand how a process is created.
The operating system of the online server in question is CentOS 7.8, and the corresponding kernel version is 3.10.0-1127.
1.1 Anatomy of do_fork
In the Linux kernel, creating a process and creating a thread both end up in the core function do_fork. Inside this function, the kernel data objects needed by the new process (thread) are created by copying.
The heart of process creation is copy_process, so let’s look at its source code.
As you can see from the above code, the Linux kernel builds the kernel objects of the new process by calling a series of copy_xxx functions, covering the mm struct, namespaces, and so on.
Let’s focus on the part related to alloc_pid, whose purpose is to request a pid object. If the request fails, an error is returned. Note one detail of this code: whatever the reason alloc_pid fails, the error returned is always -ENOMEM. To make this easier to see, I’ll show this logic again separately.
The error code is set to -ENOMEM (retval = -ENOMEM) immediately before the call to alloc_pid, and whenever alloc_pid fails, ENOMEM is returned to the upper layer, no matter what actually caused alloc_pid to fail.
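The pattern just described can be modelled in a few lines. This is a user-space sketch in Python, not the kernel's C code: alloc_pid_stub and copy_process_sketch are hypothetical stand-ins, and the fail_reason strings are invented labels used only to show that the reason never reaches the caller.

```python
import errno


def alloc_pid_stub(fail_reason=None):
    """Hypothetical stand-in for alloc_pid(): returns None on any failure."""
    return None if fail_reason else object()


def copy_process_sketch(fail_reason=None):
    """Model of the 'preset retval, then call' pattern described above."""
    retval = -errno.ENOMEM             # error code chosen before the call
    pid = alloc_pid_stub(fail_reason)
    if pid is None:
        return retval                  # the actual reason is lost here
    return 0                           # success


# Two very different failures produce exactly the same error code.
results = {
    reason: copy_process_sketch(reason)
    for reason in ("no memory for struct pid", "no free pid number")
}
```

Because the error code is fixed before the call and alloc_pid only reports success or failure, the caller has no way to distinguish the two cases.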
Let’s look at the definition of ENOMEM. It stands for “Out of memory”. (The kernel only returns the error code; the application layer turns it into a message, which is why what you actually see is “Cannot allocate memory”.)
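The split between the kernel's numeric code and the user-space message can be seen from any language that exposes errno. Here is a small sketch using Python's standard errno and os modules; describe_errno is a hypothetical helper name. The exact wording comes from the C library, so it may differ between systems (glibc is expected to say “Cannot allocate memory”).

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, user-space message) for an errno value."""
    return errno.errorcode[code], os.strerror(code)


name, message = describe_errno(errno.ENOMEM)
print(errno.ENOMEM, name, message)
```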
I have to say, this error code from the kernel is a real problem. It causes users a lot of confusion.
1.2 Causes of alloc_pid failure
So in which cases does allocating a pid fail? Let’s look at the source code of alloc_pid.
What we usually call pid is not a simple integer type in the kernel, but a small structure (struct pid), as follows.
So the kernel first has to request a piece of memory to store this small object. The first failure case is that this memory request fails, in which case alloc_pid returns a failure. Here it really is a memory problem, and there is nothing wrong with the kernel returning ENOMEM.
Now the second case. alloc_pidmap requests a process number for the new process, which is what we usually call the PID. If that request fails, alloc_pid also returns a failure.
In this case only the allocation of a process number has failed, which has nothing to do with running out of memory. Yet the kernel still returns ENOMEM (Out of memory) to the upper layer. This is quite unreasonable.
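The two failure cases can be put side by side in a toy model. This is a sketch under stated assumptions, not the kernel's bitmap allocator: PidNumberPool and alloc_pid_sketch are hypothetical names, the pool is a plain Python set, and the memory_available flag simply simulates the first failure case.

```python
import errno


class PidNumberPool:
    """Toy model of a PID-number allocator; not the kernel's bitmap code."""

    def __init__(self, pid_max):
        self.pid_max = pid_max   # usable numbers are 1 .. pid_max - 1
        self.used = set()
        self.last = 0            # last number handed out

    def alloc(self):
        """Return a free number, searching upward from the last one and
        wrapping around; return -1 when every number is taken."""
        usable = self.pid_max - 1
        for step in range(usable):
            candidate = (self.last + step) % usable + 1
            if candidate not in self.used:
                self.used.add(candidate)
                self.last = candidate
                return candidate
        return -1

    def free(self, nr):
        self.used.discard(nr)


def alloc_pid_sketch(pool, memory_available=True):
    """Both failure cases end in the same error code."""
    if not memory_available:
        return -errno.ENOMEM   # case 1: genuine memory shortage
    nr = pool.alloc()
    if nr < 0:
        return -errno.ENOMEM   # case 2: numbers exhausted, same code
    return nr
```

With PidNumberPool(4), three allocations succeed and the fourth returns -ENOMEM even though no memory allocation failed, which is the situation the server above was in.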
There is one more thing to learn here. A process does not request just one process number; it requests several of them in a for loop.
If the process being created lives in a container, it has to request at least two PID numbers: one is its process number in the container’s namespace, and the other is its process number in the root namespace (the host).
This matches our everyday experience. Every process in a container is also visible from the host, but the process number you see inside the container is generally different from the one you see on the host. For example, if the pid of a process is 5 inside the container and 1256 in the host namespace, then the process’s object in the kernel will look something like this.
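The example above (5 in the container, 1256 on the host) can be sketched as one number per namespace level. This is only a model: PidNamespace and alloc_numbers are hypothetical names, and the first_nr starting values are picked to reproduce the numbers in the text.

```python
from itertools import count


class PidNamespace:
    """Toy PID namespace that hands out increasing numbers."""

    def __init__(self, name, parent=None, first_nr=1):
        self.name = name
        self.parent = parent
        self.level = 0 if parent is None else parent.level + 1
        self._next = count(first_nr)

    def alloc_nr(self):
        return next(self._next)


def alloc_numbers(ns):
    """Request one number per level, from the process's own namespace
    up to the root namespace, mirroring the for loop described above."""
    numbers = [None] * (ns.level + 1)
    current = ns
    for level in range(ns.level, -1, -1):
        numbers[level] = (current.alloc_nr(), current.name)
        current = current.parent
    return numbers


host = PidNamespace("host", first_nr=1256)
container = PidNamespace("container", parent=host, first_nr=5)
numbers = alloc_numbers(container)
# numbers[0] is the host view, numbers[1] is the container view
```

A failure at any level of this loop fails the whole allocation, so exhausting the numbers in either namespace is enough to make process creation fail.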
II. Has the newer version improved?
Next, my first thought was that the kernel version we were using was simply too old. (The source shown above is from kernel 3.10.1, chosen to stay close to the version on our production server.)
So I turned to the very recent Linux 5.16.11 to see whether the newer version had fixed the inappropriate error.
A recommended tool: https://elixir.bootlin.com/ . You can browse any version of the Linux kernel source code on this site. It is very convenient when you just want a quick look.
At first glance things look better: retval is no longer hard-coded to ENOMEM, but is set from the actual error returned by alloc_pid. So let’s see whether alloc_pid sets the error type correctly.
I was a little disappointed when I opened the alloc_pid source code and saw this big comment.
It means: “ENOMEM is not the most obvious choice, especially for cases where pid creation fails. However, ENOMEM is what we have exposed to userspace for a long time. Therefore, we can’t easily change it even if there is a more suitable error code.”
So the latest version does not really address this either.
When Linux creates a process and there are not enough pids, the error it returns is an out-of-memory error. This inappropriate error code has confused many people.
After today’s analysis, the next time we run into this kind of out-of-memory error we should be more careful not to be fooled by the kernel, and check first whether there are too many processes (threads).
As for the fix, you can increase the number of available pids by changing the kernel parameter /proc/sys/kernel/pid_max.
But I think the more fundamental approach is to find out why the system has so many processes (threads) and then deal with the culprit. The default limit of 20,000 to 30,000 processes is already very large for most servers; if even that is exceeded, something must be wrong.
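One way to find the culprit is to rank processes by thread count using /proc/<pid>/status. A minimal sketch, assuming a Linux host; read_status and top_thread_users are hypothetical helper names, and proc_root is a parameter only so the code can be exercised against a test directory.

```python
import os


def read_status(path):
    """Parse a /proc/<pid>/status file into a dict of 'Key' -> 'value'."""
    fields = {}
    with open(path) as f:
        for line in f:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


def top_thread_users(proc_root="/proc", limit=5):
    """Return up to `limit` tuples (threads, pid, name), most threads first."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            status = read_status(os.path.join(proc_root, entry, "status"))
            rows.append((int(status["Threads"]), int(entry),
                         status.get("Name", "?")))
        except (OSError, KeyError, ValueError):
            # Process vanished or the file had an unexpected layout; skip it.
            continue
    rows.sort(reverse=True)
    return rows[:limit]
```

On the server in this story, a listing like this would be expected to show the Java processes at the top with thread counts in the thousands.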