I recently wrote a few simple scripts in Python to process some data, and since it’s just a simple function, I just use print to print the log.

The task occasionally throws some exceptions when running.

Because I have print logs in different places, the place where the error is reported is not quite the same every time, which leads to very strange results; sometimes this piece of code is not running, and the next time it may be another piece of code that is not triggered.

Although I noticed the key exception Broken pipe, I didn’t pay special attention to it, because there were some places in the code where http requests were sent, and I always thought it was a network IO problem, so I didn’t think about print, the most basic printing function 🤔.

It was only when this problem recurred that I took a good look at the exception and took a good look at print which is also an IO operation, is it really a problem with the print function that comes with it?

But in the local, test environment I ran countless times also failed to find anomalies; so I looked for operations and maintenance to get the way to run on line.

It turns out that in order to facilitate the maintenance of the script tasks submitted up by everyone, operations and maintenance themselves have to maintain a unified script, in which the script uses.

1
2
cmd = 'python /xxx/test.py'
os.popen(cmd)

to trigger the task, which is the only difference from my local, development environment.

popen Principle

For this reason I simulated the exception in the development environment:

test.py:

1
2
3
4
import time
if __name__ == '__main__':
    time.sleep(20)
    print '1000'*1024

task.py:

1
2
3
4
5
6
7
8
import os
import time
if __name__ == '__main__':
    start = int(time.time())
    cmd = 'python test.py'
    os.popen(cmd)
    end = int(time.time())
    print 'end****{}s'.format(end-start)

Run :

1
python task.py

Wait 20s and the exception will definitely recur.

1
2
3
4
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    print '1000'*1024
IOError: [Errno 32] Broken pipe

Why does this anomaly occur?

First, you need to understand how the function os.popen(command[, mode[, bufsize]] works.

according to the official documentation, this function executes fork a child process to execute the command command, while piping the standard output of the child process to the parent process.

This is also the file descriptor returned by the method.

Here’s a diagram to better understand the principle.

在这种情况下,没有得到popen()的返回值,所以command的执行本质上是异步的。

这意味着当task.py完成执行时,它会自动关闭管道的读取端。

The SIGPIPE signal is also mentioned there.

Solutions

Since we know the cause of the problem, it is relatively simple to solve, mainly the following programs.

  1. Use the read() function to read the data in the pipe, read it all, and then close it.
  2. The standard output of command can also be redirected to /dev/null if the output from the child process is not needed.
  3. It can also be run using the subprocess.Popen module of Python3.

Here is a demonstration using the first option

1
2
3
4
5
6
7
8
9
import os
import time
if __name__ == '__main__':
    start = int(time.time())
    cmd = 'python test.py'
    with os.popen(cmd) as p:
        print p.read()
    end = int(time.time())
    print 'end****{}s'.format(end-start)

After running task.py, no more exceptions are thrown, and the output of command is printed out.

I did not use this solution for the online fix, but used the standard logging framework to output the logs to es for easy viewing in kibana.

Since the logging framework does not use pipelines, there is no such problem.

Read more

Although the problem is solved, which still involves some of our usual less attention to the knowledge points, this time we will review together.

The first is the content of the parent-child process, which is more common in c/c++/python, and more common in Java/golang, which directly uses multithreading and co-processing.

For example, os.popen() in Python creates a child process, and since it is a child process, it must need to communicate with the parent process to achieve the purpose of working together.

It’s easy to think that the parent and child processes can communicate with each other through the pipes (anonymous pipes) mentioned above.

Using the Python program as an example, when task.py is run, two processes are created.

Go to the /proc/pid/fd directory of each of these two programs to see the file descriptors opened by these two processes.

Parent process.

Subprocesses.

You can see that the standard output of the child process is associated with the parent process, which is the file descriptor returned by popen().

Here 0, 1 and 2 correspond to stdin/stdout/stderr of a process, respectively.

It is also important to note that when we open a file descriptor in the parent process, the child process will inherit it as well.

For example, add a new code to task.py.

1
x = open("1.txt", "w")

Later, when you look at the file descriptor, you will see that both the parent and child processes will have this file.

But on the contrary, the parent process will not have the files opened in the child process, which should be easy to understand.

Summary

Some basics are especially important when troubleshooting some weird problems, such as the parent-child process pipeline communication involved in this case, and finally to summarize.

  1. The os.popen() function is executed asynchronously, if you need to get the output of the child process, you need to call the read() function by yourself.
  2. The parent and child processes communicate through anonymous pipes. When the read side is closed, the write side will receive a SIGPIPE signal when the output reaches the maximum cache of the pipe, thus throwing a Broken pipe exception.
  3. The child process will inherit the file descriptors of the parent process.

Reference https://crossoverjie.top/2021/05/12/cs/Linux%20Pipe/