Two nights ago, the online system suddenly broke down, and after immediately opening the online error log, I only got a bunch of meaningless program call stack (traceback) output, so the team members were caught in a long and blind problem solving process. The problem was fortunately solved, but I couldn’t figure out why the call stack printed in the log was meaningless when, as a rule of thumb, it should print the call stack during the exception generation process. After subsequent source code analysis and troubleshooting, I realized it was actually due to an old piece of code in the project using monkey patch, which is what this article is trying to discuss.
What is a monkey patch
Monkey patching is a double-edged sword
Monkey patches are flexible enough to allow complete separation of the patch code from the application code, while allowing the application code to be invoked in a way that remains constant. From the perspective of the application code, it calls the original defined method or function of a module; from the perspective of the called method or function, the presence of the monkey patch is transparent to it, as shown in the following demo in Python.
Let’s start with a minimalist example and say hello to a wonderful world.
If the above script is executed, the result will be
This is easy. Next, let’s say we do a monkey patch: we expand the original
greet behavior and now print the current time in addition to the message.
Run it and get the following results.
We get the expected result!
From the code analysis, we added a new function
greet_with_time, which calls the original
greet function, then prints the current time, and finally replaces the
greet function by assigning the function to a variable. The final call to the
greet function does not require any changes, thus achieving the same call to the
greet function but with a very different behavior.
The above demo is just a simplified code for the sake of space, the monkey patch code in a real project is always in a separate module or file. Imagine a large, complex project where your code is flooded with monkey patches, it would be a disastrous challenge to analyze the behavior of the system and troubleshoot problems.
Now that we have some understanding of monkey patches, let’s look at an example I encountered in a real project.
A bunch of meaningless stack information
I reproduced locally the exception we encountered as I mentioned at the beginning, and here are the stack messages consistent with the online environment.
From this stack message, it actually prints the stack of the
Logger.log function called, in which there is no code at all to see the word
.insert, which has nothing to do with
AttributeError("'long' object has no attribute 'insert'",). Such a stack of information is basically the same with and without. So, I went on to explore more through the editor and the source code.
The first thing I did was to use the stack above to analyze what went wrong, so I looked at the code at
logger.log method is called here (the source of the logger can be analyzed in Celery’s code, but is not the focus of this article, so I won’t expand on it) and two important messages are passed in via the
traceback. Further reading of the
logger.log source code confirmed that the core of this piece of log printing relies on a call to the
So, I went back to the code at
celery/worker/job.py:504-511 and inserted two codes that print error stack information before
After restarting celery and executing the asynchronous task, the first error stack is identical to the one I posted earlier, which is understandable since the call to the
print_exception function here is the core implementation in
logger.log. The call to
format_exception gives me the real meaningful error stack information.
Now it’s clear, this is where the exception in this code is really coming from!
But then the question arises, why do
print_exce ption and
format_exception give different stack information? I went to the official documentation with a lot of questions, but the confusion got worse.
traceback.format_exception(etype, value, tb[, limit]) Format a stack trace and the exception information. The arguments have the same meaning as the corresponding arguments to print_exception(). The return value is a list of strings, each ending in a newline and some containing internal newlines. When these lines are concatenated and printed, exactly the same text is printed as does print_exception().
Focus on the last sentence, the official Python documentation says that both functions output the same error stack (EXACTLY the same text)!
Spot the monkey patch
Actually, the real process of troubleshooting the problem took me a lot of time. I hadn’t been thinking about the monkey patch, but I finally found the culprit on the subway on my way out to a friend’s dinner date, when I looked at the project code on my company’s GitLab on my phone.
From the patch code, the patch directly overwrites the original code, and the implementation also directly and brutally ignores the several exception message parameters passed in! That’s why it’s such a big oops, with unrelated exception stack information! (╯‵□′)╯︵┻┻
Tips for troubleshooting monkey patches
There are certainly pros and cons to this type of programming technique, monkey patches, and using them necessarily requires extra caution. However, it is not necessary to stay away from them, the focus is on mastering the necessary troubleshooting skills, and I will go back to this lesson to find some methods that may help.
1. Check the method or function information through the function or method itself properties
As you know, all Python objects have a bunch of built-in properties, and functions are no exception.
A quick look shows that the real code of this function is actually the patch code of the project!
2. Inspecting with the inspect package
Python provides a lot of toolkits, and inspect is naturally one of them, and it can be used to do runtime checks on almost any type.
In short, if you encounter code behavior does not match the expected but can not correspond with the official documentation or official source code, then it is possible that the dependent method or function was monkey patched, and the quickest way to confirm the monkey patch is to check the actual definition of the function or method called in the first place, that is, to apply the above method!
I’ve encountered the monkey patch trap when doing Ruby development, and there is a similar approach in Ruby.