Perf is a performance analysis tool that has been part of the kernel since Linux 2.6.31 (2009). It combines hardware performance counters with instrumentation points in the running kernel to observe programs as they execute. The information it provides is rich enough to find a program's performance bottlenecks and decide where optimisation effort should go.
Some systems come with a complete Perf pre-installed, but on the Debian Buster (10) system I am using only the user-space front end is pre-installed, so the part that matches the running kernel version still has to be installed. Normally, the following command installs the Perf tools for the current kernel version. However, if the running kernel is not the latest one available in apt, you will need to either reboot into the newer kernel or install the Perf tools for your specific kernel version.
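Because the Perf tools have to match the running kernel, it helps to know which package that corresponds to. The sketch below assumes Debian's version-specific naming scheme `linux-perf-<major>.<minor>` (for example `linux-perf-4.19` on Buster) and derives the name from the running kernel release; the helper `debian_perf_package` is hypothetical, and it only prints the install command rather than running it.

```python
import platform
import re
from typing import Optional


def debian_perf_package(release: Optional[str] = None) -> str:
    """Guess the Debian perf package name for a kernel release.

    Assumes the naming scheme linux-perf-<major>.<minor>; this is an
    illustration, not an authoritative package lookup.
    """
    # Default to the running kernel, e.g. "4.19.0-16-amd64".
    release = release or platform.release()
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor = match.groups()
    return f"linux-perf-{major}.{minor}"


if __name__ == "__main__":
    # Print the command instead of executing it, so nothing is installed silently.
    print(f"sudo apt install {debian_perf_package()}")
```

If the printed package does not match what a plain install would pull in, that is the mismatch between the running kernel and the newest kernel in apt described above.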
At this point we should be able to run perf, but usually we get the following result
Or something like this:
This is because, for security reasons, the kernel prevents unprivileged users from monitoring system performance by default. There are several ways to resolve this.
- Modify the kernel.perf_event_paranoid parameter
- Grant the perf binary the CAP_PERFMON capability (available since Linux 5.8)
- Run perf as root
I generally use the first option myself, so that is the one I describe in detail. Running perf as root is not recommended, as it may be a security risk.
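Before changing anything it is worth checking the current value. The sketch below reads `/proc/sys/kernel/perf_event_paranoid` and summarises the levels described in the kernel's sysctl documentation (-1 to 2). Values above 2 are not upstream; my assumption is that some distribution kernels, Debian's among them, add a level 3 that blocks unprivileged use entirely. The function names are hypothetical.

```python
from pathlib import Path

PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def describe_paranoid(level: int) -> str:
    """Summarise what unprivileged users may do at a given paranoid level."""
    if level <= -1:
        return "almost all events allowed for all users"
    if level == 0:
        return "raw tracepoint access restricted to privileged users"
    if level == 1:
        return "CPU-wide events also restricted to privileged users"
    if level == 2:
        return "kernel profiling also restricted; profiling your own user-space processes allowed"
    # Levels above 2 are assumed to be a distribution-specific patch that
    # refuses perf_event_open() for unprivileged users altogether.
    return "all unprivileged use blocked (distribution-specific level)"


def read_paranoid(path: Path = PARANOID_PATH) -> int:
    """Read the current kernel.perf_event_paranoid value."""
    return int(path.read_text().strip())


if __name__ == "__main__":
    level = read_paranoid()
    print(f"perf_event_paranoid = {level}: {describe_paranoid(level)}")
```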
There are two ways to modify the kernel.perf_event_paranoid parameter. The first is temporary and is lost after a reboot, but it is a little quicker:
Or use the sysctl command instead:
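The temporary method works because every sysctl key, such as `kernel.perf_event_paranoid`, corresponds to a file under `/proc/sys` with the dots replaced by slashes. A minimal Python sketch of that mapping follows; the helper names are hypothetical, writing the file requires root, and the value -1 is only an example (pick the least permissive level that works for you).

```python
from pathlib import Path


def sysctl_path(key: str) -> Path:
    """Map a sysctl key like 'kernel.perf_event_paranoid' to its /proc/sys file."""
    return Path("/proc/sys") / key.replace(".", "/")


def set_sysctl_temporarily(key: str, value: int) -> None:
    """Write a value under /proc/sys; needs root and is lost on reboot."""
    sysctl_path(key).write_text(f"{value}\n")


if __name__ == "__main__":
    # Example value only; run as root, e.g. `sudo python3 script.py`.
    set_sysctl_temporarily("kernel.perf_event_paranoid", -1)
```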
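The second method is to edit the sysctl configuration file, which in my environment is /etc/sysctl.conf (on some systems it is located elsewhere). This change persists across reboots: it takes effect at the next boot, or immediately after reloading the file with sudo sysctl -p.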
With either method in place, the kernel allows unprivileged users to use perf's kernel interfaces.
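The persistent method boils down to making sure the configuration file contains exactly one active `kernel.perf_event_paranoid` line. Here is a sketch of an idempotent edit on the file's text, assuming the simple `key = value` syntax of sysctl.conf with `#` or `;` comments; the helper `set_sysctl_line` is hypothetical, and the script prints the result for review instead of overwriting the file.

```python
import re
from pathlib import Path


def set_sysctl_line(text: str, key: str, value: int) -> str:
    """Return sysctl.conf text with `key = value` set exactly once.

    An existing active line for the key is replaced and later duplicates
    are dropped; if none exists, the setting is appended. Commented-out
    lines are left untouched because they do not start with the key.
    """
    pattern = re.compile(rf"^\s*{re.escape(key)}\s*=.*$")
    new_line = f"{key} = {value}"
    out, replaced = [], False
    for line in text.splitlines():
        if pattern.match(line):
            if not replaced:
                out.append(new_line)
                replaced = True
            continue  # skip further duplicates of the same key
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    conf = Path("/etc/sysctl.conf")
    # Review the output, then write it back as root and run `sudo sysctl -p`.
    print(set_sysctl_line(conf.read_text(), "kernel.perf_event_paranoid", -1), end="")
```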
perf has several common uses. The first gives a rough overview of how a program uses the hardware:
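This kind of hardware overview is what perf stat produces. The sketch below runs a command under `perf stat -x,` (CSV output with ',' as the field separator) and writes the report to a temporary file with `-o`, so the program's own output cannot get mixed in. It assumes the first three CSV fields are value, unit and event name; counters perf could not read (printed as `<not counted>` or `<not supported>`) become `None`. The function names are hypothetical.

```python
import os
import subprocess
import tempfile
from typing import Dict, List, Optional


def parse_perf_stat_csv(output: str) -> Dict[str, Optional[float]]:
    """Parse `perf stat -x,` output into {event name: value}."""
    counters: Dict[str, Optional[float]] = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and '#' header lines such as "# started on ...".
        if not line or line.startswith("#"):
            continue
        fields = line.split(",")
        if len(fields) < 3:
            continue
        value, event = fields[0], fields[2]
        try:
            counters[event] = float(value)
        except ValueError:
            counters[event] = None  # e.g. "<not supported>"
    return counters


def perf_stat(cmd: List[str]) -> Dict[str, Optional[float]]:
    """Run `cmd` under perf stat and return the parsed counters."""
    with tempfile.TemporaryDirectory() as tmp:
        report = os.path.join(tmp, "stat.csv")
        subprocess.run(["perf", "stat", "-x,", "-o", report, "--", *cmd], check=True)
        with open(report) as f:
            return parse_perf_stat_csv(f.read())


if __name__ == "__main__":
    for event, value in perf_stat(["ls", "/"]).items():
        print(f"{event:>30}: {value}")
```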
The second use is more powerful: it breaks down the share of runtime spent in each function of the program, and even in each instruction within those functions, giving an accurate picture of how the processor is used in the program's bottlenecks.
At this point we should see a screen like this:
Press A to view the disassembly along with the share of time spent on each instruction.
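The per-function breakdown comes from recording samples with perf record and then opening them with perf report. A sketch that drives both steps from Python follows; the helper names and `./my_program` are hypothetical, `-o` and `-i` choose the output and input data files, and `perf.data` is perf's default file name.

```python
import subprocess
from typing import List


def record_argv(cmd: List[str], output: str = "perf.data") -> List[str]:
    """Build a `perf record` command line that samples `cmd` into `output`."""
    return ["perf", "record", "-o", output, "--", *cmd]


def report_argv(data: str = "perf.data") -> List[str]:
    """Build a `perf report` command line that opens the recorded samples."""
    return ["perf", "report", "-i", data]


if __name__ == "__main__":
    # Hypothetical workload; replace with the program you want to profile.
    subprocess.run(record_argv(["./my_program"]), check=True)
    # perf report shows its interactive view only when run in a terminal.
    subprocess.run(report_argv(), check=True)
```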
Sometimes, if the program was compiled without debug information (i.e. without gcc -g), perf may fail to reconstruct the correct call stacks. The time-share statistics then become unreliable and the hot functions are hard to spot. (The default fp method also depends on frame pointers, which optimised builds usually omit unless compiled with -fno-omit-frame-pointer.) In this case you can tell perf to use a more detailed call stack recording method, for example lbr.
perf can record call stacks in three ways, selected with the --call-graph option:
- fp: least detailed, produces the smallest log file, and has little to no impact on program performance.
- lbr: more detailed, produces a significantly larger log file, and has a small impact on performance.
- dwarf: the most detailed, produces extremely large log files, maybe 10 gigabytes a minute, and has a significant impact on performance, so it is not very practical.
I usually just go with lbr, and fall back to fp if the hard drive is not big enough or if reading and writing several gigabytes of files is slow.
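The rule of thumb above can be written as a small helper. This sketch models only the disk-space part: it checks the free space of the directory where perf record writes its data and picks `--call-graph=lbr` or `--call-graph=fp`. The threshold `MIN_FREE_BYTES` and the helper names are hypothetical. Note also that lbr relies on Intel's Last Branch Record hardware, so fp remains the portable fallback on CPUs without it.

```python
import shutil
from typing import List

# Hypothetical threshold: below this much free space, prefer the smaller fp logs.
MIN_FREE_BYTES = 20 * 1024**3  # 20 GiB


def choose_call_graph(free_bytes: int, min_free: int = MIN_FREE_BYTES) -> str:
    """Prefer lbr, falling back to fp when disk space is tight."""
    return "lbr" if free_bytes >= min_free else "fp"


def record_with_call_graph(cmd: List[str], directory: str = ".") -> List[str]:
    """Build a `perf record` command whose call-graph mode fits the free space."""
    mode = choose_call_graph(shutil.disk_usage(directory).free)
    return ["perf", "record", f"--call-graph={mode}", "--", *cmd]


if __name__ == "__main__":
    print(" ".join(record_with_call_graph(["./my_program"])))
```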