Perf is a performance analysis tool that has been part of the kernel since Linux 2.6.31 (2009). It uses instrumentation inside the kernel to observe running programs and provides rich information, enough to find the performance bottlenecks of a program and decide what to optimise first.

Installation

Some systems ship a complete Perf, but on the Debian Buster (10) I am using only a generic perf front end is pre-installed, so the Perf tools matching the kernel still need to be installed. Normally the first command below installs the Perf tools for the current kernel version. However, if the running kernel is not the latest kernel in apt, you will need to either reboot into the new kernel or install the Perf tools for a specific kernel version, as in the second command.

sudo apt install linux-perf
sudo apt install linux-perf-5.8 # specify the kernel version
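To avoid typing the version by hand, the package name can be derived from the running kernel. This is only a sketch: perf_pkg_for_kernel is a hypothetical helper name, and it assumes the versioned linux-perf-<major>.<minor> naming shown above, which may not hold on other Debian releases.

```shell
# Hypothetical helper: derive the versioned package name from a kernel release string.
# Assumes the naming scheme linux-perf-<major>.<minor> used in the command above.
perf_pkg_for_kernel() {
    release="${1:-$(uname -r)}"
    # Keep only "major.minor", e.g. 5.8.0-3-amd64 -> 5.8
    echo "linux-perf-$(echo "$release" | cut -d. -f1,2)"
}

# Print the package name that would match the running kernel
perf_pkg_for_kernel
# To install it: sudo apt install "$(perf_pkg_for_kernel)"
```

If apt cannot find the printed package, fall back to the unversioned linux-perf package.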

At this point perf should be runnable, but it usually fails with the following error:

$> perf record
Error:
You may not have permission to collect stats.
Consider tweaking /proc/sys/kernel/perf_event_paranoid:
 -1 - Not paranoid at all
  0 - Disallow raw tracepoint access for unpriv
  1 - Disallow cpu events for unpriv
  2 - Disallow kernel profiling for unpriv

Or, with a newer version of perf, this one:

Error:
Access to performance monitoring and observability operations is limited.
Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
access to performance monitoring and observability operations for processes
without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
More information can be found at 'Perf events and tool security' document:
https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
perf_event_paranoid setting is 3:
  -1: Allow use of (almost) all events by all users
      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow raw and ftrace function tracepoint access
>= 1: Disallow CPU event access
>= 2: Disallow kernel profiling
To make the adjusted perf_event_paranoid setting permanent preserve it
in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)

This is because, for security reasons, the kernel by default prevents unprivileged users from monitoring system performance. There are several ways to resolve this:

  • Modify the kernel.perf_event_paranoid kernel parameter
  • Grant the CAP_PERFMON capability to perf
  • Use the root user for performance monitoring

I generally use the first one myself, so that is the one described in detail here. Running perf as the root user is not recommended, as it may be a security risk.
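Before changing anything, it can help to see what the current value means. The following is a minimal sketch: describe_paranoid is a hypothetical helper name, and the descriptions are paraphrased from the error message above rather than from any official table.

```shell
# Hypothetical helper: explain a perf_event_paranoid level.
# The optional argument is the file to read; it defaults to the real /proc entry.
describe_paranoid() {
    file="${1:-/proc/sys/kernel/perf_event_paranoid}"
    level="$(cat "$file")" || return 1
    if [ "$level" -le -1 ]; then
        echo "$level: (almost) all events allowed for all users"
    elif [ "$level" -eq 0 ]; then
        echo "$level: raw and ftrace function tracepoint access disallowed"
    elif [ "$level" -eq 1 ]; then
        echo "$level: CPU event access also disallowed"
    else
        echo "$level: kernel profiling also disallowed"
    fi
}
```

Calling describe_paranoid with no argument reads the live value from /proc; passing a file name makes it easy to try out on a scratch file.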

There are two ways to modify the kernel.perf_event_paranoid parameter. The first is temporary and is lost after a reboot, but it is a little quicker:

sudo sh -c "echo -1 > /proc/sys/kernel/perf_event_paranoid"

Or use the sysctl command:

sudo sysctl -w kernel.perf_event_paranoid=-1

The second method is permanent: modify the sysctl configuration file, which in my environment is /etc/sysctl.conf. On some systems the configuration lives in files under the /etc/sysctl.d directory instead.

sudo sh -c 'echo "kernel.perf_event_paranoid=-1" >> /etc/sysctl.conf'
sudo sysctl -p

Once this has been done, the kernel will allow unprivileged users to use its perf interfaces.
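One caveat with the >> redirection used for the configuration file: running it twice leaves two identical lines in the file. A small guard avoids that. This is a sketch only, and append_once is a hypothetical helper name, not something shipped with sysctl.

```shell
# Hypothetical helper: append a line to a file only if that exact line is not already there.
append_once() {
    line="$1"; file="$2"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# The same idea as a one-liner against the real file (needs root):
# sudo sh -c 'grep -qxF "kernel.perf_event_paranoid=-1" /etc/sysctl.conf || echo "kernel.perf_event_paranoid=-1" >> /etc/sysctl.conf'
```

Note that the guard only catches an identical line. If the file already sets the parameter to a different value, that older line has to be edited by hand.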

Usage

perf has several common uses. The first is a quick, rough overview of how a program uses the hardware:

$> perf stat ls

<stdout>

Performance counter stats for 'ls':

              1.87 msec task-clock                #    0.411 CPUs utilized
                11      context-switches          #    0.006 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                94      page-faults               #    0.050 M/sec
         2,771,746      cycles                    #    1.479 GHz
         1,534,926      instructions              #    0.55  insn per cycle
           317,631      branches                  #  169.519 M/sec
            15,169      branch-misses             #    4.78% of all branches

       0.004553499 seconds time elapsed

       0.003498000 seconds user
       0.000000000 seconds sys
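The figures in the right-hand column are simple ratios of the raw counters, which is worth checking once by hand. The sketch below recomputes two of them from the numbers in the output above; ratio is a hypothetical helper name.

```shell
# ratio NUM DEN [SCALE]: print NUM / DEN * SCALE with two decimals (SCALE defaults to 1)
ratio() {
    awk -v n="$1" -v d="$2" -v s="${3:-1}" 'BEGIN { printf "%.2f\n", n / d * s }'
}

ratio 1534926 2771746      # instructions / cycles
ratio 15169 317631 100     # branch-misses / branches, as a percentage
```

The two calls print 0.55 and 4.78, matching the "insn per cycle" and "of all branches" columns reported by perf stat.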

The second is more powerful: it breaks down the runtime share of each function in the program, and the time share of each instruction within those functions, which gives an accurate picture of where the processor time goes at the program's bottlenecks.

perf record <program>
perf report

At this point we can see a screen like this:

[Screenshot: perf report interface]

Press a to see the disassembled assembly code of the selected function and the corresponding time share of each instruction.

If the program was compiled without debug information (e.g. without gcc -g), perf may fail to reconstruct the correct call stack. The time share statistics then become unreliable and hot functions are hard to spot. In this case you can tell perf to use a more detailed call stack tracing method, for example lbr (which relies on hardware Last Branch Record support, so it is not available on every CPU):

perf record --call-graph lbr <program>

There are three methods of call stack tracing that perf can use.

  • fp: least detailed, produces the smallest log file, and has little to no impact on program performance.
  • lbr: more detailed, produces a significantly larger log file, and has a small impact on performance.
  • dwarf: the most detailed, produces extremely large log files, maybe 10 gigabytes a minute, and has a significant impact on performance, so it is not very practical.

I usually just go with lbr, and consider fp if the disk isn't big enough or if reading and writing a few gigabytes of files is too slow.
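Since only those three mode names are discussed here, a tiny wrapper can catch typos before a long recording starts. This is a sketch under assumptions: perf_record_cmd is a hypothetical helper, ./my_program and its flag are placeholders, and the function only prints the command instead of running it.

```shell
# Hypothetical wrapper: validate the call-graph mode, then print the perf command line.
# It only prints the command; paste the output into a shell to actually record.
perf_record_cmd() {
    mode="$1"; shift
    case "$mode" in
        fp|lbr|dwarf)
            echo "perf record --call-graph $mode -- $*" ;;
        *)
            echo "unknown call-graph mode: $mode (expected fp, lbr or dwarf)" >&2
            return 1 ;;
    esac
}

perf_record_cmd lbr ./my_program --some-flag
```

The -- separator keeps the options of the profiled program from being parsed as perf options.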