In Go, it is easy to write Benchmark functions to measure the performance of a function. For important functions, we can add the corresponding benchmarks to our CI/CD pipeline so that performance changes are noticed as soon as they occur. So the question is: how do you detect performance changes in a function?

To put it another way: you write a function, find that it is slow, and decide to optimize it. After some searching you find a better implementation, and your Benchmark function confirms it is indeed faster. But you can’t say exactly how much faster. What you want is a before-and-after comparison that tells you, with high confidence, how many percentage points of improvement you gained.

For this scenario, there is a tool that can help: benchstat.

Benchmark Example

Let’s review benchmarks first. For ease of understanding, here is the classic example of computing Fibonacci numbers.

func FibSolution(n int) int {
 if n < 2 {
  return n
 }

 return FibSolution(n-1) + FibSolution(n-2)
}

The above code is a recursive implementation, and it is clear that the function becomes very time-consuming as n grows. For n = 20, the Benchmark function looks like this.

func BenchmarkFib20(b *testing.B) {
 for i := 0; i < b.N; i++ {
  FibSolution(20)
 }
}

Execute go test -bench=BenchmarkFib20 on the command line to get the performance results.

BenchmarkFib20-8           39452             30229 ns/op

The -8 suffix is the value of GOMAXPROCS (here, 8 CPUs); the benchmark ran 39452 iterations, with an average of 30229 ns per iteration.

If we want multiple samples, we can pass the -count=N flag to go test. For example, to collect 5 samples, we would run go test -bench=BenchmarkFib20 -count=5.

BenchmarkFib20-8           39325             30297 ns/op
BenchmarkFib20-8           39216             30349 ns/op
BenchmarkFib20-8           39901             30251 ns/op
BenchmarkFib20-8           39336             30455 ns/op
BenchmarkFib20-8           39423             30894 ns/op

The iterative implementation to calculate the Fibonacci series values is as follows.

func FibSolution(n int) int {
 if n < 2 {
  return n
 }
 p, q, r := 0, 0, 1
 for i := 2; i <= n; i++ {
  p = q
  q = r
  r = p + q
 }
 return r
}
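Before comparing performance, it is worth confirming that the two implementations actually agree. A quick sanity check (a sketch; the names fibRecursive and fibIterative are local stand-ins for the two FibSolution versions above):

```go
package main

import "fmt"

// fibRecursive mirrors the recursive FibSolution above.
func fibRecursive(n int) int {
	if n < 2 {
		return n
	}
	return fibRecursive(n-1) + fibRecursive(n-2)
}

// fibIterative mirrors the iterative FibSolution above.
func fibIterative(n int) int {
	if n < 2 {
		return n
	}
	p, q, r := 0, 0, 1
	for i := 2; i <= n; i++ {
		p = q
		q = r
		r = p + q
	}
	return r
}

func main() {
	// Check that both implementations agree for every n used in the benchmarks.
	for n := 0; n <= 20; n++ {
		if fibRecursive(n) != fibIterative(n) {
			panic(fmt.Sprintf("mismatch at n=%d", n))
		}
	}
	fmt.Println(fibIterative(20)) // prints 6765
}
```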

The plainest way to compare the performance of these two functions is to benchmark them separately and analyze the results by hand, but this is tedious and not very intuitive.

benchstat

benchstat is a command-line tool, maintained in the official golang.org/x/perf repository, for computing and comparing benchmark statistics.

We can install it with the following command.

go install golang.org/x/perf/cmd/benchstat@latest

Execute the -h parameter to see a description of the tool’s usage.

~ $ benchstat -h
usage: benchstat [options] old.txt [new.txt] [more.txt ...]
options:
  -alpha α
     consider change significant if p < α (default 0.05)
  -csv
     print results in CSV form
  -delta-test test
     significance test to apply to delta: utest, ttest, or none (default "utest")
  -geomean
     print the geometric mean of each file
  -html
     print results as an HTML table
  -norange
     suppress range columns (CSV only)
  -sort order
     sort by order: [-]delta, [-]name, none (default "none")
  -split labels
     split benchmarks by labels (default "pkg,goos,goarch")

We would like to compare the performance benchmarks of FibSolution(n) from 15 to 20, for both implementations.

$ go test -bench=. -count=5 | tee old.txt
$ go test -bench=. -count=5 | tee new.txt

Note that the two commands are run against the recursive and the iterative implementation of FibSolution, respectively.
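Incidentally, per-n results like these can also be produced from a plain program via testing.Benchmark, which runs a benchmark function outside of go test. A sketch (using the recursive implementation; the output format mimics the standard benchmark lines that benchstat parses):

```go
package main

import (
	"fmt"
	"testing"
)

// FibSolution is the recursive implementation from above.
func FibSolution(n int) int {
	if n < 2 {
		return n
	}
	return FibSolution(n-1) + FibSolution(n-2)
}

func main() {
	for n := 15; n <= 20; n++ {
		n := n // capture the loop variable for the closure
		res := testing.Benchmark(func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				FibSolution(n)
			}
		})
		// One line per benchmark, in the format benchstat understands.
		fmt.Printf("BenchmarkFib%d\t%d\t%d ns/op\n", n, res.N, res.NsPerOp())
	}
}
```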

At this point, we can compare the performance of these two function implementation logics.

$ benchstat old.txt new.txt
name     old time/op  new time/op  delta
Fib15-8  2.67µs ± 2%  0.01µs ± 5%  -99.81%  (p=0.008 n=5+5)
Fib16-8  4.20µs ± 1%  0.01µs ± 2%  -99.87%  (p=0.008 n=5+5)
Fib17-8  6.81µs ± 0%  0.01µs ± 2%  -99.92%  (p=0.008 n=5+5)
Fib18-8  11.1µs ± 1%   0.0µs ± 1%  -99.95%  (p=0.008 n=5+5)
Fib19-8  18.0µs ± 2%   0.0µs ± 4%  -99.97%  (p=0.008 n=5+5)
Fib20-8  29.2µs ± 1%   0.0µs ± 3%  -99.98%  (p=0.008 n=5+5)

As you can see, the execution time of the recursive implementation grows sharply as n increases. The iterative implementation reduces the average time by more than 99% compared to the recursive one, a very significant improvement.

In addition, p=0.008 is the p-value of the significance test (by default a Mann-Whitney U test): the smaller the p-value, the stronger the evidence that the measured difference is real. By convention, 0.05 is used as the threshold; above it, the difference is not considered statistically significant. n=5+5 indicates the number of valid samples taken from each file.

Summary

benchstat is a benchmark statistics tool that spares us the cost of manually analyzing benchmark data when doing optimization work.

If your project runs automated tests in its CI/CD process, you may want to add this tool to the pipeline. It can help you catch regressions early when a change to a function introduces a performance loss.