Studying the performance of a program is important for every engineer; I would even go further and say that performance analysis is a necessary skill for an engineer.

Golang’s performance analysis tool, pprof, supports the following profile types:

• heap: a sampling of all live objects currently in memory (almost a must for languages with GC), used to analyze memory problems.
• profile: in contrast to heap, a sampling of CPU usage, used to find the program’s time-consuming bottlenecks.
• goroutine: stack traces of all current goroutines.
• allocs: a sampling of all past memory allocations.
• block: stack traces of goroutines that blocked on synchronization primitives.
• mutex: stack traces of holders of contended mutexes.
• trace: a trace of the current program’s execution.

Debugging with good tools is a great experience, and the whole analysis takes only a few simple commands. However, due to limited space, and because CPU profiling has been covered many times before, today we will only talk about how to analyze memory, i.e. the heap profile.

The heap profile is generally used to hunt memory leaks, or to optimize memory usage.

## Memory leaks and memory optimization

Memory leaks are often difficult to find and analyze, because discovering them requires monitoring the Go program itself, or digging through the OOM logs in Linux’s dmesg:

```shell
dmesg | grep oom-killer
```

When you find an OOM log entry, you should consider adding the monitoring you have been ignoring, because the problem will recur (though it is often hard to reproduce on your own machine or in a pre-release environment). If you don’t know what thresholds to set, look at your monitoring data and pick a simple percentage: for example, if your program occupies 10% of memory right after initialization, then alert as soon as its usage reaches, say, 50% of the machine’s memory, so you can start analyzing immediately.

However, you don’t need to go to a lot of trouble: a few simple lines of code add pprof support to your Go program without affecting its operation, exposing a web service you can analyze through your browser.

```go
import (
	"net/http"
	_ "net/http/pprof"
)

func main() {
	go func() {
		http.ListenAndServe("localhost:8080", nil)
	}()

	// ... the rest of your program
}
```

Then, using the pprof tool that ships with Go, you can analyze it; for example, for memory leaks:

```shell
go tool pprof http://localhost:8080/debug/pprof/heap

It will take you to pprof’s REPL, where you can locate the problem with some simple commands. However, for a better analysis experience, there are two things to note.

1. If your build uses the -trimpath or -ldflags "-s -w" flags, it is better to remove them, otherwise they will hamper your ability to locate the problem.
2. Run the command on the machine where the binary was compiled, so the analysis can drill all the way down to individual lines of code.

The actual example that follows is a memory usage optimization rather than an OOM; since I haven’t encountered an OOM yet, I’ll start with a small problem I did run into, because the two kinds of problem are analyzed in the same way.

## How to use pprof

Step one: look at top10.

```
(pprof) top10
Showing nodes accounting for 3759.91kB, 100% of 3759.91kB total
Showing top 5 nodes out of 24
      flat  flat%   sum%        cum   cum%
 2345.25kB 62.38% 62.38%  2345.25kB 62.38%  io.ReadAll
  902.59kB 24.01% 86.38%   902.59kB 24.01%  compress/flate.NewWriter
         0     0%   100%   902.59kB 24.01%  bufio.(*Writer).Flush
         0     0%   100%   902.59kB 24.01%  compress/gzip.(*Writer).Write
(rest omitted)...
```

A hint on reading this: flat is the memory allocated by the function itself, while cum is the cumulative space allocated by the function plus everything it calls. For example, if A allocates 1 MB directly and calls B, which allocates 2 MB, then A’s flat is 1 MB and its cum is 3 MB. top sorts by flat by default; you can switch the ordering with an argument: top10 -cum.

If you don’t see anything unusual here, there is somewhere else to look, because Golang’s heap sampling statistics come in four flavours:

• alloc_objects: objects allocated (cumulative since program start)
• alloc_space: space allocated (cumulative since program start)
• inuse_objects: objects currently in use
• inuse_space: space currently in use (the default)

You can switch between them with a command like sample_index=inuse_objects.

In my case, I know why the first item, io.ReadAll, is in my program, but the second item, compress/flate.NewWriter, looks like an anomaly, and I don’t know where it is called from. So, after identifying the suspicious entry, the second step is to confirm the call chain with tree.

```
(pprof) tree 10 compress
Active filters:
   focus=compress
Showing nodes accounting for 2354.01kB, 29.36% of 8018.09kB total
Showing top 10 nodes out of 11
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context
----------------------------------------------------------+-------------
                                         2354.01kB   100% |   compress/gzip.(*Writer).Write
 1805.17kB 22.51% 22.51%  2354.01kB 29.36%                | compress/flate.NewWriter
                                          548.84kB 23.32% |   compress/flate.(*compressor).init
----------------------------------------------------------+-------------
                                          548.84kB   100% |   compress/flate.(*compressor).init (inline)
  548.84kB  6.85% 29.36%   548.84kB  6.85%                | compress/flate.(*compressor).initDeflate
----------------------------------------------------------+-------------
                                         2354.01kB   100% |   github.com/prometheus/common/expfmt.MetricFamilyToText.func1
         0     0% 29.36%  2354.01kB 29.36%                | bufio.(*Writer).Flush
                                         2354.01kB   100% |   compress/gzip.(*Writer).Write
----------------------------------------------------------+-------------
                                          548.84kB   100% |   compress/flate.NewWriter
         0     0% 29.36%   548.84kB  6.85%                | compress/flate.(*compressor).init
                                          548.84kB   100% |   compress/flate.(*compressor).initDeflate (inline)
----------------------------------------------------------+-------------
                                         2354.01kB   100% |   bufio.(*Writer).Flush
         0     0% 29.36%  2354.01kB 29.36%                | compress/gzip.(*Writer).Write
                                         2354.01kB   100% |   compress/flate.NewWriter
----------------------------------------------------------+-------------
                                         2354.01kB   100% |   github.com/prometheus/common/expfmt.NewEncoder.func7
         0     0% 29.36%  2354.01kB 29.36%                | github.com/prometheus/common/expfmt.MetricFamilyToText
                                         2354.01kB   100% |   github.com/prometheus/common/expfmt.MetricFamilyToText.func1
----------------------------------------------------------+-------------
                                         2354.01kB   100% |   github.com/prometheus/common/expfmt.MetricFamilyToText
         0     0% 29.36%  2354.01kB 29.36%                | github.com/prometheus/common/expfmt.MetricFamilyToText.func1
                                         2354.01kB   100% |   bufio.(*Writer).Flush
----------------------------------------------------------+-------------
                                         2354.01kB   100% |   github.com/prometheus/common/expfmt.encoderCloser.Encode
         0     0% 29.36%  2354.01kB 29.36%                | github.com/prometheus/common/expfmt.NewEncoder.func7
                                         2354.01kB   100% |   github.com/prometheus/common/expfmt.MetricFamilyToText
----------------------------------------------------------+-------------
                                         2354.01kB   100% |   xizhibei-app/controllers/internal_rpc.(*SystemCtrl).GetMetrics
         0     0% 29.36%  2354.01kB 29.36%                | github.com/prometheus/common/expfmt.encoderCloser.Encode
                                         2354.01kB   100% |   github.com/prometheus/common/expfmt.NewEncoder.func7
----------------------------------------------------------+-------------
         0     0% 29.36%  2354.01kB 29.36%                | xizhibei-app/controllers/internal_rpc.(*SystemCtrl).GetMetrics
                                         2354.01kB   100% |   github.com/prometheus/common/expfmt.encoderCloser.Encode
----------------------------------------------------------+-------------
```

Now we can basically confirm that my implementation of GetMetrics had a small problem when compressing the serialized prometheus client metrics (though not to the point of a memory leak). Optionally, you can add a third step here: use the list command plus a keyword to drill the analysis down to individual lines of code.

Once the problem is located, the last step is to fix it, and my solution was sync.Pool. Previously I called gzip.NewWriter directly each time the metric text was pulled from prometheus and compressed, which made gzip allocate and initialize memory over and over again, so I modified the code to use sync.Pool instead.

The original code:

```go
buf := new(bytes.Buffer)
gzipWriter := gzip.NewWriter(buf)
```

The modified code:

```go
var (
	gzipPool = sync.Pool{
		New: func() interface{} {
			return gzip.NewWriter(nil)
		},
	}
	bufferPool = sync.Pool{
		New: func() interface{} {
			return new(bytes.Buffer)
		},
	}
)

// ...

gzipWriter := gzipPool.Get().(*gzip.Writer)
defer gzipPool.Put(gzipWriter)

buf := bufferPool.Get().(*bytes.Buffer)
defer bufferPool.Put(buf)

buf.Reset()
gzipWriter.Reset(buf)
```

We can write a benchmark to verify the change.

```
goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i9-9820X CPU @ 3.30GHz
BenchmarkEncode-20                  2422    504022 ns/op   851822 B/op   129 allocs/op
BenchmarkEncodeWithSyncPool-20      7654    150188 ns/op    48799 B/op   108 allocs/op
```

You can see that allocations dropped from 129 to 108 per operation, bytes allocated per operation fell from roughly 852 KB to 49 KB, and each encode became more than three times faster.

Well, that’s it for now.

For most people, it is simpler to analyze the problem with mouse clicks in a web page, and Go’s pprof now offers a one-stop web UI where you can view the call graph as well as the flame graph directly in the browser. For more details, see the official documentation.