In Go, how do you concatenate strings efficiently? Because strings are immutable, every concatenation allocates a new string and copies the old contents into it, so joining many strings in a loop is expensive. strings.Builder and bytes.Buffer solve this performance problem. Beyond performance, note that bytes.Buffer sits on the boundary between []byte and string, and that conversion needs care. Below is a mistake I made in a real project, for your reference.
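As a minimal sketch of the strings.Builder approach mentioned above (the helper name repeatConcat is made up for illustration and is not from the project), the pieces are appended to one growing buffer and converted to a string once at the end:

```go
package main

import (
	"fmt"
	"strings"
)

// repeatConcat is a hypothetical helper: it joins n copies of str
// with strings.Builder instead of repeated use of the + operator.
func repeatConcat(n int, str string) string {
	var sb strings.Builder
	sb.Grow(n * len(str)) // optional: reserve the final size up front
	for i := 0; i < n; i++ {
		sb.WriteString(str)
	}
	return sb.String()
}

func main() {
	fmt.Println(repeatConcat(3, "ab")) // prints "ababab"
}
```

The Grow call is only an optimization; without it the builder grows its storage as needed.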

Buffer reuse problem

I used a bytes.Buffer to assemble the result of parsing data. The following is a basic example.

package main

import (
  "bytes"
  "fmt"
)

var buf bytes.Buffer

func parseMultipleValue(n int, str string) []byte {
  buf.Reset()
  for i := 0; i < n; i++ {
    buf.WriteString(str)
  }
  return buf.Bytes()
}

func main() {
  s1 := parseMultipleValue(5, "1")
  fmt.Println("s1:", string(s1))
  s2 := parseMultipleValue(3, "2")
  fmt.Println("s1:", string(s1))
  fmt.Println("s2:", string(s2))
}

Running the example produces the following output.

s1: 11111
s1: 22211
s2: 222

As you can see, printing s1 a second time shows that the data written for s2 has overwritten part of s1. The reason is that buf.Bytes() does not copy anything: s1 is a 5-byte slice that points directly into the buffer's internal storage. When parseMultipleValue runs the second time, buf.Reset() only sets the buffer's length back to 0 and keeps the same underlying memory, so the new content is written from the start of that memory. The first 3 bytes that s1 refers to are therefore replaced by the contents of s2.
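The same effect can be reproduced on a local buffer. The sketch below is only an illustration (aliasDemo is a hypothetical helper, not project code); it also checks that Reset leaves the buffer's capacity unchanged, which is why the old slice still sees the new writes:

```go
package main

import (
	"bytes"
	"fmt"
)

// aliasDemo is a hypothetical helper that reproduces the overwrite
// with a local buffer. It returns what the first slice reads after
// the second write, and whether Reset kept the buffer's capacity.
func aliasDemo() (first string, capKept bool) {
	var buf bytes.Buffer
	buf.WriteString("11111")
	view := buf.Bytes() // no copy: view shares the buffer's storage
	capBefore := buf.Cap()

	buf.Reset() // length goes back to 0, storage is retained
	capKept = buf.Len() == 0 && buf.Cap() == capBefore

	buf.WriteString("222") // overwrites the first 3 bytes view refers to
	return string(view), capKept
}

func main() {
	first, capKept := aliasDemo()
	fmt.Println("view after second write:", first)
	fmt.Println("capacity kept by Reset:", capKept)
}
```

The package documentation states this contract as well: the slice returned by Bytes is valid only until the next modification of the buffer.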

Two solutions

How can we avoid affecting the contents of s1? The simplest fix is to return buf.String(). The String() method of bytes.Buffer copies the buffered bytes into a new string, so later writes to the buffer cannot change it.

var buf bytes.Buffer

func parseMultipleValue(n int, str string) string {
  buf.Reset()
  for i := 0; i < n; i++ {
    buf.WriteString(str)
  }
  return buf.String()
}

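If you don't want to use String(), you can also copy the bytes into a new slice yourself and use unsafe.Pointer to convert that []byte to a string without a second copy. This is safe only as long as the copied slice is never modified afterwards.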

var buf bytes.Buffer

func b2s(b []byte) string {
  return *(*string)(unsafe.Pointer(&b))
}

func parseMultipleValue(n int, str string) string {
  buf.Reset()
  for i := 0; i < n; i++ {
    buf.WriteString(str)
  }
  s := make([]byte, len(buf.Bytes()))
  copy(s, buf.Bytes())
  return b2s(s)
}

Both of the solutions above fix the problem, and the benchmark below shows no meaningful difference in performance between them, so you can choose either one.

BenchmarkA
BenchmarkA-8                       34922             33986 ns/op          106496 B/op          1 allocs/op
BenchmarkB
BenchmarkB-8                       35760             33714 ns/op          106496 B/op          1 allocs/op
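One point about the unsafe variant deserves a small illustration: b2s does not copy, so the string it returns shares memory with the slice. That is why parseMultipleValue copies the buffer contents before converting. The sketch below (mutateAfterConvert is a hypothetical helper) shows what happens when the slice is modified after the conversion; the behavior relies on how current Go toolchains lay out slices and strings, so treat it as a demonstration of the hazard rather than something to depend on:

```go
package main

import (
	"fmt"
	"unsafe"
)

// b2s is the conversion used in this article: it reinterprets the
// slice header as a string header without copying the bytes.
func b2s(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

// mutateAfterConvert is a hypothetical helper that modifies the slice
// after converting it, to show that the string changes too.
func mutateAfterConvert() string {
	b := []byte("abc")
	s := b2s(b) // s and b now refer to the same bytes
	b[0] = 'X'  // writing through b changes what s reads
	return s
}

func main() {
	fmt.Println(mutateAfterConvert())
}
```

If the slice handed to b2s were buf.Bytes() itself instead of a copy, the reuse problem from the first example would come back in the form of strings that change after being returned.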

Complete code.

package main

import (
  "bytes"
  "math/rand"
  "testing"
  "unsafe"
)

const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

func randomString(n int) string {
  b := make([]byte, n)
  for i := range b {
    b[i] = letterBytes[rand.Intn(len(letterBytes))]
  }
  return string(b)
}

var buf bytes.Buffer

func b2s(b []byte) string {
  return *(*string)(unsafe.Pointer(&b))
}

func parseMultipleValue(n int, str string) string {
  buf.Reset()
  for i := 0; i < n; i++ {
    buf.WriteString(str)
  }
  s := make([]byte, len(buf.Bytes()))
  copy(s, buf.Bytes())
  return b2s(s)
}

func parseMultipleValue2(n int, str string) string {
  buf.Reset()
  for i := 0; i < n; i++ {
    buf.WriteString(str)
  }

  return buf.String()
}

func benchmark(b *testing.B, f func(int, string) string) {
  str := randomString(10)
  b.ReportAllocs()
  for i := 0; i < b.N; i++ {
    f(10000, str)
  }
}

func BenchmarkA(b *testing.B) { benchmark(b, parseMultipleValue) }
func BenchmarkB(b *testing.B) { benchmark(b, parseMultipleValue2) }

I was in a hurry to analyze the contents of a very large file (200 MB) and did not write complete tests, so I did not catch this error. It was my own momentary carelessness. Now that the tests have been filled in, I can go on to optimize performance step by step.
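A check of the kind that would have caught this mistake can be quite small. The following is a sketch under my own naming (parseAliased, parseCopied, and checkIndependent are hypothetical, and the real project tests may look different): it calls a parse function twice and verifies that the first result did not change.

```go
package main

import (
	"bytes"
	"fmt"
)

var buf bytes.Buffer

// parseAliased mirrors the buggy version: it returns a slice that
// still points into the shared buffer.
func parseAliased(n int, str string) []byte {
	buf.Reset()
	for i := 0; i < n; i++ {
		buf.WriteString(str)
	}
	return buf.Bytes()
}

// parseCopied returns an independent copy of the buffer contents.
func parseCopied(n int, str string) []byte {
	buf.Reset()
	for i := 0; i < n; i++ {
		buf.WriteString(str)
	}
	out := make([]byte, buf.Len())
	copy(out, buf.Bytes())
	return out
}

// checkIndependent is a hypothetical regression check: it reports an
// error if a second call to parse changes the result of the first.
func checkIndependent(parse func(int, string) []byte) error {
	first := parse(5, "1")
	want := string(first) // snapshot taken before the second call
	parse(3, "2")
	if got := string(first); got != want {
		return fmt.Errorf("first result changed from %q to %q", want, got)
	}
	return nil
}

func main() {
	fmt.Println("aliased:", checkIndependent(parseAliased))
	fmt.Println("copied: ", checkIndependent(parseCopied))
}
```

In a real project the body of checkIndependent would sit inside a Test function from the testing package, next to the benchmarks.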