Go 1.18 will be released in a few weeks (in March). We have already covered several of its new features, and today we look at a small optimization-related addition to the strings and bytes standard libraries.

Background

Wanting faster copies

In everyday programming, byte slices ([]byte) are copied all the time, which means writing code like this:

dup := make([]byte, len(data))
copy(dup, data)

@Ilia Choly found this tedious: you either write it out every time or wrap it in a helper function yourself, like this:

// Clone returns a copy of b
func Clone(b []byte) []byte {
  b2 := make([]byte, len(b))
  copy(b2, b)
  return b2
}
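A quick way to check what the helper buys you is to mutate the original after cloning. The sketch below reuses the Clone helper shown above; the sample data is an arbitrary assumption.

```go
package main

import (
	"bytes"
	"fmt"
)

// Clone returns a copy of b (the helper from the text above).
func Clone(b []byte) []byte {
	b2 := make([]byte, len(b))
	copy(b2, b)
	return b2
}

func main() {
	orig := []byte("hello")
	dup := Clone(orig)

	// Mutating the original does not affect the clone, because Clone
	// copied the bytes into a newly allocated backing array.
	orig[0] = 'J'

	fmt.Println(string(orig))           // Jello
	fmt.Println(string(dup))            // hello
	fmt.Println(bytes.Equal(orig, dup)) // false
}
```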

He therefore proposed adding a shortcut function. At first glance the proposal seems hard to justify, since anyone familiar with the language knows there is already a ready-made idiom:

b2 := append([]byte(nil), b...)

This does the job in one line, and it can even be faster, because the slice that append allocates is not zeroed before the data is copied into it.
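The two idioms are not quite interchangeable. The sketch below wraps each in a hypothetical helper (cloneMake and cloneAppend are illustrative names, not library functions) to show one difference: for empty input, appending nothing to a nil slice leaves it nil, while make always returns a non-nil slice. append may also round the capacity up, so cap(dup) can exceed len(dup).

```go
package main

import "fmt"

// cloneMake is a hypothetical helper using the make + copy idiom.
func cloneMake(b []byte) []byte {
	dup := make([]byte, len(b))
	copy(dup, b)
	return dup
}

// cloneAppend is a hypothetical helper using the one-line append idiom.
func cloneAppend(b []byte) []byte {
	return append([]byte(nil), b...)
}

func main() {
	data := []byte("gopher")
	fmt.Println(string(cloneMake(data)), string(cloneAppend(data))) // gopher gopher

	// For empty input the results differ: append with nothing to add
	// returns its first argument (nil), while make returns a non-nil slice.
	empty := []byte{}
	fmt.Println(cloneMake(empty) == nil)   // false
	fmt.Println(cloneAppend(empty) == nil) // true
}
```

This matters only if calling code distinguishes nil from empty slices, but that is exactly the kind of detail a shared Clone function can settle once.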

Copies can share memory

Many Go developers discover that when they copy a string or slice s0 into s1 by slicing, s1 still shares memory with the original s0. This comes down to the underlying data structure, and it can lead to many hidden problems.

The sample code is as follows.

package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

func main() {
	s0 := "脑子进煎鱼了"
	s1 := s0[:3] // the first 3 bytes: the single character "脑" in UTF-8

	// View both strings through their runtime headers.
	s0h := (*reflect.StringHeader)(unsafe.Pointer(&s0))
	s1h := (*reflect.StringHeader)(unsafe.Pointer(&s1))

	fmt.Printf("Len is equal: %t\n", s0h.Len == s1h.Len)
	fmt.Printf("Data is equal: %t\n", s0h.Data == s1h.Data)
}

Looking at this program, would you expect s0 and s1 to be equal in both Len and Data?

The output is as follows.

Len is equal: false
Data is equal: true

Len differs, which is expected: s1 takes only part of s0 by index. But Data is equal. Why? Is this a Go bug?

It is not. It comes from how Go represents strings and slices underneath. For example, the runtime representation of a string is described by reflect.StringHeader.

Its structure is as follows.

type StringHeader struct {
 Data uintptr
 Len  int
}
  • Data: points to the underlying byte array.
  • Len: the length of the string in bytes.
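To make the two fields concrete, the sketch below slices from the middle of the same string and inspects both headers. It uses reflect.StringHeader as the article does; newer Go releases deprecate that type in favor of unsafe.StringData (Go 1.20+), but it still compiles. The header helper is a hypothetical name for illustration.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// header is a hypothetical helper exposing a string's runtime header.
func header(s *string) *reflect.StringHeader {
	return (*reflect.StringHeader)(unsafe.Pointer(s))
}

func main() {
	s0 := "脑子进煎鱼了"
	s1 := s0[3:9] // bytes 3..8: the second and third characters, "子进"

	h0, h1 := header(&s0), header(&s1)

	// Len counts bytes, not characters: each character here is 3 bytes in UTF-8.
	fmt.Println(h0.Len, h1.Len) // 18 6

	// s1's Data points 3 bytes into s0's array: no bytes were copied.
	fmt.Println(h1.Data == h0.Data+3) // true
}
```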

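The key point is that Data is only a pointer to the underlying bytes, so when a string or slice is copied or sliced, just the pointer is copied and the new value shares the original's memory. A small substring can therefore keep a large string alive long after the rest of it is needed, and if the shared bytes are ever modified (through a []byte sub-slice, or an unsafe conversion), the "dirty" data shows up everywhere that shares them, triggering all sorts of weird bugs.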
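Strings are immutable, so the "dirty data" problem is easiest to see with byte slices. The sketch below (sample data assumed) shows a sub-slice writing into its parent, and the classic surprise where append reuses the shared capacity and overwrites the parent's later elements.

```go
package main

import "fmt"

func main() {
	base := []byte{'a', 'b', 'c', 'd', 'e', 'f'} // len 6, cap 6

	// sub shares base's backing array and inherits its spare capacity.
	sub := base[:2]

	// Writing through sub is visible through base.
	sub[0] = 'X'
	fmt.Println(string(base)) // Xbcdef

	// The append fits in the shared capacity, so instead of allocating
	// a new array it silently overwrites base[2].
	sub = append(sub, 'Z')
	fmt.Println(string(base)) // XbZdef

	// Copying first (here with the append idiom) yields an independent slice.
	safe := append([]byte(nil), base[:2]...)
	safe[0] = 'Q'
	fmt.Println(string(safe), string(base)) // Qb XbZdef
}
```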

These are the pain points the new feature addresses.

New Features

Go 1.18 adds a Clone function to the strings package to solve the two problems described above. (The matching bytes.Clone arrived later, in Go 1.20.)

The source code is as follows.

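// strings.Clone as of Go 1.18 (package strings imports "unsafe").
func Clone(s string) string {
	// An empty string needs no allocation.
	if len(s) == 0 {
		return ""
	}
	// Allocate a fresh buffer and copy the bytes into it.
	b := make([]byte, len(s))
	copy(b, s)
	// Reinterpret the slice header as a string header: no second copy.
	return *(*string)(unsafe.Pointer(&b))
}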
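  • Copy the original string into a freshly allocated []byte with the built-in copy function.
  • Convert that []byte to a string via *(*string)(unsafe.Pointer(&b)) without copying again. This works because the Data and Len fields of a string header line up with the first two fields of a slice header, and it is safe here because b never leaves the function, so nothing can modify the bytes afterwards.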

This neatly solves both problems from the background: you no longer have to write the same copying code over and over, and the result never shares memory with the original string.