Let’s look at the problematic code first. String-to-[]byte conversion code like this is very common on the web.

func StringToSliceByte(s string) []byte {
	l := len(s)
	return *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{
		Data: (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data,
		Len:  l,
		Cap:  l,
	}))
}

People avoid converting a string to a []byte directly via []byte(s) because that conversion copies the underlying bytes, whereas a type conversion via unsafe.Pointer involves no memory copy and therefore performs better.
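To make the copy concrete, here is a small sketch showing that []byte(s) gets its own backing array, so writing to the result leaves the original string untouched. It assumes Go 1.20 or later for unsafe.StringData and unsafe.SliceData; the helper name sharesMemory is hypothetical and only used for this illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sharesMemory (hypothetical helper) reports whether b's backing array
// starts at the same address as the bytes of s.
func sharesMemory(s string, b []byte) bool {
	return unsafe.Pointer(unsafe.StringData(s)) == unsafe.Pointer(unsafe.SliceData(b))
}

func main() {
	s := string([]byte("hello")) // build the string at run time
	b := []byte(s)               // standard conversion: copies the bytes
	b[0] = 'H'                   // modifying the copy does not touch s

	fmt.Println(s, string(b), sharesMemory(s, b)) // hello Hello false
}
```

The copy is exactly what makes []byte(s) safe: the result is an ordinary, mutable slice that the GC tracks like any other.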

Is there anything wrong with this code? When I pasted it into VS Code, the editor showed the following warning:

SliceHeader is the runtime representation of a slice. It cannot be used safely or portably and its representation may change in a later release. Moreover, the Data field is not sufficient to guarantee the data it references will not be garbage collected, so programs must keep a separate, correctly typed pointer to the underlying data.

First, reflect.SliceHeader is the runtime representation of a slice, which may change in a later release, so using it directly is risky; second, there is no guarantee that the data it points to will not be garbage collected.

The first problem is manageable, but the second one, the GC problem, is serious! To see why the GC problem exists, let’s look at the definitions of reflect.SliceHeader and reflect.StringHeader:

type SliceHeader struct {
	Data uintptr
	Len  int
	Cap  int
}

type StringHeader struct {
	Data uintptr
	Len  int
}

As you can see, Data is of type uintptr. Despite the ptr suffix, it is essentially an integer, not a pointer, which means it does not keep the data it points to alive, so that data may be reclaimed by the GC.

Now that we know the cause, let’s construct a piece of code that demonstrates the GC problem.

package main

import (
	"fmt"
	"reflect"
	"runtime"
	"unsafe"
)

func main() {
	fmt.Printf("%s\n", test())
}

func test() []byte {
	defer runtime.GC()
	x := make([]byte, 5)
	x[0] = 'h'
	x[1] = 'e'
	x[2] = 'l'
	x[3] = 'l'
	x[4] = 'o'
	return StringToSliceByte(string(x))
}

func StringToSliceByte(s string) []byte {
	l := len(s)
	return *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{
		Data: (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data,
		Len:  l,
		Cap:  l,
	}))
}

Note: the string is built dynamically because a string literal is stored in the read-only data of the compiled binary rather than on the heap, so it is never reclaimed by the GC.

When we run the code above, it does not print hello but garbage, because the underlying data has already been reclaimed by the GC. If we remove runtime.GC() and run it again, the output will probably look normal again.

This shows that because Data is a uintptr, any assignment to it is unsafe. That should have been the end of the matter, but the unsafe.Pointer documentation happens to contain an example that assigns to Data directly, under “Conversion of a reflect.SliceHeader or reflect.StringHeader Data field to or from Pointer”:

var s string
hdr := (*reflect.StringHeader)(unsafe.Pointer(&s))
hdr.Data = uintptr(unsafe.Pointer(p))
hdr.Len = n

So is the documentation wrong, or is our inference wrong? Reading further, the documentation states:

the reflect data structures SliceHeader and StringHeader declare the field Data as a uintptr to keep callers from changing the result to an arbitrary type without first importing “unsafe”. However, this means that SliceHeader and StringHeader are only valid when interpreting the content of an actual slice or string value.

In other words, a SliceHeader or StringHeader is only valid when it interprets a slice or string that actually exists. Looking back at the original code, it violated this unsafe.Pointer rule: there was no actual slice behind the reflect.SliceHeader it operated on (golang-nuts). So let’s adjust the code accordingly.
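A minimal sketch of that adjustment, assuming the fix simply follows the rule quoted above: declare a real []byte variable and point a *reflect.SliceHeader at it, instead of constructing a free-standing SliceHeader. It reuses the same GC test harness as before.

```go
package main

import (
	"fmt"
	"reflect"
	"runtime"
	"unsafe"
)

// StringToSliceByte overlays the reflect headers on real variables
// (the named result b and the parameter s), which is the pattern the
// unsafe.Pointer documentation allows. The result must not be modified.
func StringToSliceByte(s string) (b []byte) {
	bh := (*reflect.SliceHeader)(unsafe.Pointer(&b))
	sh := (*reflect.StringHeader)(unsafe.Pointer(&s))
	bh.Data = sh.Data
	bh.Len = sh.Len
	bh.Cap = sh.Len
	return b
}

func test() []byte {
	defer runtime.GC() // force a collection before the caller reads the result
	x := make([]byte, 5)
	copy(x, "hello")
	return StringToSliceByte(string(x))
}

func main() {
	fmt.Printf("%s\n", test())
}
```

Note that reflect.SliceHeader and reflect.StringHeader are deprecated in recent Go releases, so this form mainly matters for older code.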

When we run the adjusted version through the same test code, the output is now correct. But some may ask: didn’t we just say that uintptr is not a pointer and cannot keep data from being reclaimed by the GC? Why does the GC have no effect here? The reason is that the compiler gives *reflect.{Slice,String}Header special treatment.

To verify this special handling, you can reverse the experiment by replacing the reflect types with custom types of the same layout:

type StringHeader struct {
	Data uintptr
	Len  int
}

type SliceHeader struct {
	Data uintptr
	Len  int
	Cap  int
}

func StringToSliceByte(s string) []byte {
	var b []byte
	l := len(s)
	p := (*SliceHeader)(unsafe.Pointer(&b))
	p.Data = (*StringHeader)(unsafe.Pointer(&s)).Data
	p.Len = l
	p.Cap = l
	return b
}

You’ll notice that once the reflect types are replaced, the output is garbled again. This confirms, by contradiction, that the compiler does treat *reflect.{Slice,String}Header specially.

Now that we’ve basically figured out the pitfalls of string and []byte conversions, let’s look at how to write correct conversion code. Although the compiler plays a few tricks behind the scenes, we shouldn’t rely on these hidden behaviors.

Since uintptr is not a pointer, let’s use unsafe.Pointer instead, so that the GC sees a real reference and does not reclaim the data.

type StringHeader struct {
	Data unsafe.Pointer
	Len  int
}

type SliceHeader struct {
	Data unsafe.Pointer
	Len  int
	Cap  int
}

func StringToSliceByte(s string) []byte {
	var b []byte
	l := len(s)
	p := (*SliceHeader)(unsafe.Pointer(&b))
	p.Data = (*StringHeader)(unsafe.Pointer(&s)).Data
	p.Len = l
	p.Cap = l
	return b
}
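As a quick check, here is the same GC test wired up to the unsafe.Pointer-based headers. The lowercase type names stringHeader and sliceHeader are my own choice for this sketch, and the code assumes the usual string and slice layouts of the standard gc toolchain.

```go
package main

import (
	"fmt"
	"runtime"
	"unsafe"
)

// Mirrors of the runtime layouts, but with a real pointer in Data
// so the GC can see the reference.
type stringHeader struct {
	Data unsafe.Pointer
	Len  int
}

type sliceHeader struct {
	Data unsafe.Pointer
	Len  int
	Cap  int
}

// StringToSliceByte aliases s's bytes as a []byte; do not modify the result.
func StringToSliceByte(s string) []byte {
	var b []byte
	p := (*sliceHeader)(unsafe.Pointer(&b))
	p.Data = (*stringHeader)(unsafe.Pointer(&s)).Data
	p.Len = len(s)
	p.Cap = len(s)
	return b
}

func test() []byte {
	defer runtime.GC() // force a collection before the caller reads the result
	x := []byte{'h', 'e', 'l', 'l', 'o'}
	return StringToSliceByte(string(x))
}

func main() {
	fmt.Printf("%s\n", test())
}
```

Because Data is now a genuine pointer, no compiler special-casing is needed for the GC to keep the string’s bytes alive.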

The code above is a bit verbose; for more concise versions, see the implementations in gin and fasthttp.

// gin
func StringToBytes(s string) []byte {
	return *(*[]byte)(unsafe.Pointer(
		&struct {
			string
			Cap int
		}{s, len(s)},
	))
}

func BytesToString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

// fasthttp
func s2b(s string) (b []byte) {
	/* #nosec G103 */
	bh := (*reflect.SliceHeader)(unsafe.Pointer(&b))
	/* #nosec G103 */
	sh := (*reflect.StringHeader)(unsafe.Pointer(&s))
	bh.Data = sh.Data
	bh.Cap = sh.Len
	bh.Len = sh.Len
	return b
}

func b2s(b []byte) string {
	/* #nosec G103 */
	return *(*string)(unsafe.Pointer(&b))
}

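At this point, we have a zero-copy conversion between string and []byte that no longer suffers from the GC problem. Keep in mind that the resulting []byte shares memory with the string, so it must never be modified.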
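For reference, newer Go releases make the header structs unnecessary: since Go 1.20 the unsafe package provides unsafe.String, unsafe.StringData, unsafe.Slice and unsafe.SliceData, and reflect.SliceHeader/StringHeader have since been deprecated in their favor. A minimal sketch assuming Go 1.20+; the names s2b and b2s are simply reused from the fasthttp snippet above, not a library API.

```go
package main

import (
	"fmt"
	"unsafe"
)

// s2b returns a []byte that aliases s's bytes without copying.
// The result must be treated as read-only: string memory may be immutable.
func s2b(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// b2s returns a string that aliases b's bytes without copying.
// b must not be modified afterwards, or the "immutable" string changes.
func b2s(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	b := s2b("hello")
	fmt.Println(len(b), cap(b), string(b)) // 5 5 hello
	fmt.Println(b2s([]byte("world")))      // world
}
```

Since these functions take and return real pointers, the GC keeps the shared bytes alive without relying on any compiler special-casing.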