Go 1.18 introduced the new package net/netip, which is derived from the netaddr library, to represent IP addresses and related operations. Its author, Brad Fitzpatrick, wrote a dedicated blog post about the design principles and final implementation of this library.

The core of this implementation relies on intern.Value from the intern library. Here are some of my findings and observations on it.

The design goal of netaddr is a single type that supports IPv4, IPv6 without a zone, and IPv6 with a zone at the same time. It should also be a value type that compares correctly with == and has the smallest possible memory footprint. This is a very demanding set of requirements. You can refer to the library author’s blog for the detailed design process.

The final implementation looks like this:

type IP struct {
  addr uint128
  z    *intern.Value // zone and family
}

where addr holds the actual IP address (for IPv4, the address occupies the lower 32 bits), and z serves as a flag that distinguishes IPv4, IPv6 without a zone, and IPv6 with a zone, and also records the zone itself. Since a zone can be any string, a correct implementation requires that intern.Value resolve to the same pointer whenever the strings have the same content.
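To make the role of z more concrete, here is a minimal sketch of how a single pointer field can encode both the address family and the zone. It is only an illustration under my own assumptions: zoneValue is a hypothetical stand-in for intern.Value, the sentinel names z4 and z6noz and the helper internZone are chosen for this example, and the address layout is simplified to two uint64 fields.

```go
package main

import "fmt"

// zoneValue is a hypothetical stand-in for intern.Value.
type zoneValue struct{ zone string }

var (
	z4    = new(zoneValue)          // sentinel pointer meaning "IPv4"
	z6noz = new(zoneValue)          // sentinel pointer meaning "IPv6 without a zone"
	zones = map[string]*zoneValue{} // one canonical pointer per zone name
)

// IP is a simplified model of the struct above (address layout simplified).
type IP struct {
	hi, lo uint64
	z      *zoneValue
}

// internZone returns the same pointer for the same zone name (not concurrency-safe).
func internZone(zone string) *zoneValue {
	if v, ok := zones[zone]; ok {
		return v
	}
	v := &zoneValue{zone: zone}
	zones[zone] = v
	return v
}

// Is4 reports whether ip is tagged as IPv4.
func (ip IP) Is4() bool { return ip.z == z4 }

// Is6 reports whether ip is tagged as IPv6, with or without a zone.
func (ip IP) Is6() bool { return ip.z != nil && ip.z != z4 }

// Zone returns the zone name, or "" when there is none.
func (ip IP) Zone() string {
	if ip.z == nil {
		return ""
	}
	return ip.z.zone
}

func main() {
	v4 := IP{lo: 0x7f000001, z: z4}
	v6 := IP{hi: 0xfe80 << 48, lo: 1, z: z6noz}
	a := IP{hi: 0xfe80 << 48, lo: 1, z: internZone("eth0")}
	b := IP{hi: 0xfe80 << 48, lo: 1, z: internZone("eth0")}
	fmt.Println(v4.Is4(), v6.Is6(), a.Zone()) // true true eth0
	fmt.Println(a == b, a == v6)              // true false
}
```

Because the zone pointer is canonical, the plain == on the struct behaves as a content comparison, which is the property the design needs.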

Here z is not a string, I guess in order to keep the IP struct as small as possible. In Go, a string value takes a fixed 16 bytes on a 64-bit platform (a pointer to the underlying bytes plus an int holding the length), which is twice the 8 bytes of a pointer. But using a string would make the implementation easier to understand.

type IP struct {
    addr uint128
    z    string // "4" for IPv4, "6" for IPv6 without zone, "6eth0" for IPv6 with zone 'eth0'.
}

Apart from being 8 bytes larger than the original structure, it achieves all of the other goals.
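The size difference can be checked with unsafe.Sizeof. The sketch below is my own illustration: uint128 is modeled as two uint64 fields, and the type names ipWithPointer and ipWithString are made up for this example. On a typical 64-bit platform I would expect it to report 24 and 32 bytes.

```go
package main

import (
	"fmt"
	"unsafe"
)

// wordSize is the size of one machine word (8 bytes on 64-bit platforms).
const wordSize = unsafe.Sizeof(uintptr(0))

// uint128 is a stand-in for the library's unexported 128-bit integer type.
type uint128 struct {
	hi, lo uint64
}

// ipWithPointer models the pointer-based layout (hypothetical name).
type ipWithPointer struct {
	addr uint128
	z    *struct{} // stands in for *intern.Value
}

// ipWithString models the string-based alternative (hypothetical name).
type ipWithString struct {
	addr uint128
	z    string
}

// sizes returns the in-memory size of each layout in bytes.
func sizes() (withPointer, withString uintptr) {
	return unsafe.Sizeof(ipWithPointer{}), unsafe.Sizeof(ipWithString{})
}

func main() {
	p, s := sizes()
	fmt.Printf("pointer layout: %d bytes, string layout: %d bytes\n", p, s)
}
```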

Here’s a look at how intern.Value achieves the same functionality while saving 8 bytes. Given the requirement that strings with the same content must map to the same pointer, a very straightforward implementation would look like this.

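// values maps each string's content to its single canonical pointer.
var values = map[string]*string{}

// Get returns the same pointer for every string with the same content.
func Get(s string) *string {
  if _, ok := values[s]; !ok {
    values[s] = &s
  }
  return values[s]
}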

Leaving concurrency aside, the biggest problem with this implementation is that it leaks memory: every pointer returned by Get stays referenced by values forever, so nothing is ever collected. To solve the memory leak, you need to bring in the unsafe package. The following is very close to the implementation of intern.Value.
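The leak in the map-based version is easy to observe. The following sketch is my own illustration (the helper entriesAfterGC is a made-up name): it interns many distinct strings, drops every returned pointer, forces a GC, and then counts how many entries the map still holds.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
)

// values maps each string's content to its single canonical pointer.
var values = map[string]*string{}

// Get returns the same pointer for every string with the same content.
func Get(s string) *string {
	if _, ok := values[s]; !ok {
		values[s] = &s
	}
	return values[s]
}

// entriesAfterGC interns n distinct strings, discards the pointers,
// forces a garbage collection, and returns the number of map entries left.
func entriesAfterGC(n int) int {
	for i := 0; i < n; i++ {
		_ = Get("zone" + strconv.Itoa(i)) // the pointer is dropped immediately
	}
	runtime.GC()
	return len(values)
}

func main() {
	// The map still references every entry, so the GC cannot free any of them.
	fmt.Println(entriesAfterGC(1000)) // prints 1000
}
```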

import (
  "runtime"
  "unsafe"
)

// valMap stores addresses as uintptr, so the GC does not treat
// the map entries as references to the Values.
var valMap = map[string]uintptr{}

type Value struct {
  s           string
  resurrected bool
}

// Get returns the canonical *Value for s.
func Get(s string) *Value {
  var v *Value
  if addr, ok := valMap[s]; ok {
    v = (*Value)(unsafe.Pointer(addr))
    v.resurrected = true
  }
  if v != nil {
    return v
  }
  v = &Value{
    s:           s,
    resurrected: true,
  }
  // SetFinalizer before uintptr conversion (theoretical concern;
  // see https://github.com/go4org/intern/issues/13)
  runtime.SetFinalizer(v, finalize)
  valMap[s] = uintptr(unsafe.Pointer(v))
  return v
}

// finalize runs when the GC finds no remaining references to v.
func finalize(v *Value) {
  if v.resurrected {
    // The program referenced v during this GC cycle;
    // check again in the next GC round.
    v.resurrected = false
    runtime.SetFinalizer(v, finalize)
    return
  }
  delete(valMap, v.s)
}

valMap does not hold a reference to the Value; it only records the Value's address as a uintptr, which the GC does not trace. When all external references to a Value are gone, the GC triggers finalize to do the check. If the Value has still not been referenced after two rounds of finalize, its entry is removed from valMap. The Value itself is then reclaimed in the next GC cycle (since no finalizer is attached this time).

If you add locks for concurrency protection, it’s pretty much the same as the implementation of intern.Value. The real intern.Value also handles values that are not strings.
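As a rough idea of what the locking part adds, here is a sketch that guards the simple map-based version with a sync.Mutex. It deliberately leaves out the finalizer and unsafe parts, so it is not the intern implementation, only the concurrency aspect. The helper getConcurrently is a name I made up for the demonstration.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	mu     sync.Mutex                // guards values
	values = map[string]*string{}    // canonical pointer per string content
)

// Get returns the canonical pointer for s and is safe for concurrent use.
func Get(s string) *string {
	mu.Lock()
	defer mu.Unlock()
	if p, ok := values[s]; ok {
		return p
	}
	values[s] = &s
	return &s
}

// getConcurrently calls Get(s) from n goroutines and reports whether
// every call returned the same pointer.
func getConcurrently(s string, n int) bool {
	ptrs := make([]*string, n)
	var wg sync.WaitGroup
	for i := range ptrs {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			ptrs[i] = Get(s) // each goroutine writes only its own slot
		}(i)
	}
	wg.Wait()
	for _, p := range ptrs {
		if p != ptrs[0] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(getConcurrently("eth0", 64)) // true
}
```

Without the mutex, two goroutines could both miss the lookup and store different pointers for the same content, which would break the == guarantee.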

The reason this is so convoluted is that == only performs a shallow, field-by-field comparison (a pointer field is compared by address, not by what it points to) and cannot be customized. Considering the problems that a customizable == would bring, this trade-off in favor of unsafe is probably tolerable.
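A small sketch shows the shallow comparison at work. The types withPointer and withString and the helper intern are names I made up for this illustration. Two pointers to equal strings compare as different unless they are the same canonical pointer.

```go
package main

import "fmt"

type withPointer struct{ z *string }
type withString struct{ z string }

// interned holds one canonical pointer per string content (not concurrency-safe).
var interned = map[string]*string{}

// intern returns the canonical pointer for s.
func intern(s string) *string {
	if p, ok := interned[s]; ok {
		return p
	}
	interned[s] = &s
	return &s
}

// compare reports the result of == for three ways of storing the same zone.
func compare() (rawPtrEq, strEq, internedEq bool) {
	a, b := "eth0", "eth0"
	rawPtrEq = withPointer{z: &a} == withPointer{z: &b}               // addresses differ
	strEq = withString{z: a} == withString{z: b}                      // contents compared
	internedEq = withPointer{z: intern(a)} == withPointer{z: intern(b)} // same pointer
	return
}

func main() {
	fmt.Println(compare()) // false true true
}
```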

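One weakness is that if an external program also records only the raw address of a Value (for example as a uintptr obtained via unsafe.Pointer, the way valMap does), the Value may be collected and recreated in the meantime, so after some time a Value with the same content may live at a different address.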

To be honest, I don’t like the implementation of intern.Value. But maybe a low-level library really cannot spare those 8 bytes.