Zero-copy techniques are used extensively in the Go standard library to improve performance. Since most zero-copy techniques are exposed through system calls, the standard library wraps those system calls, and the wrapper code lives in internal/poll.

Let’s take Linux as an example; after all, most of our applications run on Linux.

sendfile

The sendfile system call is wrapped in the internal/poll/sendfile_linux.go file; part of the code is elided below.

// SendFile wraps the sendfile system call.
func SendFile(dstFD *FD, src int, remain int64) (int64, error) {
    ...... //lock
    dst := dstFD.Sysfd
    var written int64
    var err error
    for remain > 0 {
        n := maxSendfileSize
        if int64(n) > remain {
            n = int(remain)
        }
        n, err1 := syscall.Sendfile(dst, src, nil, n)
        if n > 0 {
            written += int64(n)
            remain -= int64(n)
        } else if n == 0 && err1 == nil {
            break
        }
        ...... // Error Handling 
    }
    return written, err
}

You can see that SendFile calls sendfile in a loop to write the data in chunks. The sendfile system call can transfer at most 0x7ffff000 (2,147,479,552) bytes at a time; here Go sets maxSendfileSize to 4194304 bytes (4 MB).

SendFile is used in the net/sendfile_linux.go file:

func sendFile(c *netFD, r io.Reader) (written int64, err error, handled bool) {
    var remain int64 = 1 << 62 // by default, copy until EOF
    lr, ok := r.(*io.LimitedReader)
    ......
    f, ok := r.(*os.File)
    if !ok {
        return 0, nil, false
    }
    sc, err := f.SyscallConn()
    if err != nil {
        return 0, nil, false
    }
    var werr error
    err = sc.Read(func(fd uintptr) bool {
        written, werr = poll.SendFile(&c.pfd, int(fd), remain)
        return true
    })
    if err == nil {
        err = werr
    }
    if lr != nil {
        lr.N = remain - written
    }
    return written, wrapSyscallError("sendfile", err), written > 0
}

And who calls this function? TCPConn does.

func (c *TCPConn) readFrom(r io.Reader) (int64, error) {
    if n, err, handled := splice(c.fd, r); handled {
        return n, err
    }
    if n, err, handled := sendFile(c.fd, r); handled {
        return n, err
    }
    return genericReadFrom(c, r)
}

This method is in turn wrapped by the ReadFrom method. Remember ReadFrom; we’ll come back to it later.

func (c *TCPConn) ReadFrom(r io.Reader) (int64, error) {
    if !c.ok() {
        return 0, syscall.EINVAL
    }
    n, err := c.readFrom(r)
    if err != nil && err != io.EOF {
        err = &OpError{Op: "readfrom", Net: c.fd.net, Source: c.fd.laddr, Addr: c.fd.raddr, Err: err}
    }
    return n, err
}

The implementation of the TCPConn.readFrom method is interesting. It first checks whether the zero-copy optimization via the splice system call applies: splice can be used only when the destination is a TCP connection and the source is a TCP or Unix connection. Otherwise it tries sendfile, which has the restriction that the source must be an *os.File. Failing both, it falls back to a generic copy.

When is ReadFrom called? In fact, you use it often: io.Copy calls ReadFrom. So you may be using zero-copy without realizing it when you write a file to a socket. Of course, this is not the only way it gets invoked.

Looking at the call chain makes the picture clear: io.Copy -> *TCPConn.ReadFrom -> *TCPConn.readFrom -> net.sendFile -> poll.SendFile.
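To see this path in action, here is a minimal sketch (the helper name serveFileOverTCP and the demo contents are my own, not from the standard library): it copies a temporary *os.File to a *net.TCPConn with io.Copy, which on Linux dispatches down the sendfile chain above.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"os"
)

// serveFileOverTCP writes contents to a temp file, sends that file over
// a local TCP connection with io.Copy, and returns what the client
// received. On Linux the copy goes through TCPConn.ReadFrom ->
// net.sendFile -> poll.SendFile, i.e. the sendfile(2) zero-copy path.
func serveFileOverTCP(contents string) (string, error) {
	f, err := os.CreateTemp("", "sendfile-demo")
	if err != nil {
		return "", err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	if _, err := f.WriteString(contents); err != nil {
		return "", err
	}
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return "", err
	}

	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()

	go func() {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		// dst is a *net.TCPConn and src an *os.File, so io.Copy
		// dispatches to TCPConn.ReadFrom and sendfile can be used.
		io.Copy(conn, f)
	}()

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	data, err := io.ReadAll(conn)
	return string(data), err
}

func main() {
	got, err := serveFileOverTCP("hello, zero copy")
	fmt.Println(got, err)
}
```

Nothing in the calling code mentions sendfile; the optimization is chosen entirely inside the standard library.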

splice

As shown above, *TCPConn.readFrom first attempts to use splice, subject to the conditions mentioned earlier. The net.splice function is in fact a thin wrapper around poll.Splice:

func Splice(dst, src *FD, remain int64) (written int64, handled bool, sc string, err error) {
    p, sc, err := getPipe()
    if err != nil {
        return 0, false, sc, err
    }
    defer putPipe(p)
    var inPipe, n int
    for err == nil && remain > 0 {
        max := maxSpliceSize
        if int64(max) > remain {
            max = int(remain)
        }
        inPipe, err = spliceDrain(p.wfd, src, max)
        handled = handled || (err != syscall.EINVAL)
        if err != nil || inPipe == 0 {
            break
        }
        p.data += inPipe
        n, err = splicePump(dst, p.rfd, inPipe)
        if n > 0 {
            written += int64(n)
            remain -= int64(n)
            p.data -= n
        }
    }
    if err != nil {
        return written, handled, "splice", err
    }
    return written, true, "", nil
}

So, without realizing it, you may already be using splice or sendfile.
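For example, a plain TCP relay written with io.Copy hits the splice path, because both ends are *net.TCPConn values. The sketch below is hypothetical (the helper proxyOnce and the echo backend are mine): it relays one message through a proxy and returns the echoed bytes.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// proxyOnce stands up an echo backend and a relay, sends msg through
// the relay, and returns the echoed bytes. The relay's io.Copy between
// two *net.TCPConn values is exactly the case where TCPConn.ReadFrom
// can use splice on Linux.
func proxyOnce(msg string) (string, error) {
	backend, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer backend.Close()
	go func() {
		c, err := backend.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		io.Copy(c, c) // echo until the peer half-closes
	}()

	relay, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer relay.Close()
	go func() {
		client, err := relay.Accept()
		if err != nil {
			return
		}
		defer client.Close()
		up, err := net.Dial("tcp", backend.Addr().String())
		if err != nil {
			return
		}
		defer up.Close()
		go func() {
			// TCPConn -> TCPConn: a splice candidate on Linux.
			io.Copy(up, client)
			up.(*net.TCPConn).CloseWrite()
		}()
		io.Copy(client, up) // also TCPConn -> TCPConn
	}()

	conn, err := net.Dial("tcp", relay.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte(msg)); err != nil {
		return "", err
	}
	conn.(*net.TCPConn).CloseWrite()
	data, err := io.ReadAll(conn)
	return string(data), err
}

func main() {
	got, err := proxyOnce("ping")
	fmt.Println(got, err)
}
```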

CopyFileRange

copy_file_range_linux.go wraps the copy_file_range system call. Since this system call is relatively new, the wrapper must first check the kernel version to see whether it is supported. We’ll skip the version check and the chunked-copy loop and look at how the system call itself is invoked.

func copyFileRange(dst, src *FD, max int) (written int64, err error) {
    if err := dst.writeLock(); err != nil {
        return 0, err
    }
    defer dst.writeUnlock()
    if err := src.readLock(); err != nil {
        return 0, err
    }
    defer src.readUnlock()
    var n int
    for {
        n, err = unix.CopyFileRange(src.Sysfd, nil, dst.Sysfd, nil, max, 0)
        if err != syscall.EINTR {
            break
        }
    }
    return int64(n), err
}

Where is it used? In os.File, when filling a file from a reader:

var pollCopyFileRange = poll.CopyFileRange
func (f *File) readFrom(r io.Reader) (written int64, handled bool, err error) {
    // copy_file_range(2) does not support destinations opened with
    // O_APPEND, so don't even try.
    if f.appendMode {
        return 0, false, nil
    }
    remain := int64(1 << 62)
    lr, ok := r.(*io.LimitedReader)
    if ok {
        remain, r = lr.N, lr.R
        if remain <= 0 {
            return 0, true, nil
        }
    }
    src, ok := r.(*File)
    if !ok {
        return 0, false, nil
    }
    if src.checkValid("ReadFrom") != nil {
        // Avoid returning the error as we report handled as false,
        // leave further error handling as the responsibility of the caller.
        return 0, false, nil
    }
    written, handled, err = pollCopyFileRange(&f.pfd, &src.pfd, remain)
    if lr != nil {
        lr.N -= written
    }
    return written, handled, NewSyscallError("copy_file_range", err)
}

*File.readFrom is in turn called by *File.ReadFrom:

func (f *File) ReadFrom(r io.Reader) (n int64, err error) {
    if err := f.checkValid("write"); err != nil {
        return 0, err
    }
    n, handled, e := f.readFrom(r)
    if !handled {
        return genericReadFrom(f, r) // without wrapping
    }
    return n, f.wrapErr("write", e)
}

So this optimization kicks in for file-to-file copies; the general call chain is io.Copy -> *File.ReadFrom -> *File.readFrom -> poll.CopyFileRange -> poll.copyFileRange.

Zero-copy in the standard library

The Go standard library wraps these zero-copy techniques at a low level, so much of the time you are unaware of them. Say you implement a simple file server:

package main

import "net/http"

func main() {
    // Register the handler under /static/ so that StripPrefix's prefix
    // actually matches request paths, and point http.Dir at a
    // directory (it must be a directory, not a single file).
    http.Handle("/static/", http.StripPrefix("/static/", http.FileServer(http.Dir("./static"))))
    http.ListenAndServe(":8972", nil)
}

The call chain is http.FileServer -> *fileHandler.ServeHTTP -> http.serveFile -> http.serveContent -> io.CopyN -> io.Copy -> sendFile.

You can see that sendFile ends up being called whenever a file is served.

Third-party libraries

There are several libraries that provide a wrapper for sendFile/splice.

Because invoking the system calls directly is easy, we can often imitate the standard library and implement our own zero-copy helpers.

So personally I feel these wrappers of the traditional methods add little on top of the standard library; what is really worth doing is wrapping, or building custom support for, the newer zero-copy system interfaces.