A small investigation into the connection reuse problem caused by an unread http.Response.Body

In the source code of Go's HTTP library, http.Response.Body is documented as follows.

// Body represents the response body.
//
// The response body is streamed on demand as the Body field
// is read. If the network connection fails or the server
// terminates the response, Body.Read calls return an error.
//
// The http Client and Transport guarantee that Body is always
// non-nil, even on responses without a body or responses with
// a zero-length body. It is the caller's responsibility to
// close Body. The default HTTP client's Transport may not
// reuse HTTP/1.x "keep-alive" TCP connections if the Body is
// not read to completion and closed.
//
// The Body is automatically dechunked if the server replied
// with a "chunked" Transfer-Encoding.
//
// As of Go 1.12, the Body will also implement io.Writer
// on a successful "101 Switching Protocols" response,
// as used by WebSockets and HTTP/2's "h2c" mode.
Body io.ReadCloser

Notice the sentence: The default HTTP client's Transport may not reuse HTTP/1.x "keep-alive" TCP connections if the Body is not read to completion and closed. In other words, the connection is only eligible for reuse once the Body has been read to the end and closed. (In the days of HTTP/1.0, keep-alive was not the default behavior; if both the browser and the server supported it, it could be enabled by adding "Connection: Keep-Alive" to the request and response headers. In HTTP/1.1, keep-alive is the default behavior unless it is explicitly disabled, e.g. with "Connection: close".)

Why do we want connection reuse? For performance, of course: a reused connection skips the TCP handshake (and the TLS handshake, for HTTPS) that a new connection would need.

The way we write code every day

Yet in the code we normally write, few people seem to pay attention to this. Take the case where we only send an HTTP request and do not need the response data:

// Note: this is a plain HTTP (port 80) connection, which is widely used on internal networks
resp, err := http.Get(`http://www.example.com`)
if err != nil {
	panic(err)
}
defer resp.Body.Close()

If you forget Body.Close(), chances are your linter will complain. But nothing reminds you that the Body should also be read to the end. According to the documentation, if the Body is not read to completion, the connection may not be reused. Why? I think the reason is simple: the HTTP client has no way of knowing whether you still intend to use the Body. It does not dare to read the rest for you (to finish the request/response exchange) just so the connection can be reused, because if the network suddenly stalls, reading the Body could take a long time. Closing the connection is quick, whereas draining it for the sake of reuse could leave a pile of connections that are neither usable nor closed. That is not acceptable in the design of a standard library.

Request/Response: once the server has read the request headers, or the client has read the response headers, the request or response is handed to the program, and the body is wrapped in a ReadCloser to be read (and closed) as a stream by the program later. This is a nice design, because a request or response with a very large body does not take up a lot of memory.

Writing styles that should be considered

In summary, we can take the documented connection reuse behavior as a given. So, wherever the Body is not needed, shouldn't we always run io.Copy(ioutil.Discard, resp.Body) to read it through? (Since Go 1.16, ioutil.Discard is spelled io.Discard.)

Just as we always have to remember defer resp.Body.Close().

I'm not sure of the answer. If you can be sure that the endpoint is fast and responds with little or no data, then draining the Body costs essentially nothing. But if the network stalls, reading the Body can take a long time. So decide case by case. Writing your code with this in mind (or leaving a comment about it) is also a sign of greater rigor.

An example for verifying connection reuse

I wrote a small test program to verify this behavior.

package main

import (
	"flag"
	"io"
	"io/ioutil"
	"net/http"
)

func issue(discard bool) {
	resp, err := http.Get(`http://www.example.com`)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	if discard {
		io.Copy(ioutil.Discard, resp.Body)
	}
}

func main() {
	var discard bool
	flag.BoolVar(&discard, `d`, false, `discard body`)
	flag.Parse()

	issue(discard)
	issue(true) // whatever
}

When the Body is not read, the result looks like this.

$ sudo strace -qqfe connect ./discard -d=false 2>&1 | grep '(80)'
[pid 23647] connect(6, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("93.184.216.34")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 23649] connect(6, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("93.184.216.34")}, 16) = -1 EINPROGRESS (Operation now in progress)

Repeated runs show the same result: two connect calls are made, so two connections are established. Does this mean that the connection is not being reused? I think so. The file descriptor is 6 both times, but that is because the first connection had already been closed, and Unix always assigns the lowest-numbered unused file descriptor.

And when the Body is read to completion:

$ sudo strace -qqfe connect ./discard -d=true 2>&1 | grep '(80)'
[pid 23743] connect(6, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("93.184.216.34")}, 16) = -1 EINPROGRESS (Operation now in progress)

We see only one connect call. Does this mean that the connection is actually being reused? I think so.
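The same check can probably be made without strace, from inside the program, using the GotConn hook of net/http/httptrace, whose GotConnInfo.Reused field says whether a request ran on a previously used connection. The following is a minimal sketch under a few assumptions: it talks to a local httptest server instead of www.example.com, the function name secondRequestReused is made up for illustration, and io.Discard needs Go 1.16 or later.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
	"strings"
)

// secondRequestReused sends two GET requests to a local test server and
// reports whether the second request ran on a reused connection.
// If drain is true, each response body is read to EOF before it is closed.
func secondRequestReused(drain bool) (bool, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, strings.Repeat("x", 1024)) // non-empty body
	}))
	defer srv.Close()

	client := srv.Client() // client with its own Transport and connection pool

	var reused bool
	trace := &httptrace.ClientTrace{
		// GotConn fires once per request; the last call wins.
		GotConn: func(info httptrace.GotConnInfo) { reused = info.Reused },
	}

	for i := 0; i < 2; i++ {
		req, err := http.NewRequest(http.MethodGet, srv.URL, nil)
		if err != nil {
			return false, err
		}
		req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

		resp, err := client.Do(req)
		if err != nil {
			return false, err
		}
		if drain {
			if _, err := io.Copy(io.Discard, resp.Body); err != nil {
				resp.Body.Close()
				return false, err
			}
		}
		resp.Body.Close()
	}
	return reused, nil
}

func main() {
	for _, drain := range []bool{false, true} {
		reused, err := secondRequestReused(drain)
		if err != nil {
			panic(err)
		}
		fmt.Printf("drain=%v, second request reused the connection: %v\n", drain, reused)
	}
}
```

If the behavior observed with strace holds here as well, the drain=false run should report no reuse and the drain=true run should report reuse, though the exact result may depend on the Go version.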
