HTTP Protocol Basics

HTTP is an application-layer protocol that, prior to HTTP/3, was generally implemented on top of TCP. Because TCP is a reliable streaming protocol, once a connection is established both peers can send data of any length, and the TCP stack may also split the data into segments. TCP-based application-layer protocols therefore need to agree on a message format so that both sides can extract complete messages from the received byte stream. HTTP is one such convention. In short, TCP is the transport-layer protocol that provides streaming communication, and HTTP defines the message format.

File downloading is the simplest use of the HTTP protocol, and it is what HTTP was originally designed for. A typical download request looks like this.

GET /index.html HTTP/1.1\r\n
Host: taoshu.in\r\n
User-Agent: httpie/1.0\r\n
\r\n

All content is plain ASCII, with \r\n marking the end of each line. The first line is called the request line and has three parts: the request method, the path, and the protocol version. Each line after the request line is a header, consisting of a name and a value separated by a colon. The final bare \r\n marks the end of the header section.

The server receives the GET request, resolves the file path from it, and then sends the corresponding content to the client. Before transmitting the data, however, the server needs to send the response status.

HTTP/1.1 200 OK\r\n
Content-Type: text/plain\r\n
Content-Length: 5\r\n
\r\n
hello

The first line is the status line, which includes the version number, status code and status message. The HTTP protocol specifies a series of status codes: 2XX for success, 4XX for client-side errors and 5XX for server-side errors. The status line is followed by headers, in the same format as in the request message. After the empty line comes the actual content of the file being transferred.

HTTP/1.1 reuses the underlying TCP connection by default, so the client needs to know the length of the response body; otherwise it would keep waiting for the server to send more data. There are two ways to determine the length. The first and simpler one is to specify it directly with the Content-Length header. However, sometimes the server does not know the total length of the data when it starts the transfer, so HTTP/1.1 also supports chunked transfer encoding. In short, the data is transferred in segments, and the length of each segment is sent before its data. The last segment has a length of zero, indicating the end of the transfer.

HTTP/1.1 200 OK\r\n
Content-Type: text/plain\r\n
Transfer-Encoding: chunked\r\n
\r\n
2\r\n
he\r\n
3\r\n
llo\r\n
0\r\n
\r\n

Chunked transfer is indicated by the Transfer-Encoding: chunked header. Each chunk consists of its length in hexadecimal on one line, followed by the data, each terminated with \r\n. In the example above, hello is split into he and llo for transmission. The final 0\r\n\r\n indicates a chunk of length zero, i.e., the end of the data.

To save bandwidth, the HTTP protocol supports compressing the data, but only if the server supports a compression algorithm that the client also supports. When requesting a file, the client announces its supported algorithms in the Accept-Encoding header, usually gzip and deflate, with multiple algorithms separated by commas. The server reads this header and picks one of the supported algorithms to compress the data, indicating its choice in the Content-Encoding header.

HTTP/1.1 200 OK\r\n
Content-Type: text/plain\r\n
Content-Length: 5\r\n
Content-Encoding: gzip\r\n
\r\n
[gzip binary data]

In this case, the Content-Length header specifies the total length of the compressed data.

Content-Type indicates the data type as a MIME type, which needs to be determined from the file content. Common types are text/plain, text/html, image/png, etc.

That covers the HTTP knowledge used in this article. Next I will introduce the basics of network programming in Go.

Network Programming Basics

Network programming generally means TCP/IP network programming. The IP protocol only provides network addresses, which is enough to deliver packets from one machine to another. But which program on the receiving machine should process the data once it arrives? That is where the TCP protocol comes in: it defines the concept of ports on top of IP. Before sending data, both communicating parties must determine not only the IP address but also the port number. After receiving IP data, the operating system finds the corresponding process based on the port and hands the data over to it for processing.

Since the communication is duplex, both parties are each other's sender and receiver, so both have to bind a port number. A TCP session is identified by four parts: source address, source port, destination address, and destination port, also called the 4-tuple. The server program generally needs to specify its own port, also called the listening port; otherwise the client would not know where to connect. The client's port number is usually assigned automatically by the operating system.

The first API for network programming is listening on a port.

import "net"

ln, err := net.Listen("tcp", "0.0.0.0:8080")

Listening is provided by the Listen function of the net package. The first argument indicates the protocol type; in this article only "tcp" is used. The second argument indicates the address and port to bind to, where 0.0.0.0 means all IP addresses of the current device. A device may have multiple NICs, multiple addresses on a single NIC, and special addresses like 127.0.0.1. For external services it is easiest to bind to all addresses, i.e. 0.0.0.0, so that data sent to any address can be handled. The number 8080 after the colon is the port to bind. Note that a program cannot listen on ports below 1024 without administrator privileges.

Clients connect to the server with the net.Dial function. Because this article only covers the server side and uses the ready-made curl as the client, I will not expand on it.

net.Listen returns a net.Listener interface object. The most important method of this interface is Accept(). The server calls it and blocks until a client completes a TCP handshake with the server, at which point the call is woken up.

c, err := ln.Accept()

The Accept function returns the net.Conn interface. This interface is slightly more complicated.

type Conn interface {
  Read(b []byte) (n int, err error)
  Write(b []byte) (n int, err error)
  Close() error
  LocalAddr() Addr
  RemoteAddr() Addr
  SetDeadline(t time.Time) error
  SetReadDeadline(t time.Time) error
  SetWriteDeadline(t time.Time) error
}

Programs send and receive data through the Read and Write methods. Close closes the connection. LocalAddr and RemoteAddr return the 4-tuple information of the TCP connection. The last three methods set timeouts.

Timeout control is a very important topic in network programming. If a reasonable timeout is not set, a malicious client can create a large number of TCP connections in bulk and never send or receive data, eventually exhausting the server's resources. This is a typical denial-of-service (DoS) attack.

Timeout control in Go is rather special. The program computes an absolute deadline, the current time plus the timeout interval, and passes it to the operating system. If the read or write has not completed by the deadline, the Read or Write call returns a timeout error.

The simplest network program is called Echo, which sends the received data back to the client unchanged.

package main

import "net"

func main() {
  ln, _ := net.Listen("tcp", "0.0.0.0:8080")
  for {
    c, _ := ln.Accept()
    buf := make([]byte, 1024)
    for {
      n, _ := c.Read(buf)
      if n == 0 {
        break
      }
      c.Write(buf[:n])
    }
    c.Close()
  }
}

This program omits all error handling logic, but the actual code should rigorously check for errors and handle them accordingly!

After running, the program listens on port 8080. You can execute telnet 127.0.0.1 8080, type any content and press Enter, and telnet will receive the same content back from the service.

If you run two telnet processes at the same time, you will find that the second one is blocked. This happens because the Echo code above handles everything in a single Goroutine. As long as the first telnet does not disconnect, the inner for loop does not exit. (Speaking of exiting: Read also returns when the telnet client actively disconnects, but with a first return value of zero, indicating that the peer closed the connection.) As long as the inner loop keeps running, the Accept call in the outer loop never gets a chance to execute, so the second telnet cannot establish its session properly.

The solution is very simple: use a Goroutine. Every time Accept returns, create a new Goroutine to run the inner loop, while the current Goroutine goes back to calling Accept to wait for the next incoming connection. I won't post the code here; try it out yourself as a small exercise. You can also try setting a timeout on incoming connections and see whether the connection is closed when it expires.

By now the network-related basics have been introduced. Let’s start designing the HTTP file server program.

Overall Design

We want the application to support the following functions.

  1. downloading files via GET requests
  2. compress data via Gzip
  3. transfer large files using chunked transfer codes
  4. record access logs

For the concurrency model we use the simplest multi-Goroutine pattern. The Goroutine running the main function (the main Goroutine) is responsible for listening on the port and repeatedly calling Accept to accept incoming client connections. A separate worker Goroutine is then started for each connection to handle its HTTP requests. This approach, while classic, is a bit too simple on its own. To further demonstrate concurrent programming, I use a dedicated Goroutine to collect and write out logs.

The execution of each working Goroutine is as follows.

  • Read and resolve the HTTP request
  • Find the corresponding file information based on the request path
  • Compress the data according to the client’s capabilities
  • Send the HTTP response and file data to the client

The worker Goroutines are completely independent of each other.

Below we start to detail the key design of each part.

Component design

Protocol Parsing

The most complex part of server-side software is protocol parsing. HTTP is a highly extensible protocol that is very flexible to use, but at the cost of being cumbersome to parse.

We said earlier that TCP is logically a streaming protocol, but in the implementation the data is transmitted in segments. Simply put, this segmentation can cause the data as received to be split differently from the data as sent. For example, if the client sends the five bytes abcde at once, the receiver may first receive abc and then de. This example is somewhat exaggerated; in practice the problem only shows up when larger amounts of data are sent at once.

There is also the opposite case: the client sends two pieces of data and the server receives both in a single read, or receives all of the first piece plus part of the second. Suppose the client sends the two headers User-Agent: curl/1.0\r\n and Accept-Encoding: gzip\r\n in sequence. Because of possible segmentation in the underlying TCP transfer, the server may receive User-Agent: curl/1.0\r\nAccept-Encoding: , where the trailing Accept-Encoding part is incomplete.

In either case, the server side must be compatible. The handling is also very simple and classic, using buffers.

n := 0
buf := make([]byte, 1024)
for {
  m, _ := c.Read(buf[n:])
  n += m
  r, ok := parse(buf[:n])
  if ok { /*..*/ }
  copy(buf, buf[r:n])
  n -= r
}

Each time parse finishes, it returns the offset of the unprocessed data in buf. The program moves the remaining data to the beginning of the buffer and continues receiving subsequent data from the client after it. In other words, we can never simply assume that the server receives a complete message in a single read.

Parsing HTTP requests

HTTP requests are parsed line by line, and there are various ways to do this. The simplest is to allocate a relatively large buffer and try to receive all of the request data at once; if parsing fails, the client is considered faulty. This method is simple and crude, and rarely used in practice.

The classic approach is to use a suitably sized buffer, say 1k bytes. This requires that no single line exceeds 1k, otherwise it cannot be parsed. Then iterate over the received bytes one at a time, using a state machine to track the start position of the content currently being parsed.

This sounds a bit abstract, so let's look at an example.

GET /index.html HTTP/1.1\r\n

First we have to extract the request method GET. We set the start position p to 0 and scan byte by byte until the first space. If the current position is i, then buf[p:i] is the request method GET.

Immediately after, we want to skip the spaces. A separate state can be used for this. In that state, when the first non-space character is scanned, we record the current position in p and switch to the path-parsing state. When the next space is scanned, buf[p:i] is the requested path. And so on: keep switching states and scanning until parsing is complete.

The general shape of the code is an outer loop containing a large switch statement. Because the HTTP protocol is complex, the state machine needs many states. I implemented a simplified version that skips many of the validity checks but can extract all the information needed for this article. Even this simple version uses 16 states, which is already fairly complicated for beginners.

func (req *Request) Feed(buf []byte) (ParseStatus, int) {
  var p, i int
  status := ParseBegin
  if req.status != ParseBegin {
    status = req.status
  }
  var headerName, headerValue string
  for i = 0; i < len(buf); i++ {
    switch status {
    case ParseBegin:
    //...
    case ParseMethod:
    //...
    default:
      status = ParseError
      break
    }
  }
  req.status = status
  return status, i
}

Although the HTTP protocol requires \r\n line endings, almost all implementations also accept bare \n. Supporting this makes the state machine more complex still, so be careful when reading the code.

Transfer encoding

Transfer encoding mainly addresses memory usage. If we did nothing special, we could simply read the whole file into memory and then send it to the client, but a very large file would consume a lot of memory. With chunked transfer encoding we can send the file in segments using a fixed-length buffer. The core logic is as follows.

buf := make([]byte, 1024)
for {
  n, err := f.Read(buf)
  if n == 0 {
    break
  }
  chunk := buf[:n]
  hexSize := strconv.FormatInt(int64(n), 16)
  w.Write([]byte("\r\n" + hexSize + "\r\n"))
  w.Write(chunk)
  if err != nil {
    break
  }
}
w.Write([]byte("\r\n0\r\n\r\n"))

Here's a little trick. Each chunk of data must be terminated with \r\n. Writing that terminator on its own with w.Write([]byte("\r\n")) could trigger sending an extra piece of data, while concatenating it onto the preceding chunk would require a memory copy. So I send the \r\n that ends the current chunk together with the length line of the next chunk, hence the w.Write([]byte("\r\n" + hexSize + "\r\n")) call.

Content Compression

There are two issues to consider with content compression. First, many file formats, such as JPEG, are already compressed; compressing them again is wasted effort. Generally speaking, plain text compresses well. Second, the size of the file matters: compression adds overhead of its own, and if the file is short, compressing it may well make it larger.

There is also the question of how to compress. The implementation here (lossless compression such as gzip builds on LZ77 plus Huffman coding) reads the entire file and compresses it in memory in one go. Since the compressed size is then known, there is no need for chunked transfer in this case.

The Go language standard library supports gzip compression, and it is very simple to use.

data, err := io.ReadAll(f)
ctype := http.DetectContentType(data)
gzipped := false
if strings.HasPrefix(ctype, "text/") && len(data) > zipSize {
  var buf bytes.Buffer
  zw, _ := gzip.NewWriterLevel(&buf, gzip.DefaultCompression)
  zw.Write(data)
  zw.Close()
  data = buf.Bytes()
  gzipped = true
}

You need to call Close after writing data with zw, otherwise gzip will not flush the final compressed data into the buf buffer.

Finally, a word about the logging module.

Logging module

As I said earlier, to demonstrate concurrent programming I put the logging function into a separate Goroutine. In fact, this is routine practice in real projects. Because the system handles many requests at the same time, writing the log out immediately at the end of every request would hurt performance. So we usually buffer log entries and write them out once the buffer fills up.

However, when there are few requests, the buffer may not fill up for a long time, so the buffering needs to be combined with a periodic flush driven by a timer. This behavior makes the logging component particularly well suited for demonstrating Goroutine usage.

The core definition of a log is as follows.

11
type Log struct {
  EntryNum int
  Writer   io.Writer
  Interval time.Duration

  i  int
  ch chan entry
  t  *time.Ticker

  entries []entry
}

The fields starting with upper-case letters are exported so the caller can configure them: EntryNum is the length of the buffer queue, Writer is the real object the log is written to, and Interval is the period of the timed flush.

The core logic of log processing is as follows.

func (l *Log) Loop() {
  for {
    select {
      case e := <-l.ch:
        l.entries[l.i] = e
        l.i++
        if l.i == l.EntryNum {
          l.flush()
        }
      case <-l.t.C:
        l.flush()
    }
  }
}

We run the Loop function in a separate Goroutine. It listens on both the l.ch and l.t.C channels in a loop via select, and whichever channel has a message gets processed promptly. Incoming log entries are saved to the l.entries queue; when the number of entries reaches l.EntryNum they are flushed. If there is not enough log data but the timer fires, the buffer is also flushed. This achieves the behavior described above.

That concludes the brief introduction to the design of each component. For the specific details you will have to read the source code carefully.

Code Structure

The complete code is hosted on GitHub with the following directory structure.

├── LICENSE
├── README.md
├── cmd
│   └── sfile
│       └── main.go  # main
├── go.mod
├── go.sum
├── http             # HTTP protocol related
│   ├── file.go      # chunked and gzip related
│   ├── file_test.go
│   ├── http.go      # HTTP request parsing
│   └── http_test.go
├── log
│   ├── log.go       # Logging Module
│   └── log_test.go
└── server
    └── server.go    # Server

Unit tests have been added for the core logic, saved in the corresponding *_test.go files. They are also worth reading carefully.

Summary

The above is the whole content of this article. I hope you can deepen your understanding of the Go language through practice. If you are already familiar with all the code of sfile, you can consider learning Go’s standard library net/http and then try to re-implement sfile with the standard library.

Ref

  • https://taoshu.in/go/go-sfile.html