The TCP protocol defines 11 states, and a TCP connection moves between them based on the messages it sends or receives. The state machine shown below illustrates all possible transitions, covering both normal and abnormal conditions.

Either party to a TCP connection can end up in the TIME_WAIT state when it closes the connection. Closing the connection tells the other party that the closing side has no more data to send, although it can still receive data from the other party. A typical close proceeds as follows.

  1. When the client has no more data to send, it sends a FIN message to the server and enters the FIN_WAIT_1 state.
  2. When the server receives the FIN message, it enters the CLOSE_WAIT state and sends an ACK message to the client; the client enters the FIN_WAIT_2 state when it receives that ACK.
  3. When the server has no more data to send, it sends a FIN message to the client.
  4. When the client receives the FIN message, it enters the TIME_WAIT state and sends an ACK message to the server; the server enters the CLOSED state when it receives that ACK.
  5. The client enters the CLOSED state after waiting for twice the maximum segment lifetime (MSL).
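The close sequence above can be sketched as a small transition table. This is a minimal illustration rather than a model of a real TCP stack: the event names and the `run` helper are hypothetical, and LAST_ACK is the standard name for the state the server is in between sending its FIN and receiving the final ACK.

```python
# Hypothetical transition table for the normal close sequence.
# Keys are (current state, event); values are the next state.
TRANSITIONS = {
    # Active closer (the client in the steps above)
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN, send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",
    # Passive closer (the server in the steps above)
    ("ESTABLISHED", "recv FIN, send ACK"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def run(events, state="ESTABLISHED"):
    """Return the list of states visited while applying events in order."""
    visited = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        visited.append(state)
    return visited


ACTIVE_CLOSE = ["send FIN", "recv ACK", "recv FIN, send ACK", "2MSL timeout"]
PASSIVE_CLOSE = ["recv FIN, send ACK", "send FIN", "recv ACK"]

if __name__ == "__main__":
    print("client:", " -> ".join(run(ACTIVE_CLOSE)))
    print("server:", " -> ".join(run(PASSIVE_CLOSE)))
```

Only the active path passes through TIME_WAIT; the passive path reaches CLOSED as soon as the final ACK arrives.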

From the above, we can see that TIME_WAIT appears only on the side that actively closes the connection, while the passive side reaches the CLOSED state without passing through TIME_WAIT. The client that enters TIME_WAIT must wait for 2 MSL before it can actually close the connection. TCP requires the TIME_WAIT state, and the 2 MSL wait before entering CLOSED, for the same two reasons:

  • To prevent delayed data segments from being received by a later TCP connection that uses the same source address, source port, destination address, and destination port.
  • To guarantee that the remote end closes the TCP connection properly, i.e., to wait until the ACK for the FIN has been received by the party passively closing the connection.

Both of the above reasons are relatively simple, so let’s expand on some of the possible problems behind them.

Blocking Delayed Data Segments

Each TCP data segment carries a sequence number, and these sequence numbers underpin the reliability and ordering guarantees of the TCP protocol. Leaving aside sequence number wraparound, their uniqueness is an important convention in the protocol, and violating it leads to confusing behavior. To ensure that a data segment of a new TCP connection does not duplicate a segment of an earlier connection that is still in transit on the network, TCP must stay quiet for at least the maximum time a segment can survive on the network, i.e., the MSL, before assigning new sequence numbers. As RFC 793 states:

To be sure that a TCP does not create a segment that carries a sequence number which may be duplicated by an old segment remaining in the network, the TCP must keep quiet for a maximum segment lifetime (MSL) before assigning any sequence numbers upon starting up or recovering from a crash in which memory of sequence numbers in use was lost.

In the TCP connection shown above, the server's SEQ = 301 message is delayed in the network and does not arrive until after the connection has been closed. If a new TCP connection reuses the same port number, the expired SEQ = 301 message may then be delivered to the client and accepted as valid data, which is a serious problem. We should therefore be very careful when adjusting the TIME_WAIT policy and must be clear about what we are doing.
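To make the risk concrete, here is a minimal sketch of the check a receiver applies to an incoming segment: it must match the connection's four-tuple and fall inside the receive window. The `Conn` and `Segment` types, the addresses, and every number other than SEQ = 301 are hypothetical, and real stacks perform additional checks.

```python
from collections import namedtuple

# A connection as seen by the client: its own address, the peer's address,
# the next sequence number it expects, and the size of its receive window.
Conn = namedtuple("Conn", "local_ip local_port remote_ip remote_port rcv_nxt rcv_wnd")
Segment = namedtuple("Segment", "src_ip src_port dst_ip dst_port seq")


def would_accept(conn, seg):
    """True if seg matches conn's four-tuple and falls in its receive window."""
    same_tuple = (seg.src_ip, seg.src_port, seg.dst_ip, seg.dst_port) == (
        conn.remote_ip, conn.remote_port, conn.local_ip, conn.local_port)
    # Sequence numbers are 32-bit and wrap around, so compare modulo 2**32.
    in_window = (seg.seq - conn.rcv_nxt) % 2**32 < conn.rcv_wnd
    return same_tuple and in_window


# A segment from the old connection that is still wandering the network.
delayed = Segment("10.0.0.2", 443, "10.0.0.1", 51284, seq=301)

# A new connection that reuses the same four-tuple right away.
reused_near = Conn("10.0.0.1", 51284, "10.0.0.2", 443, rcv_nxt=250, rcv_wnd=1000)
reused_far = Conn("10.0.0.1", 51284, "10.0.0.2", 443, rcv_nxt=5000, rcv_wnd=1000)
other_port = Conn("10.0.0.1", 51285, "10.0.0.2", 443, rcv_nxt=250, rcv_wnd=1000)
```

Here `would_accept(reused_near, delayed)` is true: nothing in the segment distinguishes the old connection from the new one. Waiting in TIME_WAIT keeps the four-tuple out of use until such segments have expired.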

RFC 793 states that TCP connections need to wait 2 times the MSL in TIME_WAIT, but it does not explain where the double comes from.

RFC 793 sets the MSL to 120 seconds, or two minutes. However, this is an engineering choice rather than a rigorously derived value, and it is fine to change the OS setting based on operational experience with a service. In fact, Linux has since its early versions set the TIME_WAIT wait time, TCP_TIMEWAIT_LEN, to 60 seconds in order to reuse TCP connection resources more quickly:

#define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT
				  * state, about 60 seconds	*/

On Linux, with the setting shown below, clients can establish connections to remote servers using port numbers 32,768 to 61,000, for a total of 28,233 port numbers (the range is inclusive), so applications can choose from nearly 30,000 ports.

$ sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 32768 61000

However, if the host has used up all of these ports (about 28,000) for TCP connections to a specific port on the target host within the last minute, creating another TCP connection to that destination fails with an error. In other words, unless we adjust the host's configuration, the maximum number of TCP connections that can be created to a single destination is roughly 470 per second.
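The arithmetic behind the roughly 470 connections per second figure can be checked with a few lines. This is a back-of-the-envelope sketch: the function name is hypothetical, it counts both endpoints of the port range, and it assumes every connection goes to the same destination address and port and spends the full 60 seconds in TIME_WAIT.

```python
def max_new_connections_per_second(port_low, port_high, time_wait_seconds):
    """Upper bound on new connections per second to one destination."""
    ports = port_high - port_low + 1  # the range is inclusive
    # Each port stays unavailable for time_wait_seconds after it is used.
    return ports / time_wait_seconds


ports_available = 61000 - 32768 + 1
rate = max_new_connections_per_second(32768, 61000, 60)

if __name__ == "__main__":
    print(ports_available, int(rate))
```

Widening the port range or shortening the wait raises this bound proportionally.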

Guaranteeing Connection Closure

From the definition of the TIME_WAIT state in RFC 793, we can find another important role for this state: waiting long enough to be sure that the remote TCP has received the ACK for the FIN it sent to terminate the connection.

TIME-WAIT - represents waiting for enough time to pass to be sure the remote TCP received the acknowledgment of its connection termination request.

If the client does not wait long enough and re-establishes a TCP connection with the server before the server has received the ACK message, the following problem occurs. The server still considers the old connection valid because it has not received the ACK, so when the client sends a SYN message to request a new handshake, the server replies with an RST message and the connection establishment is aborted.

If the client waits long enough, one of two things happens:

  1. The server receives the ACK message normally and closes the current TCP connection.
  2. The server does not receive the ACK message, so it resends the FIN and waits for a new ACK message.

As long as the client waits for 2 MSL, the connection between the client and the server is closed normally, and the probability that a newly created TCP connection will be affected is negligible, ensuring the reliability of data transmission.
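The second outcome can be sketched as a timer that restarts whenever a retransmitted FIN arrives, which is the behavior RFC 793 specifies for a FIN received in TIME-WAIT. The class below is a hypothetical illustration, not how any kernel implements it, and the MSL value of 30 seconds is an assumption chosen so that 2 MSL equals 60 seconds.

```python
MSL = 30  # seconds; assumed value for illustration only


class TimeWaitSocket:
    """Toy model of the client side after it has ACKed the server's FIN."""

    def __init__(self, now):
        self.state = "TIME_WAIT"
        self.expires_at = now + 2 * MSL
        self.acks_sent = 1  # the ACK sent on entering TIME_WAIT

    def on_fin(self, now):
        """A retransmitted FIN means our ACK was lost: ACK again, restart timer."""
        if self.state == "TIME_WAIT":
            self.acks_sent += 1
            self.expires_at = now + 2 * MSL

    def tick(self, now):
        """Move to CLOSED once the 2 MSL timer has run out."""
        if self.state == "TIME_WAIT" and now >= self.expires_at:
            self.state = "CLOSED"
```

If the client had gone straight to CLOSED, nothing would be left to answer the retransmitted FIN with an ACK, and the server would stay stuck waiting for one.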

Summary

There are some scenarios where a 60-second wait before destruction is really unacceptable, e.g., highly concurrent stress tests. When we test the throughput and latency of a remote service with concurrent requests, a large number of TCP connections in the TIME_WAIT state can build up locally. On macOS, active connections can be viewed with the command shown below.

$ netstat -tan
Active Internet connections (including servers)
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  192.168.50.109.51284   47.95.49.174.443       TIME_WAIT
tcp4       0      0  192.168.50.109.51275   47.95.49.174.443       TIME_WAIT
...
tcp4       0      0  192.168.50.109.51273   203.107.32.116.443     TIME_WAIT
tcp4       0      0  192.168.50.109.51293   203.107.32.116.443     TIME_WAIT
tcp4       0      0  192.168.50.109.51297   203.107.32.116.443     TIME_WAIT
...
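When there are many such rows, it helps to count them per remote address. The following sketch parses text in the `netstat -tan` layout shown above; the function name is hypothetical, and the six-column layout is assumed to match this macOS output (other platforms may format rows differently).

```python
from collections import Counter


def count_time_wait(netstat_output):
    """Count TIME_WAIT connections per foreign address in `netstat -tan` text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Connection rows: proto, recv-q, send-q, local, foreign, state
        if (len(fields) == 6 and fields[0].startswith("tcp")
                and fields[5] == "TIME_WAIT"):
            counts[fields[4]] += 1
    return counts


SAMPLE = """\
Active Internet connections (including servers)
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  192.168.50.109.51284   47.95.49.174.443       TIME_WAIT
tcp4       0      0  192.168.50.109.51275   47.95.49.174.443       TIME_WAIT
tcp4       0      0  192.168.50.109.51273   203.107.32.116.443     TIME_WAIT
tcp4       0      0  192.168.50.109.51293   203.107.32.116.443     TIME_WAIT
tcp4       0      0  192.168.50.109.51297   203.107.32.116.443     TIME_WAIT
"""

if __name__ == "__main__":
    for address, n in count_time_wait(SAMPLE).most_common():
        print(address, n)
```

To run it against a live system, the output of `netstat -tan` can be captured, for example with `subprocess.run`, and passed to `count_time_wait`.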

When we stress test a server with thousands of concurrent connections from one host, those connections quickly consume the TCP connection resources on the host, and almost all of them end up in the TIME_WAIT state waiting to be destroyed. If we really do have to deal with TIME_WAIT connections on a single machine, there are several ways to handle it.

  1. Use the SO_LINGER option and set the linger time l_linger to 0. When we then close the TCP connection, the kernel simply discards all the data in the buffer and sends an RST message to the server to terminate the connection immediately.
  2. Use the net.ipv4.tcp_tw_reuse option to allow the kernel to reuse TCP connections that are in the TIME_WAIT state, relying on the TCP timestamp option.
  3. Widen the available port range in the net.ipv4.ip_local_port_range option to increase the maximum number of TCP connections that can coexist.
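As an illustration of the first option, the sketch below enables SO_LINGER with a zero timeout on a socket. The helper names are hypothetical; the `struct linger` layout of two C ints is an assumption that holds on Linux and other Unix-like systems but not on every platform.

```python
import socket
import struct

LINGER_FORMAT = "ii"  # struct linger { int l_onoff; int l_linger; }


def set_linger_zero(sock):
    """Enable SO_LINGER with l_linger = 0 so close() aborts the connection."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack(LINGER_FORMAT, 1, 0))


def get_linger(sock):
    """Return (l_onoff, l_linger) as currently set on the socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(LINGER_FORMAT))
    return struct.unpack(LINGER_FORMAT, raw)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    set_linger_zero(s)
    print(get_linger(s))
    s.close()
```

Because the connection is reset rather than closed in the normal way, any data still in the send buffer is lost, so this suits load generators better than services that must deliver everything they write.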

Note that another common TCP configuration item, net.ipv4.tcp_tw_recycle, was removed in Linux 4.12, so we can no longer use it to work around the problems caused by the TIME_WAIT design.

The TIME_WAIT state plays a very important role in TCP, as it is an indispensable part of the protocol's reliability design. If the problem can be solved by adding machines, we should avoid modifying the default OS configuration as much as possible; if we must modify it, we need to understand the design rationale behind it. As the Linux documentation says, these settings should not be changed without the advice of technical experts.

Here, let's revisit the reason for the TIME_WAIT state in the TCP protocol. If the client does not wait long enough, re-establishing a connection to the remote end using the same port number causes the following problems.

  • Because the network transmission time of a data segment is uncertain, the new connection may receive a data segment that was sent on the previous TCP connection but never delivered.
  • Because the ACK sent by the client may not have been received by the server, the server may still be in the LAST_ACK state, so it replies with an RST message and the establishment of the new connection is aborted.

The TIME_WAIT state is the result of TCP’s struggle with uncertain network latency, and uncertainty is the biggest impediment to the TCP protocol on the road to reliability. To conclude, let’s look at some more open-ended related issues, and the interested reader can ponder the following questions.

  • How does the net.ipv4.tcp_tw_reuse configuration guarantee the relative security of reused TCP connections via timestamps?
  • Why was the net.ipv4.tcp_tw_recycle configuration removed from the protocol stack by Linux?