TCP defines 11 connection states, and a connection moves between them in response to the messages it sends or receives. The state machine shown below illustrates all possible transitions, covering both normal and abnormal conditions.
Either party to a TCP connection can enter the TIME_WAIT state when it closes the connection. Closing a connection tells the other party that the closing side has no more data to send, although it can still receive data from the other party. A typical close proceeds as follows.
- when the client has no data to send, it sends a FIN message to the server and then enters the FIN_WAIT_1 state.
- when the server receives the FIN message from the client, it enters the CLOSE_WAIT state and sends an ACK message to the client; the client enters the FIN_WAIT_2 state when it receives that ACK.
- when the server has no more data to send, it sends a FIN message to the client and enters the LAST_ACK state.
- when the client receives the FIN message, it enters the TIME_WAIT state and sends an ACK message to the server, which enters the CLOSED state on receiving it.
- the client also enters the CLOSED state after waiting for two maximum segment lifetimes (2 MSL).
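The close sequence above can be sketched as a small transition table. This is a minimal illustration rather than a full TCP state machine: it covers only the normal close path, the state names follow RFC 793, and the event labels (`send_fin`, `recv_ack`, and so on) are made up for this example.

```python
from enum import Enum


class State(Enum):
    ESTABLISHED = "ESTABLISHED"
    FIN_WAIT_1 = "FIN_WAIT_1"
    FIN_WAIT_2 = "FIN_WAIT_2"
    CLOSE_WAIT = "CLOSE_WAIT"
    LAST_ACK = "LAST_ACK"
    TIME_WAIT = "TIME_WAIT"
    CLOSED = "CLOSED"


# (current state, event) -> next state, covering only the normal close path.
# Event names are illustrative labels, not kernel or RFC identifiers.
TRANSITIONS = {
    (State.ESTABLISHED, "send_fin"): State.FIN_WAIT_1,  # active close
    (State.FIN_WAIT_1, "recv_ack"): State.FIN_WAIT_2,
    (State.FIN_WAIT_2, "recv_fin"): State.TIME_WAIT,    # final ACK is sent here
    (State.TIME_WAIT, "timeout_2msl"): State.CLOSED,
    (State.ESTABLISHED, "recv_fin"): State.CLOSE_WAIT,  # passive close
    (State.CLOSE_WAIT, "send_fin"): State.LAST_ACK,
    (State.LAST_ACK, "recv_ack"): State.CLOSED,
}


def run(start, events):
    """Apply events in order and return the list of states visited."""
    path = [start]
    for event in events:
        path.append(TRANSITIONS[(path[-1], event)])
    return path


client_path = run(State.ESTABLISHED, ["send_fin", "recv_ack", "recv_fin", "timeout_2msl"])
server_path = run(State.ESTABLISHED, ["recv_fin", "send_fin", "recv_ack"])

print(" -> ".join(s.name for s in client_path))
print(" -> ".join(s.name for s in server_path))
```

Only the path of the side that sends the first FIN passes through TIME_WAIT; the passive side reaches CLOSED by way of CLOSE_WAIT and LAST_ACK.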
From the above, we can see that TIME_WAIT appears only on the side that actively closes the connection; the side that closes passively reaches the CLOSED state without passing through TIME_WAIT. The client that enters TIME_WAIT must wait for 2 MSL before it can actually close the connection. TCP requires the TIME_WAIT state for the same reasons that a client must wait two MSLs before it can enter the CLOSED state:
- to prevent delayed data segments from being received by a later TCP connection that uses the same source address, source port, destination address, and destination port.
- to guarantee that the remote end closes the TCP connection properly, i.e., to wait until the party passively closing the connection has received the ACK message corresponding to its FIN.
Both of the above reasons are relatively simple, so let’s expand on some of the possible problems behind them.
Blocking Delayed Data Segments
Each TCP data segment carries a sequence number, which is what gives TCP its reliability and ordering guarantees. Leaving aside sequence number wraparound, the uniqueness of sequence numbers is an important convention in TCP, and violating it leads to confusing behavior and results. To ensure that the data segments of a new TCP connection do not duplicate segments of an earlier connection that are still in transit on the network, a TCP must stay quiet for at least the maximum time a data segment can survive on the network, i.e., the MSL, before it assigns new sequence numbers.
To be sure that a TCP does not create a segment that carries a sequence number which may be duplicated by an old segment remaining in the network, the TCP must keep quiet for a maximum segment lifetime (MSL) before assigning any sequence numbers upon starting up or recovering from a crash in which memory of sequence numbers in use was lost.
In the TCP connection shown above, network delays keep the SEQ = 301 message sent by the server from arriving until after the TCP connection has been closed. If a new TCP connection then reuses the same port number, the client may accept this expired SEQ = 301 message as if it were valid data, which is a serious problem. We should therefore be very careful when adjusting the TIME_WAIT policy and must be clear about what we are doing.
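To make the risk concrete, the following sketch applies a simplified form of the receive-window test from RFC 793: a segment is a candidate for acceptance when its sequence number lies inside the receiver's current window. The function name and the numbers (other than SEQ = 301) are hypothetical, and the sketch ignores segment length and zero-window cases.

```python
MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around


def in_receive_window(seg_seq, rcv_nxt, rcv_wnd):
    """True if seg_seq falls in [rcv_nxt, rcv_nxt + rcv_wnd) modulo 2**32."""
    return (seg_seq - rcv_nxt) % MOD < rcv_wnd


# Hypothetical numbers: a new connection on the same four-tuple expects
# sequence number 250 next and advertises a 1000-byte window.
new_conn_rcv_nxt = 250
new_conn_rcv_wnd = 1000

# A delayed segment from the previous connection carries SEQ = 301.
stale_seq = 301

print(in_receive_window(stale_seq, new_conn_rcv_nxt, new_conn_rcv_wnd))  # True
print(in_receive_window(stale_seq, 5000, new_conn_rcv_wnd))              # False
```

Because the four-tuple is identical, nothing in the stale segment marks it as belonging to the old connection; whether it is accepted depends only on where its sequence number happens to land.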
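RFC 793 states that a TCP connection needs to wait for twice the MSL in TIME_WAIT, but it does not explain where the factor of two comes from. A commonly given explanation is that one MSL allows the final ACK to reach the peer or expire, and a second MSL allows a FIN retransmitted in response to arrive.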
RFC 793 sets the MSL to 120 seconds, or two minutes. This is not a rigorously derived value but an engineering choice, and it is perfectly acceptable to change the operating system setting when a service's operational history calls for it. In fact, Linux began setting TCP_TIMEWAIT_LEN to 60 seconds in early versions in order to reuse TCP connection resources more quickly.
On Linux, clients can by default establish connections to remote servers from port numbers 32,768 to 60,999, a total of 28,232 ports, so applications have nearly 30,000 port numbers to choose from.
However, if the host has created more than 28,232 TCP connections to one specific address and port on the target host within the last minute, creating another TCP connection to it fails with an error. In other words, unless we adjust the host's configuration, the maximum number of new TCP connections per second to that destination is about 470.
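The figure of roughly 470 connections per second follows from simple arithmetic. The sketch below assumes an inclusive port range of 32768 to 60999, which yields the 28,232 ports quoted above, and a 60-second TIME_WAIT; the function name is made up for this example.

```python
def max_new_connections_per_second(port_low, port_high, time_wait_seconds):
    """Upper bound on new connections per second to one destination address and port."""
    usable_ports = port_high - port_low + 1  # both ends inclusive
    return usable_ports / time_wait_seconds


rate = max_new_connections_per_second(32768, 60999, 60)
print(f"{rate:.1f}")  # 470.5
```

The bound applies per destination address and port, because a local port that is still tied up in TIME_WAIT for one destination can be used for a connection to a different one.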
Guaranteeing Connection Closure
From the definition of the TIME_WAIT state in RFC 793, we can find another important role for this state, namely waiting long enough to be sure that the remote TCP has received the ACK corresponding to the termination message it sent.
TIME-WAIT - represents waiting for enough time to pass to be sure the remote TCP received the acknowledgment of its connection termination request.
If the client does not wait long enough and re-establishes a TCP connection while the server still has not received the ACK message, the following problem occurs. Because the server has not received the ACK, it still considers the old connection valid; when the client sends a SYN message to request a new handshake, the server replies with an RST message and the connection establishment is aborted.
By default, if the client waits long enough, one of two things happens:
- the server receives the ACK message normally and closes the current TCP connection.
- the server does not receive the ACK message, retransmits its FIN to close the connection, and waits for a new ACK message.
As long as the client waits for 2 MSL, the connection between the client and the server is closed normally, and the probability that a newly created TCP connection will be affected is negligible, ensuring the reliability of data transmission.
There are some scenarios where waiting 60 seconds for a connection to be destroyed is unacceptable, e.g., highly concurrent stress tests. When we test the throughput and latency of a remote service with concurrent requests, a large number of TCP connections in the TIME_WAIT state can build up locally. On macOS, active connections can be listed with netstat.
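As a rough way to see how many connections sit in each state, the sketch below tallies the state column of `netstat -an` style output. It assumes that TCP rows start with `tcp` and that the state is the last column; the function name is hypothetical and the sample rows are made up for illustration rather than captured from a real host.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connection states in `netstat -an` style text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows start with tcp, tcp4, tcp6 or tcp46; the state is the last column.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[-1]] += 1
    return counts


# Made-up sample rows for illustration, not captured from a real host.
SAMPLE = """\
tcp4  0  0  127.0.0.1.50001  127.0.0.1.8080  TIME_WAIT
tcp4  0  0  127.0.0.1.50002  127.0.0.1.8080  TIME_WAIT
tcp4  0  0  127.0.0.1.50003  127.0.0.1.8080  ESTABLISHED
tcp4  0  0  *.8080           *.*             LISTEN
"""

states = count_tcp_states(SAMPLE)
print(states["TIME_WAIT"])  # 2
```

In practice the text would come from running netstat on the host under test, for example through `subprocess.run`.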
When we stress test a server with thousands of concurrent connections from one host, these connections quickly consume the host's TCP connection resources, and almost all of them end up in the TIME_WAIT state waiting to be destroyed. If we really do have to deal with the TIME_WAIT state on a single machine, there are several ways to handle it.
- use the SO_LINGER option and set the linger time l_linger to 0. When we then close the TCP connection, the kernel simply discards all the data in the buffer and sends an RST message to the server to terminate the connection immediately.
- use the net.ipv4.tcp_tw_reuse option to allow the kernel to reuse TCP connections that are in the TIME_WAIT state, relying on the TCP timestamp option.
- modify the available port range in the net.ipv4.ip_local_port_range option to increase the maximum number of TCP connections that can coexist.
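The first option can be sketched with Python's standard socket module. This assumes a Linux or macOS host, where `struct linger` consists of two C ints (`l_onoff`, `l_linger`); the helper name is made up for this example, and the socket is never connected, so the snippet only shows how the option is set and read back.

```python
import socket
import struct


def make_abortive_close_socket():
    """Create a TCP socket whose close() resets the connection instead of sending FIN."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # struct linger { int l_onoff; int l_linger; }: enable lingering with a zero timeout.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    return sock


sock = make_abortive_close_socket()
raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.calcsize("ii"))
l_onoff, l_linger = struct.unpack("ii", raw)
print(l_onoff != 0, l_linger)
sock.close()
```

An abortive close skips TIME_WAIT, but it also throws away any unsent data and gives up the protections described earlier, so it is better suited to load-testing clients than to production services.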
Note that another common TCP configuration item, net.ipv4.tcp_tw_recycle, was removed in Linux 4.12, so we can no longer use it to work around the problems caused by TIME_WAIT.
The TIME_WAIT state plays a very important role in TCP and is an indispensable part of the protocol's reliability design. If the problem can be solved by adding machines, that is the preferable route; otherwise we need to understand the design rationale behind TIME_WAIT and avoid modifying the default configuration as much as possible. As the Linux manual says, these settings should not be changed without the advice of technical experts.
Here, let's revisit why TCP has the TIME_WAIT state. If the client does not wait long enough, the following problems arise when it re-establishes a connection to the remote end using the same port number.
- Because the network transit time of a data segment is uncertain, the new connection may receive a data segment that was left undelivered by the previous TCP connection.
- Because the ACK sent by the client may not have been received by the server, the server may still be in the LAST_ACK state, so it replies with an RST message that terminates the establishment of the new connection.
The TIME_WAIT state is the result of TCP's struggle with uncertain network latency, and uncertainty is the biggest obstacle on TCP's road to reliability. To conclude, here are some more open-ended related questions for the interested reader to ponder.
- How does the net.ipv4.tcp_tw_reuse configuration guarantee the relative security of reused TCP connections via timestamps?
- Why was the net.ipv4.tcp_tw_recycle configuration removed from the protocol stack by Linux?