Analysis of efficiency issues of TCP-based VPNs

I have recently been working on an implementation of the AnyConnect protocol and service. AnyConnect is an enterprise virtual private network (VPN) system developed by Cisco, and one of its core selling points is its UDP-based DTLS channel. In this article, we’ll analyze why UDP is used to transport VPN data.

VPNs work at the network layer: everything carried through the VPN tunnel is an IP packet, and IP delivery is unreliable. Reliable transmission is the job of TCP. So how does TCP ensure reliable transmission? It numbers the data it sends, and the receiver passes that number back in an acknowledgement (ACK). Suppose A is sending to B; losses are then handled as follows.

If A does not receive an ACK within a certain period of time, the packet may have been lost, so A resends it.

Of course, if B’s ACK is lost on the way back, A will also retransmit.
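The two loss cases can be sketched with a small simulation. This is only an illustrative stop-and-wait model, not how a real TCP stack is written: the function name `send_reliably` and the `drop_plan` format are made up for this example, and real TCP sends many segments in flight rather than one at a time.

```python
def send_reliably(packets, drop_plan):
    """Simulate A sending packets to B one at a time, retransmitting on loss.

    drop_plan is a set of (seq, attempt, kind) tuples naming the transmissions
    the simulated network loses; kind is "data" (A -> B) or "ack" (B -> A).
    Returns what B delivered to its application and a log of events.
    """
    delivered = []
    log = []
    for seq, payload in enumerate(packets):
        attempt = 0
        while True:
            attempt += 1
            log.append(f"A sends #{seq} (attempt {attempt})")
            if (seq, attempt, "data") in drop_plan:
                log.append(f"  data #{seq} lost, A times out")
                continue
            # B accepts the packet only once; a duplicate is discarded
            # but still acknowledged.
            if len(delivered) == seq:
                delivered.append(payload)
            if (seq, attempt, "ack") in drop_plan:
                log.append(f"  ACK #{seq} lost, A times out")
                continue
            log.append(f"  A receives ACK #{seq}")
            break
    return delivered, log


# Packet 0 is lost once; the ACK for packet 1 is lost once.
delivered, log = send_reliably(["a", "b"], {(0, 1, "data"), (1, 1, "ack")})
print("\n".join(log))
```

In both cases A cannot tell which direction lost the packet; it only sees a missing ACK and sends the data again, and B has to discard the duplicate in the second case.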

If packets are lost repeatedly, a large number of retransmissions are generated, which is a serious waste of network resources. That is why TCP uses exponential backoff. The timeout interval is relatively short at first (e.g., 0.5 seconds). Each time a retransmission goes unacknowledged, the interval doubles: 1, 2, 4, 8 seconds, and so on. TCP will not wait indefinitely; after a certain time it closes the connection and notifies the application of the error.
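Exponential backoff, in which the timeout doubles after every unacknowledged retransmission, can be sketched in a few lines. The starting value of 0.5 seconds, the retry limit, and the cap are illustrative assumptions for this example; real TCP stacks derive the initial timeout from measured round-trip times and use their own limits.

```python
def backoff_schedule(initial_rto=0.5, max_retries=5, max_rto=60.0):
    """Return the successive timeout intervals (in seconds) a sender waits.

    The interval doubles after every unacknowledged attempt and is capped
    at max_rto. All default values are illustrative assumptions.
    """
    schedule = []
    rto = initial_rto
    for _ in range(max_retries):
        schedule.append(rto)
        rto = min(rto * 2, max_rto)
    return schedule


waits = backoff_schedule()
print(waits)       # [0.5, 1.0, 2.0, 4.0, 8.0]
print(sum(waits))  # 15.5 seconds in total before the sender gives up
```

After the last interval expires without an ACK, the sender in this model gives up, which corresponds to TCP closing the connection and reporting the error.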

So why is it bad to use TCP to carry VPN data? Because VPNs work at the network layer, and the network layer itself is not required to be reliable. Reliable transport is handled by the transport layer above the VPN. If you carry VPN data over TCP, this “network layer” becomes reliable, and the upper TCP layer’s own reliability mechanism becomes a burden.

If you leave the AnyConnect layer out of the picture, there is only the upper-layer TCP, and each packet the application sends is acknowledged once. But when AnyConnect uses its TCP channel, the packets from the upper layer are handed off to the lower TCP connection to be sent. The VPN peer receives the data and sends an ACK, and then the real receiver receives the data and sends another ACK.

Obviously, each packet is acknowledged twice during the actual transmission: once by the VPN peer and once by the real receiver. Generally this does not cause any problems, but trouble starts when a timeout occurs.

If there is a timeout, it is the lower TCP that times out first, so the lower TCP retransmits and waits for an acknowledgement. The upper TCP does not know that the lower TCP has timed out and keeps sending anyway. None of its newly sent packets are acknowledged, so the upper TCP starts retransmitting as well. Because the lower TCP is already waiting on its own retransmission, it does not send the upper TCP’s retransmitted data; that data only queues up behind it. The upper TCP therefore keeps timing out and retransmitting, and the result is an avalanche.
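A rough model makes the avalanche easier to see. The sketch below assumes the link is completely down for a fixed period, that both TCP layers double their timers from made-up starting values, and that every copy the upper TCP hands to a TCP tunnel must eventually be delivered, while a UDP tunnel simply loses copies sent during the outage. The function names are hypothetical, and real stacks behave in more complicated ways.

```python
def timeout_times(initial_rto, outage):
    """Times (seconds after the first send) at which a doubling
    retransmission timer fires while the link is down."""
    times = []
    t = 0.0
    rto = initial_rto
    while t + rto <= outage:
        t += rto
        times.append(t)
        rto *= 2
    return times


def simulate_outage(outage, lower_rto=0.5, upper_rto=1.0):
    """Compare a TCP tunnel with a UDP tunnel during one link outage."""
    lower = timeout_times(lower_rto, outage)
    upper = timeout_times(upper_rto, outage)
    return {
        "lower_retransmits": len(lower),
        "upper_retransmits": len(upper),
        # A TCP tunnel is reliable, so the original segment and every
        # retransmitted copy from the upper TCP stay queued for delivery.
        "tcp_tunnel_backlog": 1 + len(upper),
        # A UDP tunnel just drops what it cannot deliver during the outage.
        "udp_tunnel_backlog": 0,
    }


print(simulate_outage(10))
```

For a 10-second outage with these assumed timers, the lower TCP retransmits four times and the upper TCP three times, leaving four copies of the same segment queued in the TCP tunnel. Over a UDP tunnel nothing is queued, and the upper TCP’s next retransmission after the outage is the only copy carried.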

The key to the problem is that TCP is designed on the assumption that the network layer beneath it is unreliable, but here we have built a reliable network layer out of TCP, and that is what creates the avalanche.

Therefore, if you are building a VPN, you should use UDP wherever possible. UDP simply adds port information to IP; it is essentially no different from sending IP packets, so it can be seen as a natural stand-in for the network layer. Combined with DTLS for encryption, it is a very good fit for VPN traffic.
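To show how little UDP adds, here is a sketch that builds a UDP datagram by hand. The header is four 16-bit fields (source port, destination port, length, checksum), 8 bytes in total. The port numbers and the placeholder payload are arbitrary values chosen for the example; a real VPN would put an encrypted IP packet in the payload and let the operating system build the header.

```python
import struct

# UDP header: source port, destination port, length, checksum,
# each an unsigned 16-bit integer in network byte order.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_datagram(src_port, dst_port, payload):
    """Prepend a UDP header to payload (bytes).

    The checksum is left as 0, which in UDP over IPv4 means
    "no checksum computed".
    """
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload


inner_packet = b"\x45" + b"\x00" * 19  # placeholder for a 20-byte IP packet
datagram = build_udp_datagram(50000, 443, inner_packet)
print(UDP_HEADER.size)  # 8
print(len(datagram))    # 28
```

There are no sequence numbers, ACKs, or timers in the header, so the tunnel stays as unreliable as the network layer it stands in for, and reliability is left to the TCP connections running inside it.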
