TCP-based application layer protocol design

Many people, including myself, have long believed that TCP-based application layer protocols are simple and only require a packet header. Because TCP is a reliable protocol, it ensures that data reaches the other side in an orderly and error-free manner; it is just stream-oriented and does not preserve message boundaries, so we only need to define protocol packet headers that can distinguish individual datagrams. However, this is wrong: the transport layer protocol is limited in what it can do, and the application layer protocol can do much more than encapsulate a packet header. Let’s look at an example.

TCP connections are not that reliable

Let’s write a simple client in C. It connects to the server using TCP, sleeps for a while, and then calls send(2) to send a piece of data.

int main() {
    int fd = socket(PF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8000);
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    if (connect(fd, (struct sockaddr*)&addr, sizeof(addr)) < 0) {
        printf("connect error: %d\n", errno);
        return 1;
    }

    printf("connected\n");

    sleep(5); // pull the network cable

    printf("try send\n");

    const char *data = "Hello";
    if (send(fd, data, strlen(data), 0) < 0) {
        printf("send error: %d\n", errno);
        return 1;
    }
    printf("sent\n");

    return 0;
}

Now we run the client. After it connects to the server, unplug the cable while it is asleep. As a result, it will send normally, and then exit normally. The server doesn’t get anything.

$ ./cli
connected
try send
sent

Why does this happen? As we know, in TCP, after the data is sent to the other end, the corresponding ACK must be received to ensure that the data reaches the other end.

However, when send(2) returns, there is no guarantee that the data sent was acknowledged, or even that it was sent. All it does is put the data into the kernel buffer, and as long as this operation succeeds, it returns success. When send completes, it does not close the connection normally, but simply exits. Since the program does not close the connection, the OS will maintain the connection after the program exits. After several unsuccessful retransmissions, the connection is dropped and the program is not aware of the exception.

This is a serious problem, especially in mobile networks. The wireless network of mobile devices is not stable and “unplugging” happens a lot. It can happen before the send call; it can happen after the send call, but before the data is sent; it can happen when the data is delivered and waiting for the ACK to be received. How to solve this problem?

Close the connection normally

One of the problems with the above program is that the connection is not closed before the end of the program. Recall that closing a TCP connection sends a FIN to the other end, telling the other end: “All segments less than the FIN sequence number have been sent, if you have confirmed receipt of all segments including the FIN, please send me an ACK”. So if the connection can be closed properly, all data is guaranteed to reach the other end.

Now modify our code to call close(2) before the program exits, and see how that works.

int main() {
    ...

    if (close(fd) < 0) {
        printf("close error: %d\n", errno);
        return 1;
    }
    printf("closed\n");

    return 0;
}

We run the program again, again unplugging the network cable while it is asleep. However, the data is successfully sent, the connection is successfully closed, and the program exits normally.

$ ./cli
connected
try send
sent
closed

The reason is that close(2), like send(2), doesn’t wait for the ACK, it just puts the FIN into the send queue. The problem is that even if we can confirm that the connection closes properly (which it does), it doesn’t help, because for long connections it is impossible to ensure that the data is delivered by closing the connection.

Get the send queue size

So is there a way to determine if the ACK corresponding to the data has been received? Actually, there is. Linux supports SIOCOUTQ requests, which can get the size of the send queue. This is also the size of the data that TCP has not yet acknowledged as delivered. We sleep 2 seconds after sending data, then call ioctl(2) to pass in SIOCOUTQ to get the send queue size.

...

sleep(2);

int qsize;
ioctl(fd, SIOCOUTQ, &qsize);
printf("queue size: %d\n", qsize);

Perform the same operation again:

$ ./cli
connected
try send
sent
queue size: 5
closed

As you can see, even after waiting 2 seconds, there are still 5 bytes of data that have not been confirmed as delivered.

This approach can detect whether the data is delivered or not, but we cannot use this approach. It has several problems:

First, not all systems support getting the send queue size. SIOCOUTQ’ is only supported by Linux.
Second, just because an ACK is not received does not mean that the data is not delivered. It is possible that the data is delivered, but the network is disconnected when the ACK is received. These unacknowledged data should have been retransmitted by TCP, and the other side would have ignored them when they received the retransmitted data because of the duplicate sequence numbers. However, now that the network is down and TCP can’t retransmit them, can we manually retransmit them in a new connection that is established later? What if these data indicate the submission of an order? This results in duplicate requests being sent.
Finally, this approach violates the protocol hierarchy principle. TCP’s automatic retransmission should be transparent to the upper layer protocols, and the application layer should see TCP as a flow-oriented duplex channel and should not care whether TCP receives an ACK. Moreover, application-layer datagrams are conceptually different from TCP’s send queue: unacknowledged data in the queue may belong to several different datagrams.

Therefore, this problem needs to be solved in the application layer protocol.

Solution for HTTP

Let’s look at how mature application layer protocols do it. HTTP is very simple in that it requires a request to be accompanied by a response, and does not allow the server to actively push data.

Each HTTP request expects a response, and if a request is delayed, the request is considered to have failed. In this way, if the network is disconnected in the middle of sending, the client can sense the exception and retry. However, this poses another problem, the client does not know if the server received the request, because the network may be disconnected while the server is answering. This can be a problem if the request involves state. For example, if the request is to submit an order, retries may result in multiple orders being submitted.

HTTP introduces the concept of methods for this purpose. For idempotent requests, that is, requests that do not change the state of the server, the GET method is used. There is no problem retrying such requests. For requests that change the state of the server, the POST method is used. For example, if the order submission fails, the browser will give a warning “Confirm resubmitting form” when retrying. You should then refresh the order list, confirm that the order was not submitted successfully, and retry again.

HTTP does not allow the server to actively push data. Because the client does not (and should not) normally respond to server pushes, if the server pushes data randomly, the network may be disconnected during the push, resulting in data loss that is not perceived by either party. Even if the client senses that the network is down when it requests it later, neither side knows what data is missing.

WebSocket solutions

WebSocket allows both parties to send data to each other freely, and does not require a response to the request. But it has a heartbeat mechanism. WebSocket is based on TCP, and once the network is disconnected, the next heartbeat timeout must occur to detect the disconnection. In addition, WebSocket manages the closing of connections, and to close a WebSocket connection, a Close message is sent to the other side, and the other side replies with a Close message. This way the application layer knows if the connection is closed properly.

Application layer protocol design

Considering possible network disconnections, it is a mistake to send a piece of data directly to the other side and assert that it will be delivered, even with the TCP protocol. Application layer protocols need to have a bounce-back mechanism. We can require a response to a request, as with HTTP, to detect anomalies in the most timely manner, or we can use heartbeats, which can be used in less demanding scenarios.

If a network failure is detected, we may need to disconnect and try to re-establish the connection, then resend the request. However, this is not possible for all requests. For non-idempotent requests, i.e., requests that change the state of the server, this may lead to unexpected results due to repeated requests. Such requests often rely on the state of both ends, and once the connection is abnormally closed, the state of both ends may be inconsistent and should be resynchronized upon reconnection.

For example, suppose the client wants to request the server to submit an order. If there is a response timeout, then the client should request the order list again after reconnection. If the order is already in the order list, we should not reorder it, i.e. the “submit order” request depends on the “order list” state. Further, we can assign a unique ID to each order, so that the client can safely retransmit the order after a successful reconnection, and the server will ignore duplicate orders.

As another example, suppose the server wants to broadcast a message to all clients. Generally speaking, clients do not reply to messages pushed by the server, and replying to broadcast messages will result in a concentration of requests that will burden the server. In this case, how to ensure that all clients receive this message? We can use the following approach:

The server broadcasts the messages and caches them for a period of time that depends on the validity of the message. This is the “state” of the broadcast message.
The client has a heartbeat mechanism. If there is a network disconnection while receiving a broadcast message, it can be detected later.
When the client reconnects successfully, it requests all cached broadcast messages. This allows the client to get all the messages.

This approach is state-oriented communication, or communication by synchronizing state. It takes the message or the result of the message as state. Another example is in a game server, which wants to add 50 gold coins to a player. It is not a good idea to push the message “add 50 coins” directly to the client, the correct way is for the server to increase the number of coins it has recorded for the player by 50, and then push the current number of coins to the client. If there is a network disconnection, the client will get the current coin count again after reconnecting. This is also an example of state-oriented communication, this time using the result of the message as the state.

Summary

Network protocols do not only define the structure of the data exchanged, they also define how the data is exchanged, the transmission steps, and a host of other things. Since the application layer does not (and should not) have access to the detailed state of the transport layer, the application layer protocol needs to do something to ensure the reliability and integrity of the data. In order to prevent accidental network disconnections, the application layer needs a survivability mechanism, which may rely on responses or heartbeats. When the network is disconnected and reconnected successfully, the request should not be retransmitted easily and the state should be synchronized first. For unanswered requests, it is a good idea to use state-oriented communication.

Table of Contents