Birth of WebSocket

In 2005, the arrival and spread of AJAX made it possible for a web page to request data from a server and render itself dynamically. HTTP, however, is a request-response protocol in which only the client can initiate an exchange, so obtaining a continuous stream of real-time information is difficult. This prompted a long series of creative attempts at real-time data delivery.

Within the confines of HTTP, the most widespread approach is polling. The client sends requests at regular intervals, and the server returns the latest data. The shorter the polling interval, the closer to real time the messages are. The disadvantages are obvious. Because HTTP is stateless and every poll is a full HTTP request, heavy polling keeps resending the same headers and wastes resources. If the server has few new messages, most of those requests return nothing useful. And in scenarios that require low latency, such as real-time multiplayer games, timed requests cannot meet the requirement.
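
As a rough illustration of short polling, the sketch below calls a request function on a fixed timer. The names startPolling, fetchLatest, and onData are invented for this example; fetchLatest stands in for whatever HTTP request the page would actually make.

```javascript
// Short polling: ask for the latest data on a fixed timer, whether or not
// anything has changed. `fetchLatest` is a hypothetical async function that
// performs one full HTTP request and resolves with the response data.
function startPolling (fetchLatest, intervalMs, onData) {
  const timer = setInterval(() => {
    fetchLatest()
      .then(onData)
      .catch(err => console.error('poll failed:', err.message))
  }, intervalMs)

  // Return a function that stops the polling loop.
  return () => clearInterval(timer)
}
```

In a browser, fetchLatest might be something like `() => fetch('/api/latest').then(res => res.json())`, where the URL is a placeholder. Every tick sends a complete request, headers included, even when the server has nothing new.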

[Figure: HTTP polling]

Another technique is long polling, an improvement on the short polling scheme above. The client sends an HTTP request, and the server accepts it and holds the connection open. The server responds only when it has new data. The client then immediately issues the next request, re-establishing the connection with the server.
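
The long polling loop can be sketched in a few lines. This is only an outline under assumptions: longPoll, request, onData, and shouldContinue are hypothetical names, and request is assumed to resolve only once the server has something new to report.

```javascript
// Long polling: send a request, wait (possibly a long time) until the server
// answers with new data, handle it, then immediately send the next request.
// `request` is a hypothetical async function that resolves only once the
// server has something new; `shouldContinue` lets the caller stop the loop.
async function longPoll (request, onData, shouldContinue, retryDelayMs = 1000) {
  while (shouldContinue()) {
    try {
      const data = await request()
      onData(data)
    } catch (err) {
      // Network error or timeout: pause briefly before reconnecting.
      await new Promise(resolve => setTimeout(resolve, retryDelayMs))
    }
  }
}
```

Compared with short polling there are no empty responses, but each message still costs one full HTTP request and response.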

[Figure: HTTP long polling]

There are also other, hackier solutions, such as iframe and htmlfile, which together with long polling are collectively known as Comet.

All of these solutions are workarounds. Because of how HTTP works, the server cannot initiate a connection to notify the client, which is why so many hacks emerged.

To solve these problems, the WebSocket protocol was born in 2008. It is a full-duplex communication protocol built on a single TCP connection: after one handshake, both sides can send and receive data continuously. With WebSocket, two-way real-time communication between server and client is finally possible. In 2011 the IETF standardized WebSocket as RFC 6455, and it is now widely supported.

[Figure: WebSocket]

Compared with HTTP, WebSocket addresses two limitations in particular.

  • Passivity. With HTTP, only the client can initiate a message to the server, whereas WebSocket keeps a long-lived TCP connection over which the server can actively push messages to the client.
  • Statelessness. HTTP does not retain information about the client once a request completes, so each new request has to resend identifying information to tell the server who it is. With WebSocket, the server knows which client it is talking to for the lifetime of the connection, so messages carry only their actual content, which greatly reduces traffic.

How to make a WebSocket connection

Establishing a connection

During the handshake phase of a WebSocket connection, the client sends an HTTP request to the server (the handshake is WebSocket's only tie to HTTP).

GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13
Origin: http://example.com

The server's response:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: chat

In the client's request, the two key fields are Connection: Upgrade and Upgrade: websocket. The Connection: Upgrade header indicates that the client wants to upgrade the connection, while the Upgrade: websocket header indicates that the target protocol is WebSocket.
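
A server has to recognize these two headers before it agrees to switch protocols. The check might look like the sketch below; isWebSocketUpgrade is a name made up for this example, and headers is assumed to be a plain object with lower-cased names, the way Node's http module exposes req.headers.

```javascript
// Decide whether a request is asking to switch to the WebSocket protocol.
// `headers` is assumed to be a plain object with lower-cased header names,
// as Node's http module provides in `req.headers`.
function isWebSocketUpgrade (method, headers) {
  const upgrade = (headers.upgrade || '').toLowerCase()
  // Connection may list several tokens, e.g. "keep-alive, Upgrade".
  const connection = (headers.connection || '')
    .toLowerCase()
    .split(',')
    .map(token => token.trim())

  return method === 'GET' &&
    upgrade === 'websocket' &&
    connection.includes('upgrade')
}
```

Header values are compared case-insensitively, and Connection is treated as a comma-separated list because it may carry other tokens alongside Upgrade.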

In addition, Sec-WebSocket-Key and Sec-WebSocket-Accept form a verification step that confirms the server really supports WebSocket: the server derives Sec-WebSocket-Accept from the client's Sec-WebSocket-Key, much like a signature, and the client checks the result.
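
The derivation itself is short. RFC 6455 defines it as the Base64-encoded SHA-1 hash of the key concatenated with a fixed GUID. The sketch below uses Node's built-in crypto module; computeAcceptKey is a name chosen for this example.

```javascript
const { createHash } = require('crypto')

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11'

// Sec-WebSocket-Accept = base64(SHA-1(Sec-WebSocket-Key + GUID))
function computeAcceptKey (secWebSocketKey) {
  return createHash('sha1')
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest('base64')
}

console.log(computeAcceptKey('dGhlIHNhbXBsZSBub25jZQ=='))
// prints "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```

The input and output here are the same values that appear in the request and response shown earlier.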

After the handshake is completed, the connection switches from HTTP to the WebSocket protocol, and the two ends can send data to each other.

Data transfer

As mentioned earlier, WebSocket saves a great deal of traffic because messages are not wrapped in a full HTTP request each time. The WebSocket frame format is as follows.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-------+-+-------------+-------------------------------+
 |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
 |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
 |N|V|V|V|       |S|             |   (if payload len==126/127)   |
 | |1|2|3|       |K|             |                               |
 +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
 |     Extended payload length continued, if payload len == 127  |
 + - - - - - - - - - - - - - - - +-------------------------------+
 |                               |Masking-key, if MASK set to 1  |
 +-------------------------------+-------------------------------+
 | Masking-key (continued)       |          Payload Data         |
 +-------------------------------- - - - - - - - - - - - - - - - +
 :                     Payload Data continued ...                :
 + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
 |                     Payload Data continued ...                |
 +---------------------------------------------------------------+
  • Opcode: the frame type (continuation, text, binary, close, ping, pong, etc.)
  • Payload Data: the data content
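
To make the layout concrete, here is a minimal sketch of a decoder for one complete frame held in a Node Buffer. parseFrame is a name invented for this example; it ignores the RSV bits and fragmentation, and it assumes the buffer contains exactly one whole frame.

```javascript
// Decode a single, complete WebSocket frame held in a Buffer.
function parseFrame (buf) {
  const fin = (buf[0] & 0x80) !== 0     // FIN: final fragment of a message
  const opcode = buf[0] & 0x0f          // 0x1 text, 0x2 binary, 0x8 close, 0x9 ping, 0xA pong
  const masked = (buf[1] & 0x80) !== 0  // MASK: set on client-to-server frames
  let length = buf[1] & 0x7f            // 7-bit payload length
  let offset = 2

  if (length === 126) {                 // the next 2 bytes hold the real length
    length = buf.readUInt16BE(offset)
    offset += 2
  } else if (length === 127) {          // the next 8 bytes hold the real length
    length = Number(buf.readBigUInt64BE(offset))
    offset += 8
  }

  let payload
  if (masked) {
    const key = buf.subarray(offset, offset + 4)
    offset += 4
    payload = Buffer.from(buf.subarray(offset, offset + length))
    // Unmask: XOR each byte with the masking key, cycling every 4 bytes.
    for (let i = 0; i < payload.length; i++) payload[i] ^= key[i % 4]
  } else {
    payload = buf.subarray(offset, offset + length)
  }

  return { fin, opcode, masked, payload }
}

// A masked text frame containing "Hello" (example bytes from RFC 6455).
const sample = [0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58]
console.log(parseFrame(Buffer.from(sample)).payload.toString())
// prints "Hello"
```

A short text message costs only 2 bytes of framing from server to client, or 6 bytes from client to server once the masking key is included.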

Because the communication is full-duplex, the client and server expose essentially the same events and methods: OnOpen (connection established), OnError (connection error), Send (send a message), OnMessage (message received), Close (close the connection), OnClose (connection closed), Ping, Pong, and so on.

Closing a connection

Unlike establishing a connection, closing one does not require an HTTP request. The side that initiates the close simply sends a close frame over the channel. The reason for closing can also be carried in the Payload Data.
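
As a sketch of what such a close frame looks like on the wire, the function below builds one as a server would send it (unmasked). buildCloseFrame is a name made up for this example. Under RFC 6455 the payload is a 2-byte status code followed by an optional UTF-8 reason.

```javascript
// Build an unmasked (server-to-client) Close frame.
// Payload layout: 2-byte big-endian status code, then an optional UTF-8 reason.
function buildCloseFrame (code = 1000, reason = '') {
  const reasonBytes = Buffer.from(reason, 'utf8')
  // Control frames carry at most 125 payload bytes, 2 of which are the code.
  if (reasonBytes.length > 123) throw new RangeError('close reason too long')

  const frame = Buffer.alloc(2 + 2 + reasonBytes.length)
  frame[0] = 0x80 | 0x8               // FIN = 1, opcode = 0x8 (close)
  frame[1] = 2 + reasonBytes.length   // MASK = 0, payload length
  frame.writeUInt16BE(code, 2)        // status code, 1000 = normal closure
  reasonBytes.copy(frame, 4)
  return frame
}

console.log(buildCloseFrame(1000, 'bye').toString('hex'))
// prints "880503e8627965"
```

A client would additionally have to mask the payload, since all client-to-server frames are masked.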

[Figure: closing a connection]

Heartbeat mechanism

The heartbeat mechanism is not mandatory, but in practice it is essential. On a perfectly reliable network, once the long-lived connection is established, any message sent by one end will reach the other, and an end that goes offline deliberately can notify its peer over the connection first. In reality, network instability or other failures can cause the client or server to drop off suddenly without the other end being notified. A TCP connection's state is not a physical state but a logical one established by the three-way handshake. So if the other end is never notified of the drop, the logical connection stays open until the surviving end next tries to send a message and discovers that its peer is gone.

[Figure: client disconnection]

The heartbeat mechanism exists to detect such physical disconnections early and avoid holding idle connections. After a WebSocket connection is established, one end sends a Ping message to the other at regular intervals, and the other end replies by echoing the content back (a Pong). If the initiating end receives the Pong, the connection is healthy; otherwise it may be broken.
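
The bookkeeping behind a heartbeat is small. The sketch below is transport-agnostic and only tracks whether the last Ping was answered. The Heartbeat class and its sendPing and onTimeout callbacks are hypothetical names for this example, not part of any library.

```javascript
// Transport-agnostic heartbeat bookkeeping. `sendPing` and `onTimeout` are
// hypothetical callbacks supplied by the caller.
class Heartbeat {
  constructor (sendPing, onTimeout) {
    this.sendPing = sendPing
    this.onTimeout = onTimeout
    this.alive = true // assume the peer is alive right after connecting
  }

  // Call this whenever a Pong arrives from the peer.
  pongReceived () {
    this.alive = true
  }

  // Call this once per heartbeat interval (for example from setInterval).
  // Returns false if the previous Ping was never answered.
  tick () {
    if (!this.alive) {
      this.onTimeout()
      return false
    }
    this.alive = false // stays false until the next Pong arrives
    this.sendPing()
    return true
  }
}
```

With the ws package, for instance, sendPing could call `ws.ping()`, pongReceived could be wired to the `'pong'` event, and onTimeout could call `ws.terminate()`.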

[Figure: heartbeat mechanism]

Socket.io

The APIs of client-side and server-side WebSocket libraries are mostly “primitive”: beyond the common operations such as connecting to the server and sending and receiving messages, you have to implement everything yourself. Socket.io builds on WebSocket but is more than a wrapper. To work around compatibility issues, it automatically falls back to polling and other schemes when WebSocket is not supported. It also provides out-of-the-box broadcasting, rooms, and namespaces, as well as automatic heartbeat maintenance. Using Socket.io greatly lowers the barrier to adopting WebSocket.

A simple implementation

Server-side (Node)

In Node.js, you can quickly start a server with the ws package.

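const WebSocket = require('ws')

// Listen for WebSocket connections on port 8080.
const wss = new WebSocket.Server({ port: 8080 })

wss.on('connection', ws => {
  // In ws v8+, `message` is a Buffer and `isBinary` reports the frame type.
  ws.on('message', (message, isBinary) => {
    console.log('received: %s', message)
    // Broadcast to every other connected client.
    wss.clients.forEach(client => {
      if (client !== ws && client.readyState === WebSocket.OPEN) {
        // Preserve the frame type so text is not re-sent as binary.
        client.send(message, { binary: isBinary })
      }
    })
  })
  ws.on('close', () => {
    console.log('closed')
  })
  // Greet the new client as soon as it connects.
  ws.send('Connect success.')
})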

Client side (JavaScript)

In the browser, a client can be created directly with the built-in WebSocket object.

const ws = new WebSocket('wss://echo.websocket.org')

ws.onopen = (event) => {
  console.log('Connection open.')
}

ws.onmessage = (event) => {
  console.log('Received Message: ' + event.data)
}

ws.onclose = (event) => {
  console.log('Connection closed.')
}

Client (iOS)

On iOS there are likewise plenty of client libraries to choose from, such as SocketRocket.

@interface YourObject () <SRWebSocketDelegate>
@property (strong, nonatomic) SRWebSocket *socket;
@end
  
@implementation YourObject

- (void)initSocket {
    NSURL *url = [NSURL URLWithString:@"wss://echo.websocket.org"];
    SRWebSocket *socket = [[SRWebSocket alloc] initWithURL:url];
    socket.delegate = self;
    self.socket = socket;
    
    [self.socket open];
}

- (void)webSocket:(SRWebSocket *)webSocket didReceiveMessage:(id)message {
    NSLog(@"didReceiveMessage: %@", message);
}

- (void)webSocket:(SRWebSocket *)webSocket didFailWithError:(NSError *)error {
    NSLog(@"didFailWithError: %@", error);
}

- (void)webSocket:(SRWebSocket *)webSocket didCloseWithCode:(NSInteger)code reason:(NSString *)reason wasClean:(BOOL)wasClean {
    NSLog(@"didCloseWithCode: %@", @(code));
}

- (void)webSocketDidOpen:(SRWebSocket *)webSocket {
    NSLog(@"webSocketDidOpen: %@", webSocket);
  
    [self.socket send:@"Hello WebSocket"];
}