At this year’s developer conference, Apple announced that Safari will support the WebPush standard. Support will arrive first on macOS, in a release due this fall, and then on iOS in the first half of next year. By then, all major browsers will support WebPush. This is a milestone for the Web! Internet companies are currently pushing users hard toward mobile apps. One of the main reasons is that retention on the Web platform is very low, and the main cause of that low retention is the lack of push support. I hope that widespread WebPush support can help the Web ecosystem prosper. To do my part, I have put together this article on how WebPush technology works today and would like to share it with you.
Before we get into the details, we need to understand the entire technical architecture of WebPush.
There are three parties. The UA is the User Agent, that is, the browser. The Application Server is the website's own server, which initiates push messages. The Push Service is operated by the browser vendor; its core function is to maintain a long-lived connection with the browser and relay messages to it.
The push workflow is as follows.
- The web page requests push permission from the browser. At this point, the browser will show the corresponding interface to the user.
- After the user agrees, the browser generates a set of subscription information and associates this subscription information with the application server that requested the push and sends it to the push service.
- The web application sends the subscription information along with other information about the user to the application server for storage.
- When a message needs to be pushed to the user, the application server constructs the data according to the specification and sends it to the push service.
- The push service receives the message, does the necessary authentication, and then sends the message to the browser.
- The browser receives the push and displays an alert message, and the user clicks on the push message to open the specified page or perform other operations.
The whole process is very similar to mobile app push, but WebPush is different from mobile push in many ways.
The first difference is that WebPush does not require a registered developer account. Any website can send messages as long as the user agrees. Mobile app push, by contrast, only works after the developer has registered an account with each vendor.
But will this design be abused? Definitely. That’s why WebPush uses the VAPID protocol to identify senders. I’ll explain more about this later. But VAPID is only used for identification and does not require registration.
The second difference is that WebPush is very focused on protecting user privacy. It uses encryption to ensure that the push service cannot see what is being pushed. That is, the messages pushed by the application server are encrypted and can only be decrypted by the browser. The push service only acts as a relay and cannot see the actual content of the push.
It is because of these two features that WebPush is a complex technology. Let’s discuss in detail how WebPush works.
Let’s start with the VAPID protocol. VAPID stands for Voluntary Application Server Identification, and the full specification is defined in RFC8292. In practice, "the VAPID" is the public key of an elliptic curve key pair that the website generates on its server before it requests push permission. The curve is P-256, which WebPush uses for all of its asymmetric cryptography.
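As a rough illustration of that key generation step, here is a minimal Node.js sketch. The function name generateVapidKeys is my own, and using createECDH is just one convenient way to obtain a P-256 key pair; real projects usually let a library do this.

```javascript
const { createECDH } = require('node:crypto');

// Generate a P-256 key pair (OpenSSL calls this curve "prime256v1").
// generateVapidKeys is a hypothetical helper name for this sketch.
function generateVapidKeys() {
  const ecdh = createECDH('prime256v1');
  ecdh.generateKeys();
  return {
    // 65-byte uncompressed point (0x04 || X || Y), base64url-encoded.
    // This is the value handed to the browser as the VAPID.
    publicKey: ecdh.getPublicKey().toString('base64url'),
    // The private scalar never leaves the server.
    privateKey: ecdh.getPrivateKey().toString('base64url'),
  };
}

const vapidKeys = generateVapidKeys();
console.log(vapidKeys.publicKey);
```

The key pair is generated once and then stored; changing it later invalidates the subscriptions that were created with the old public key.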
With the VAPID, you can request push permission from the browser. However, all push-related functions need to be performed in ServiceWorker, so the code is a bit more complicated.
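A minimal sketch of such a subscription request might look like the following. The service worker path /sw.js, the helper urlBase64ToUint8Array, and the function name subscribeToPush are assumptions made for illustration; only navigator.serviceWorker and pushManager.subscribe are standard browser APIs.

```javascript
// Convert a base64url string (the VAPID public key) into a Uint8Array.
function urlBase64ToUint8Array(base64url) {
  const padding = '='.repeat((4 - (base64url.length % 4)) % 4);
  const base64 = (base64url + padding).replace(/-/g, '+').replace(/_/g, '/');
  return Uint8Array.from(atob(base64), (ch) => ch.charCodeAt(0));
}

// Register the service worker, then ask the browser for a push subscription.
// '/sw.js' is a placeholder path; vapidPublicKey comes from your server.
async function subscribeToPush(vapidPublicKey) {
  await navigator.serviceWorker.register('/sw.js');
  const registration = await navigator.serviceWorker.ready;
  // This call triggers the permission prompt if permission was not granted yet.
  return registration.pushManager.subscribe({
    userVisibleOnly: true,
    applicationServerKey: urlBase64ToUint8Array(vapidPublicKey),
  });
}
```

Browsers generally expect the permission prompt to be triggered by a user gesture, so subscribeToPush is best called from a click handler rather than on page load.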
userVisibleOnly means that every push must show a notification. That is, it has to be visible to the user and cannot be handled silently in the background. All browsers currently require this field to be passed.
applicationServerKey is the server-generated VAPID, which is the elliptic curve public key.
If the user agrees to receive push messages, the browser sends the corresponding VAPID to the push service.
When the server pushes a message to the user, it calls the HTTP interface of the push service with the Authorization header. The header value starts with vapid, a fixed prefix, followed by two parameters, t and k.
The t parameter carries a JWT token. Its signature algorithm is ES256, which means that the digest is computed with SHA-256 and the signature is made with a P-256 elliptic curve key. The JWT carries the following three fields. RFC8292 requires aud and exp; sub is optional in the RFC, although a push service may insist on it.
- aud: the origin of the push service, that is, the scheme and host of the push endpoint
- exp: the expiration time of the token, which RFC8292 limits to at most 24 hours after the request
- sub: contact information for the pushing party, either a mailto: email address or an https link
Because the token is signed with a P-256 key, the request must also carry the matching public key. This is the k parameter: the public key in X9.62 uncompressed format, encoded as base64url.
Since the push service saved the VAPID of the application server when the subscription was created, it can use the k field to determine whether the caller is legitimate. The JWT in the t field, in turn, provides information such as the expiration time and contact details. If the push service thinks the application server is behaving abnormally, it can notify the pushing party via the contact information in the JWT. This completes the identification of the pushing party.
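To make the header format concrete, here is a hedged Node.js sketch that builds such an Authorization value by hand. The function name buildVapidHeader, the endpoint URL, and the contact address are made up for the example, and the 12-hour lifetime is an arbitrary choice within the 24-hour limit.

```javascript
const { generateKeyPairSync, sign } = require('node:crypto');

const b64url = (text) => Buffer.from(text).toString('base64url');

// Build "vapid t=<JWT>, k=<public key>" for one push endpoint.
// buildVapidHeader is a hypothetical helper, not a library API.
function buildVapidHeader(endpoint, subject, privateKey, publicKey) {
  const header = { typ: 'JWT', alg: 'ES256' };
  const claims = {
    aud: new URL(endpoint).origin,                  // origin of the push service
    exp: Math.floor(Date.now() / 1000) + 12 * 3600, // must stay within 24 hours
    sub: subject,                                   // mailto: or https: contact
  };
  const signingInput = `${b64url(JSON.stringify(header))}.${b64url(JSON.stringify(claims))}`;
  // JWS wants the raw 64-byte r||s signature, not OpenSSL's default DER form.
  const signature = sign('sha256', Buffer.from(signingInput), {
    key: privateKey,
    dsaEncoding: 'ieee-p1363',
  });
  // k is the uncompressed X9.62 point: 0x04 || X || Y.
  const { x, y } = publicKey.export({ format: 'jwk' });
  const rawPublic = Buffer.concat([
    Buffer.from([4]),
    Buffer.from(x, 'base64url'),
    Buffer.from(y, 'base64url'),
  ]);
  const jwt = `${signingInput}.${signature.toString('base64url')}`;
  return `vapid t=${jwt}, k=${rawPublic.toString('base64url')}`;
}

// Example with a throwaway key pair and a placeholder endpoint.
const { privateKey, publicKey } = generateKeyPairSync('ec', { namedCurve: 'prime256v1' });
const authorization = buildVapidHeader(
  'https://push.example.net/send/abc123',
  'mailto:admin@example.com',
  privateKey,
  publicKey,
);
console.log(authorization);
```

Because aud is tied to the push service origin, a server that talks to several push services needs a separate token for each of them.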
We should note, though, that this is only a limited form of identification. The push service can only disable a VAPID, not the actual service behind it. Still, once a VAPID is disabled, every push subscription associated with it stops working, which is enough to deter pushers from abusing the system.
The above is the server identification part. Next we talk about the encryption part.
When the user grants permission, the browser returns a subscription object. After converting it to JSON, it contains the following fields.
endpoint is the URL to push to. The browser generates a different push URL for each user of each website. The application server needs to store this URL together with the user it belongs to, for use when pushing.
expirationTime is the time at which the subscription expires, which means that the user can allow the server to send pushes for a limited period. Currently, browsers generally do not support this feature.
keys.auth is a browser-generated random sequence of 16 bytes, stored in base64url encoding. It is an authentication secret that the browser generates for the server, and it is used together with keys.p256dh when deriving the encryption key.
keys.p256dh is the public key of another P-256 key pair, generated by the browser. It is used to agree on the message encryption key with the application server.
The web application needs to send the subscription information to the server for storage after obtaining it.
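On the server, it is worth validating the subscription before storing it. The sketch below shows one possible shape; the function toSubscriptionRecord, the userId parameter, and the record layout are assumptions for illustration, and the sample values merely stand in for what subscription.toJSON() would return in a browser.

```javascript
const { createECDH, randomBytes } = require('node:crypto');

// Validate the JSON form of a push subscription and pick the fields to store.
// toSubscriptionRecord and the record layout are hypothetical.
function toSubscriptionRecord(userId, subscriptionJson) {
  const { endpoint, expirationTime = null, keys = {} } = subscriptionJson;
  if (!endpoint || !keys.auth || !keys.p256dh) {
    throw new Error('incomplete push subscription');
  }
  // auth is a 16-byte secret, p256dh a 65-byte uncompressed P-256 point.
  if (Buffer.from(keys.auth, 'base64url').length !== 16 ||
      Buffer.from(keys.p256dh, 'base64url').length !== 65) {
    throw new Error('malformed subscription keys');
  }
  return { userId, endpoint, expirationTime, auth: keys.auth, p256dh: keys.p256dh };
}

// Stand-in for subscription.toJSON() as posted by the web page.
const browserKeys = createECDH('prime256v1');
browserKeys.generateKeys();
const sample = {
  endpoint: 'https://push.example.net/send/abc123', // placeholder URL
  expirationTime: null,
  keys: {
    auth: randomBytes(16).toString('base64url'),
    p256dh: browserKeys.getPublicKey().toString('base64url'),
  },
};
console.log(toSubscriptionRecord('user-42', sample));
```

In the page itself, the subscription would typically be sent with something like fetch and a JSON body; the exact route is up to the application.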
The server-side encryption process is as follows.
- generate a random 16-byte salt value
- generate a temporary P-256 elliptic curve key pair (as_private, as_public)
- combine its own private key with the browser’s public key to compute the shared secret: ecdh_secret = ECDH(as_private, ua_public)
- use the HMAC-based key derivation function (HKDF) to compute the actual keys for encryption
WebPush uses HMAC-SHA-256 as the underlying algorithm for HKDF.
First, calculate the input keying material (IKM).
|| indicates that the contents on both sides are concatenated; the same applies below.
The content encryption key (CEK) is then calculated from the IKM and the salt.
Finally, calculate the nonce.
Please refer to RFC8291 for the detailed procedure of key generation.
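For readers who prefer code to formulas, the derivation in RFC8291 can be sketched in Node.js as follows. The function deriveKeys and its parameter names are my own; the info strings and the order of the steps follow the RFC, and each HKDF expansion is written as a single HMAC because no output exceeds 32 bytes.

```javascript
const { createECDH, createHmac, randomBytes } = require('node:crypto');

const hmac = (key, data) => createHmac('sha256', key).update(data).digest();

// asEcdh:     ECDH object holding the server's temporary key pair
// uaPublic:   the subscription's keys.p256dh, decoded (65 bytes)
// authSecret: the subscription's keys.auth, decoded (16 bytes)
// salt:       16 random bytes chosen by the server
function deriveKeys(asEcdh, uaPublic, authSecret, salt) {
  const asPublic = asEcdh.getPublicKey();
  const ecdhSecret = asEcdh.computeSecret(uaPublic);

  // key_info = "WebPush: info" || 0x00 || ua_public || as_public
  // IKM      = HKDF(salt = auth_secret, ikm = ecdh_secret, info = key_info, 32)
  const keyInfo = Buffer.concat([Buffer.from('WebPush: info\0'), uaPublic, asPublic]);
  const prkKey = hmac(authSecret, ecdhSecret);
  const ikm = hmac(prkKey, Buffer.concat([keyInfo, Buffer.from([1])]));

  // PRK   = HMAC-SHA-256(salt, IKM)
  // CEK   = first 16 bytes of HMAC(PRK, "Content-Encoding: aes128gcm" || 0x00 || 0x01)
  // NONCE = first 12 bytes of HMAC(PRK, "Content-Encoding: nonce" || 0x00 || 0x01)
  const prk = hmac(salt, ikm);
  const cek = hmac(prk, Buffer.from('Content-Encoding: aes128gcm\0\x01')).subarray(0, 16);
  const nonce = hmac(prk, Buffer.from('Content-Encoding: nonce\0\x01')).subarray(0, 12);
  return { asPublic, cek, nonce };
}

// Example: a second key pair stands in for the browser.
const browserSide = createECDH('prime256v1');
browserSide.generateKeys();
const serverSide = createECDH('prime256v1');
serverSide.generateKeys();
const derived = deriveKeys(serverSide, browserSide.getPublicKey(), randomBytes(16), randomBytes(16));
console.log(derived.cek.length, derived.nonce.length);
```

The auth secret is mixed in before the salt, so a party that only knows the browser's public key cannot produce a message the browser will accept.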
With CEK and Nonce we can encrypt the message content.
HTTPS can only guarantee that the communication from the application server to the push service is not eavesdropped on. The push service is the endpoint of that HTTPS connection, so once it receives the data it can read everything in it. Obviously, HTTPS alone cannot provide the end-to-end encryption that WebPush needs. For this reason, WebPush uses the encrypted content encoding defined in RFC8188.
The HTTP protocol defines a variety of Content-Encoding values, the most common being gzip, which means that the transmitted content has been compressed. RFC8188 defines a new value called aes128gcm, which means that the transmitted content has been encrypted using AES-128 in GCM mode.
Data of type aes128gcm starts with a header made up of the following fields.
- salt: the sixteen-byte salt used for encryption, generated by the application server
- rs: the record size, a four-byte unsigned integer giving the length of each encrypted AES record
- idlen: a single byte giving the length of the keyid that follows, up to 255 bytes
- keyid: the key identifier for aes128gcm encryption
The encrypted data follows the header.
The aes128gcm encoding splits the content into records of a fixed size. Records are numbered from zero, and each record is encrypted with a different nonce derived from its number. WebPush messages are short, however, no more than 4096 bytes, so a single record is enough, and because its number is zero, the record number does not affect the nonce.
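The per-record nonce in RFC8188 is the derived nonce XORed with the record sequence number, which is why record zero uses the derived nonce unchanged. A small sketch, with recordNonce as a hypothetical helper name:

```javascript
// XOR the 12-byte base nonce with the big-endian record sequence number.
// Only the low 48 bits of seq are handled, which is plenty for a sketch.
function recordNonce(baseNonce, seq) {
  const seqBytes = Buffer.alloc(12);
  seqBytes.writeUIntBE(seq, 6, 6); // place seq in the last 6 bytes
  const nonce = Buffer.from(baseNonce); // copy, so the input is left untouched
  for (let i = 0; i < nonce.length; i += 1) {
    nonce[i] ^= seqBytes[i];
  }
  return nonce;
}

const base = Buffer.from('000102030405060708090a0b', 'hex');
console.log(recordNonce(base, 0).toString('hex')); // same as the base nonce
console.log(recordNonce(base, 1).toString('hex')); // differs in the last byte
```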
Also, AES is a symmetric cipher, so the key itself obviously cannot be stored in the keyid field. What is actually stored there is as_public, the public key of the temporary elliptic curve key pair generated by the application server.
The server encrypts the push message with the CEK and nonce using the AEAD_AES_128_GCM algorithm and then sends the result to the push service in an HTTP POST request to the subscription's endpoint.
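Putting the header layout and the encryption step together, a single-record message could be assembled roughly as below. The helper names encryptPayload and buildPushRequest are mine, the TTL of 60 seconds is an arbitrary example value, and the returned object is simply shaped so that it could be passed to fetch.

```javascript
const { createCipheriv } = require('node:crypto');

// Encode one aes128gcm record: header || keyid || ciphertext || tag.
function encryptPayload(plaintext, cek, nonce, salt, asPublic, recordSize = 4096) {
  const data = Buffer.from(plaintext);
  if (data.length + 17 > recordSize) {
    throw new Error('payload does not fit into a single record');
  }
  const header = Buffer.alloc(21);
  salt.copy(header, 0);                   // salt  (16 bytes)
  header.writeUInt32BE(recordSize, 16);   // rs    (4 bytes)
  header.writeUInt8(asPublic.length, 20); // idlen (1 byte), 65 for a P-256 key

  const cipher = createCipheriv('aes-128-gcm', cek, nonce);
  // A trailing 0x02 byte marks the last (here: only) record.
  const padded = Buffer.concat([data, Buffer.from([2])]);
  const ciphertext = Buffer.concat([cipher.update(padded), cipher.final()]);
  const tag = cipher.getAuthTag();        // 16-byte authentication tag
  return Buffer.concat([header, asPublic, ciphertext, tag]);
}

// Request options for the POST to subscription.endpoint.
function buildPushRequest(body, authorization, ttlSeconds = 60) {
  return {
    method: 'POST',
    headers: {
      Authorization: authorization,       // the "vapid t=..., k=..." value
      'Content-Encoding': 'aes128gcm',
      'Content-Type': 'application/octet-stream',
      TTL: String(ttlSeconds),            // how long the service may hold the message
    },
    body,
  };
}
```

A call such as fetch(subscription.endpoint, buildPushRequest(body, authorization)) would then deliver the message; a 201 response from the push service indicates that it was accepted.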
The entire server-side push process is encapsulated in libraries that you can use directly. There is a WebPush libraries organization on GitHub (web-push-libs) that provides SDKs in multiple languages.
Take Node.js as an example. It is used as follows.
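A minimal sketch with the web-push package from npm might look like this. The wrapper function sendPush and the shape of its vapid argument are assumptions of mine, and the library is passed in as a parameter purely so the snippet has no hard dependency; setVapidDetails and sendNotification are the library's own functions.

```javascript
// Install first: npm install web-push
// In an application: const webpush = require('web-push');

// sendPush is a hypothetical wrapper around the two library calls.
function sendPush(webpush, vapid, subscription, message) {
  // subject is the contact (mailto: or https:) placed in the JWT "sub" claim.
  webpush.setVapidDetails(vapid.subject, vapid.publicKey, vapid.privateKey);
  // The payload is an opaque string to the library; JSON is just a convention here.
  return webpush.sendNotification(subscription, JSON.stringify(message), { TTL: 60 });
}

// Usage, assuming storedSubscription is the JSON saved from the browser:
// sendPush(require('web-push'), vapid, storedSubscription, { title: 'Hello' })
//   .then(() => console.log('accepted by the push service'))
//   .catch((err) => console.error('push failed', err.statusCode));
```

When the push service answers with 404 or 410, the subscription is gone and the stored record should be deleted.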
When the browser receives the push message, it computes the shared secret from its own private key and the application server's public key carried in keyid.
The key agreement is done without exposing the private key of either party. Once the shared secret is obtained, the browser can repeat the application server's computation to obtain the CEK and nonce, and finally decrypt the message using the aes128gcm algorithm.
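The browser does all of this internally, but the idea can be imitated in Node.js. In the sketch below, parseAes128gcm is a hypothetical helper, two locally generated key pairs stand in for the browser and the application server, and the message body contains dummy bytes except for the header.

```javascript
const { createECDH } = require('node:crypto');

// Split an aes128gcm body into its header fields and the encrypted record.
function parseAes128gcm(body) {
  const idlen = body.readUInt8(20);
  return {
    salt: body.subarray(0, 16),
    recordSize: body.readUInt32BE(16),
    keyId: body.subarray(21, 21 + idlen), // as_public in the WebPush case
    record: body.subarray(21 + idlen),
  };
}

// Stand-ins for the two parties.
const browser = createECDH('prime256v1');
browser.generateKeys();
const server = createECDH('prime256v1');
server.generateKeys();

// Server side: secret from as_private and ua_public.
const serverSecret = server.computeSecret(browser.getPublicKey());

// A message body with a real header and a dummy 17-byte record.
const body = Buffer.concat([
  Buffer.alloc(16),                // salt (zeros, for the demo only)
  Buffer.from([0, 0, 16, 0, 65]),  // rs = 4096, idlen = 65
  server.getPublicKey(),           // keyid = as_public
  Buffer.alloc(17),
]);

// Browser side: secret from ua_private and the keyid taken from the header.
const { keyId } = parseAes128gcm(body);
const browserSecret = browser.computeSecret(keyId);
console.log(browserSecret.equals(serverSecret)); // both sides hold the same secret
```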
Interested readers may wonder whether the data pushed from the server to the browser has to follow a particular format. The answer is no. The browser does not process the decrypted push message itself; instead, it fires the push event on the service worker. So websites that want users to receive pushes also need to register their own service worker to handle the push messages.
When the browser receives a push, it fires the push event, and the handler calls the receivePushNotification function. In this function we need to parse the push data, extract the required fields, and then call the showNotification function to display the push message. Note that you must use event.waitUntil here to wait for showNotification to finish; otherwise the browser may terminate the service worker before the notification is shown. When the user clicks the notification, the browser fires the notificationclick event, and the handler registered for that event runs.
The showNotification function has many parameters; refer to MDN for details. Support for individual parameters differs across browsers and platforms, so choose them according to your needs.
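A service worker along these lines might look like the following sketch. The payload shape { title, body, url }, the helper parsePushData, and the click handler name openPushNotification are assumptions for illustration. The ServiceWorkerGlobalScope check is only there so the file can also be loaded outside a service worker, for example in a test.

```javascript
// Turn the decrypted payload into notification fields.
// The JSON shape { title, body, url } is an assumption of this sketch.
function parsePushData(data) {
  const fallback = { title: 'New message', body: '', url: '/' };
  if (!data) return fallback; // a push may arrive without a payload
  try {
    return { ...fallback, ...data.json() };
  } catch {
    return { ...fallback, body: data.text() }; // not JSON: show it as plain text
  }
}

function receivePushNotification(event) {
  const { title, body, url } = parsePushData(event.data);
  // Returns a promise that settles once the notification is displayed.
  return self.registration.showNotification(title, { body, data: { url } });
}

function openPushNotification(event) {
  event.notification.close();
  return self.clients.openWindow(event.notification.data.url);
}

// Register the handlers only when running inside a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined') {
  self.addEventListener('push', (event) => {
    event.waitUntil(receivePushNotification(event));
  });
  self.addEventListener('notificationclick', (event) => {
    event.waitUntil(openPushNotification(event));
  });
}
```

Keeping the payload parsing in its own function makes it easy to change the message format later without touching the event wiring.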
That is the main content of this article. It covers the core of WebPush and the related RFC standards, and it should help beginners understand how WebPush works. Due to space limitations, however, it was not possible to explain the various encryption algorithms mentioned here in detail, which is a pity. I will consider writing dedicated articles to introduce them later.