Manually configure IPsec VPN using ip xfrm

WireGuard is a VPN module implemented in the Linux kernel. Because it runs in the kernel, it avoids copying data between kernel space and user space, unlike OpenVPN, which is built on tun devices, so its performance is naturally much better. However, the WireGuard technical white paper shows that WireGuard is only slightly faster than IPsec. This made me wonder whether the core functionality of IPsec is also implemented in kernel space. After some research, that turned out to be true: the Linux kernel implements the basic encryption and decryption of IP packets in the xfrm module, while user authentication and key negotiation are left to a user-space IKE service. Without an IKE service, we can still configure an IPsec VPN by running ip xfrm commands by hand.

This article assumes that the nodes on both sides of the VPN have public IP addresses. If one side is behind NAT, IPsec needs a NAT traversal protocol. To keep this article short, I will cover NAT-related issues in a separate article.

Before the actual operation, I will give you a little bit of theoretical knowledge.

IPsec is defined by a series of RFC documents, the core of which is RFC 4303, which defines the ESP protocol (IP Encapsulating Security Payload). ESP sits at the same level as TCP/UDP, and its IP protocol number is 50. ESP adds encryption- and authentication-related fields after the IP header, with the following structure.

+------------------------------+
|           IP Header          |
+==============================+
|Security Parameters Index(SPI)|
+------------------------------+
|       Sequence Number        |
+------------------------------+
|      Encrypted Payload       |
+------------------------------+
| Padding | pad len | next hdr |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|      Authentication Data     |
+------------------------------+

SPI is a 32-bit integer that can be chosen arbitrarily. All packets carrying the same SPI are processed with the same encryption algorithm and key.

Sequence Number is a counter that increments for each packet encrypted under the same SPI, and it is used to resist replay attacks.

IPsec has two modes, transport mode and tunnel mode. Transport mode encrypts only the TCP/UDP portion of the original plaintext IP packet, while tunnel mode encrypts the entire plaintext IP packet.

Taking IPv4 messages as an example, the difference between transport mode and tunnel mode is as follows.

BEFORE APPLYING ESP
----------------------------
|orig IP hdr  |     |      |
|(any options)| TCP | Data |
----------------------------

AFTER APPLYING ESP of TRANSPORT MODE
-------------------------------------------------
|orig IP hdr  | ESP |     |      |   ESP   | ESP|
|(any options)| Hdr | TCP | Data | Trailer | ICV|
-------------------------------------------------
                    |<---- encryption ---->|
              |<-------- integrity ------->|

AFTER APPLYING ESP of TUNNEL MODE
---------------------===============-----------------------
| new IP hdr* |     | orig IP hdr*  |   |    | ESP   | ESP|
|(any options)| ESP | (any options) |TCP|Data|Trailer| ICV|
---------------------===============-----------------------
                    |<--------- encryption --------->|
              |<------------- integrity ------------>|

In transport mode, the receiving end obtains only the TCP/UDP payload after decryption and cannot forward it to other hosts, so this mode can only be used for point-to-point encrypted communication. In contrast, tunnel mode encrypts the whole IP packet, and the receiving end can forward the packet after decrypting it, so tunnel mode can be used to connect different networks. Most VPNs use tunnel mode. However, you can also establish a tunnel first with a technology such as GRE and then encrypt the GRE traffic with IPsec, in which case transport mode is sufficient. This article covers only tunnel mode.
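To make the tunnel-mode layout above more concrete, here is a rough sketch that estimates how large a packet becomes after ESP encapsulation. The numbers are assumptions that are not stated in the article: an outer IPv4 header without options, the 8-byte per-packet IV that AES-GCM carries in front of the encrypted payload, a 16-byte ICV, and padding to a 4-byte boundary. The function name esp_tunnel_size is made up for illustration.

```shell
# Estimate the size of an ESP tunnel-mode packet for AES-GCM.
# Assumptions: IPv4 outer header without options, 8-byte IV,
# 16-byte ICV, ESP payload padded to a 4-byte boundary.
esp_tunnel_size() {
    inner_len=$1        # size of the original IP packet in bytes
    outer_ip=20         # new IPv4 header
    esp_hdr=8           # SPI (4 bytes) + sequence number (4 bytes)
    iv=8                # per-packet IV
    icv=16              # 128-bit integrity check value
    # original packet + pad length byte + next header byte, rounded up to 4 bytes
    padded=$(( (inner_len + 2 + 3) / 4 * 4 ))
    echo $(( outer_ip + esp_hdr + iv + padded + icv ))
}

esp_tunnel_size 84      # e.g. a default-sized ping packet
esp_tunnel_size 1400
```

Under these assumptions the two calls print 140 and 1456, which gives a feel for how much room the tunnel takes out of the path MTU.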

An IPsec configuration has to answer two questions.

how to encrypt and decrypt data
what data to encrypt or decrypt

Both questions are handled by the xfrm module in the Linux kernel, and we can use ip xfrm to manage the relevant configuration. Here xfrm is pronounced "transform".

ip xfrm has two subcommands, state and policy, which correspond to the two previous questions.

Assume the following network topology.


     10.0.1.1      1.1.1.1    2.2.2.2      10.0.2.1
>-------------|R1|--------------------|R2|------------<
  10.0.1.0/24                              10.0.2.0/24

We want hosts on the 10.0.1.0/24 and 10.0.2.0/24 segments to be able to reach each other. First we need to set up the encryption parameters. Run the following commands on both R1 and R2.

ip xfrm state add src 1.1.1.1 dst 2.2.2.2 \
    proto esp spi $ID \
    reqid $ID \
    mode tunnel \
    aead 'rfc4106(gcm(aes))' $KEY 128
ip xfrm state add src 2.2.2.2 dst 1.1.1.1 \
    proto esp spi $ID \
    reqid $ID \
    mode tunnel \
    aead 'rfc4106(gcm(aes))' $KEY 128

The following is an analysis of the role of each parameter, using R1 as an example.

src 1.1.1.1 dst 2.2.2.2 indicates the direction of data transmission
proto esp spi $ID indicates the use of ESP protocol, $ID can be specified as needed or randomly generated, usually randomly generated
mode tunnel specifies the tunnel mode to use
aead 'rfc4106(gcm(aes))' $KEY 128 indicates the use of aes-gcm encryption
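For the $ID value in the list above, a random SPI can be produced with standard tools. The sketch below is one possible way to do it, not the only one. The helper name gen_spi is hypothetical, and the retry loop reflects the fact that RFC 4303 reserves SPI values 0 to 255.

```shell
# Generate a random 32-bit SPI as a hex string such as 0x1a2b3c4d.
# Values 0-255 are reserved, so retry until a larger value comes up.
gen_spi() {
    while :; do
        # 4 random bytes rendered as 8 hex digits
        hex=$(head -c 4 /dev/urandom | od -An -tx1 | tr -d ' \n')
        # $((0x...)) converts the hex string to a decimal integer
        [ $((0x$hex)) -gt 255 ] && break
    done
    echo "0x$hex"
}

ID=$(gen_spi)
echo "$ID"
```

The resulting value can be passed to both spi and reqid, as the commands above do.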

Many sources on the web use parameters of the form auth sha256 $KEY1 enc aes $KEY2. Here encryption uses AES-CBC with key $KEY2, and data integrity is checked with HMAC-SHA256 using key $KEY1. This should be considered obsolete usage. If the device allows it, rfc4106(gcm(aes)) is recommended: it is an algorithm that provides both encryption and integrity checking. In addition, AES in GCM mode can be parallelized, so it performs better than AES-CBC, and the configuration is simpler because only one key needs to be generated. Note that RFC 4106 restricts the key material to 20, 28, or 36 bytes: the last four bytes are the salt, and the preceding 16, 24, or 32 bytes are the AES key.

The final 128 is the length in bits of the AES-GCM Integrity Check Value (ICV). RFC 4106 requires implementations to support 128 bits, with optional support for 64 bits and 96 bits.
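The key layout and ICV lengths described above can be sketched in a few lines of shell. This is a minimal illustration rather than a hardened key generator. The helper names gen_gcm_key and valid_icv_bits are invented here, and the key is printed to the terminal only for demonstration.

```shell
# Generate key material for rfc4106(gcm(aes)): AES key followed by a 4-byte salt.
# The argument is the AES key size in bits (128, 192 or 256),
# which gives 20, 28 or 36 bytes of key material in total.
gen_gcm_key() {
    aes_bits=$1
    case "$aes_bits" in
        128|192|256) ;;
        *) echo "unsupported AES key size: $aes_bits" >&2; return 1 ;;
    esac
    total_bytes=$(( aes_bits / 8 + 4 ))
    printf '0x%s\n' "$(head -c "$total_bytes" /dev/urandom | od -An -tx1 | tr -d ' \n')"
}

# Accept only the ICV lengths (in bits) that RFC 4106 allows.
valid_icv_bits() {
    case "$1" in
        64|96|128) return 0 ;;
        *) return 1 ;;
    esac
}

KEY=$(gen_gcm_key 128)
echo "key material: $KEY (the last 8 hex digits are the salt)"
valid_icv_bits 128 && echo "ICV length 128 is acceptable"
```

In real use the key should be kept out of shell history and logs, since anyone holding it can decrypt the tunnel.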

Two ip xfrm state entries are needed because packets are processed in both directions, outgoing and incoming. The only difference between them is that src and dst are swapped. Since the encryption algorithm is symmetric, the same commands must also be executed on R2.

The example I gave uses the same SPI and encryption key for sending and receiving, which is purely for convenience. We can also set separate encryption keys for packets in different directions. But if we want to set them separately, then the send key for R1 has to correspond to the receive key for R2, and vice versa.
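As a sketch of the separate-keys variant just described, the helper below prints (rather than runs) the two state commands with an independent SPI and key for each direction. Everything specific in it is an assumption for illustration: the function name print_states, the fixed reqid of 1, and the placeholder SPI and key values, which must never be used on a real tunnel.

```shell
# Print the two state commands with an independent SPI/key per direction.
# Arguments: local IP, remote IP, outbound SPI, outbound key, inbound SPI, inbound key.
# reqid 1 is an arbitrary choice for this sketch.
print_states() {
    local_ip=$1; remote_ip=$2
    spi_out=$3; key_out=$4
    spi_in=$5;  key_in=$6
    echo "ip xfrm state add src $local_ip dst $remote_ip proto esp spi $spi_out reqid 1 mode tunnel aead 'rfc4106(gcm(aes))' $key_out 128"
    echo "ip xfrm state add src $remote_ip dst $local_ip proto esp spi $spi_in reqid 1 mode tunnel aead 'rfc4106(gcm(aes))' $key_in 128"
}

# Placeholder values for illustration only.
SPI_A=0x00001001; KEY_A=0x0102030405060708090a0b0c0d0e0f1011121314   # R1 -> R2
SPI_B=0x00001002; KEY_B=0xa1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4   # R2 -> R1

# On R1, direction A is outbound and direction B is inbound.
print_states 1.1.1.1 2.2.2.2 "$SPI_A" "$KEY_A" "$SPI_B" "$KEY_B"
# On R2 the same values are passed with the roles swapped.
print_states 2.2.2.2 1.1.1.1 "$SPI_B" "$KEY_B" "$SPI_A" "$KEY_A"
```

Both calls print the same pair of commands, only in a different order, which matches the requirement that R1's send key is R2's receive key and vice versa.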

The above command also has a reqid $ID parameter, which is used in association with the ip xfrm policy to address the question of which packets to encrypt and decrypt. Now we analyze the ip xfrm policy command.

Unlike the ip xfrm state, R1 and R2 need to execute separate ip xfrm policy commands. Let’s take R1 as an example.

ip xfrm policy add src 10.0.1.0/24 dst 10.0.2.0/24 dir out \
    tmpl src 1.1.1.1 dst 2.2.2.2 \
    proto esp reqid $ID mode tunnel
ip xfrm policy add src 10.0.2.0/24 dst 10.0.1.0/24 dir fwd \
    tmpl src 2.2.2.2 dst 1.1.1.1 \
    proto esp reqid $ID mode tunnel
ip xfrm policy add src 10.0.2.0/24 dst 10.0.1.0/24 dir in \
    tmpl src 2.2.2.2 dst 1.1.1.1 \
    proto esp reqid $ID mode tunnel

Each add is followed by src and dst, which match the unencrypted IP packets. dir indicates the direction, one of out, fwd, or in. tmpl is followed by the processing rule, which also serves to match an ip xfrm state entry.

The first command means: for packets sent from 10.0.1.0/24 segment to 10.0.2.0/24 segment, perform ESP encryption on them, use tunnel mode, and the encryption parameters are looked up from the ip xfrm state by reqid $ID.

The second and third commands look similar to the first but serve a very different purpose: they filter decrypted IP packets. Because we added the ip xfrm state entries earlier, every ESP packet from R2 to R1 is decrypted by R1. If decryption succeeds, R1 looks up the corresponding policy by the reqid of the state. If the decrypted IP packet matches the src/dst/dir of a policy, it is delivered locally or forwarded to the corresponding device. Otherwise, the packet is discarded.

Careful readers will notice that a fwd rule from 10.0.1.0/24 to 10.0.2.0/24 seems to be missing. I wondered about this too, but after searching online I found that packets in this direction are handled by the out rule. Specific reference can be made here.

In summary, all outbound traffic is matched against the out rule, while inbound traffic is matched against the fwd and in rules. If the destination address of the decrypted IP packet belongs to R1 itself, the in rule applies; otherwise the packet is forwarded and the fwd rule applies. So at most three rules, out, fwd, and in, are needed.

Finally, we need to execute similar commands on R2, with src and dst swapped.

ip xfrm policy add src 10.0.2.0/24 dst 10.0.1.0/24 dir out \
    tmpl src 2.2.2.2 dst 1.1.1.1 \
    proto esp reqid $ID mode tunnel
ip xfrm policy add src 10.0.1.0/24 dst 10.0.2.0/24 dir fwd \
    tmpl src 1.1.1.1 dst 2.2.2.2 \
    proto esp reqid $ID mode tunnel
ip xfrm policy add src 10.0.1.0/24 dst 10.0.2.0/24 dir in \
    tmpl src 1.1.1.1 dst 2.2.2.2 \
    proto esp reqid $ID mode tunnel
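The R1 and R2 policies differ only in which side counts as local, so they can be generated from one helper. The sketch below prints the three commands for review instead of running them. The function name print_policies and the fallback reqid 0x1001 are assumptions made for this example.

```shell
# Print the out/fwd/in policies for one gateway.
# Arguments: local subnet, remote subnet, local public IP, remote public IP, reqid.
print_policies() {
    lnet=$1; rnet=$2; lip=$3; rip=$4; reqid=$5
    # Outbound: traffic from the local subnet to the remote subnet gets encrypted.
    echo "ip xfrm policy add src $lnet dst $rnet dir out tmpl src $lip dst $rip proto esp reqid $reqid mode tunnel"
    # Inbound: decrypted traffic is accepted when forwarded (fwd) or delivered locally (in).
    for dir in fwd in; do
        echo "ip xfrm policy add src $rnet dst $lnet dir $dir tmpl src $rip dst $lip proto esp reqid $reqid mode tunnel"
    done
}

print_policies 10.0.1.0/24 10.0.2.0/24 1.1.1.1 2.2.2.2 "${ID:-0x1001}"   # for R1
print_policies 10.0.2.0/24 10.0.1.0/24 2.2.2.2 1.1.1.1 "${ID:-0x1001}"   # for R2
```

Once the printed commands look right, they can be piped to sh as root on the matching gateway.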

Once the above configuration is done, the 10.0.1.0/24 segment can communicate with 10.0.2.0/24. However, IPsec is very different from OpenVPN or WireGuard: the whole VPN is built on kernel rules, and no new virtual network device is created. So by default, R1 and R2 themselves cannot reach the other side's private segment. To fix this, we add a route on each device.

# R1
ip route add 10.0.2.0/24 dev eth0 src 10.0.1.1
# R2
ip route add 10.0.1.0/24 dev eth0 src 10.0.2.1

It is assumed that the public NICs of R1 and R2 are eth0. The most critical is the src parameter, which specifies the source IP address used to access the peer network from R1/R2. Only traffic between 10.0.1.0/24 and 10.0.2.0/24 will be processed by xfrm. Without this route, R1/R2 will try to send messages to the other private network using their own public IP address, which naturally will not hit the rules in xfrm.
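One way to confirm that the src parameter took effect is to ask the kernel which source address it would pick for the peer network. The sketch below only parses a line in the usual `ip route get` output format. The helper name route_src and the sample line are illustrative, not captured from a real system.

```shell
# Extract the source address from `ip route get` output.
route_src() {
    sed -n 's/.* src \([^ ]*\).*/\1/p'
}

# Sample line in the format printed by iproute2. On R1 you would instead run:
#   ip route get 10.0.2.1 | route_src
echo "10.0.2.1 dev eth0 src 10.0.1.1 uid 0" | route_src
```

If the real command on R1 prints the public address 1.1.1.1 instead of 10.0.1.1, the route is missing and traffic from R1 itself will not match the xfrm policy.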

Finally, I'd like to mention the router's NAT problem. I was experimenting with my home broadband router and a public VPS because I needed a public IP address, but I couldn't connect. Packet captures showed that the home broadband router rewrote the source address of the IP packets, so they no longer matched the xfrm rules. The easiest way to deal with this is to add a source segment restriction to the firewall's MASQUERADE rule.
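The exact firewall rule is not given here and depends on the router, so the following is only one plausible form under iptables: a rule that lets traffic between the two private segments skip source NAT. The helper name nat_exempt_rule is hypothetical, and the command is printed rather than executed so it can be checked against the router's existing rules first.

```shell
# Print an iptables rule that exempts tunnel traffic from source NAT.
# Arguments: local subnet, remote subnet.
# -I inserts at the top of the nat POSTROUTING chain, ahead of an
# existing MASQUERADE rule, so matching packets keep their source address.
nat_exempt_rule() {
    echo "iptables -t nat -I POSTROUTING -s $1 -d $2 -j ACCEPT"
}

nat_exempt_rule 10.0.1.0/24 10.0.2.0/24
```

An equivalent approach is to narrow the MASQUERADE rule itself so that it no longer matches traffic headed for the peer segment.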

If R1 and R2 can log in to each other via SSH, we can organize the above commands into a shell script.

#!/bin/sh
# manual-ipsec.sh

# Check parameters
if [ -z "$6" ]; then
    echo "usage: $0 <local_ip> <remote_ip> <new_local_net> <new_local_ip> <new_remote_net> <new_remote_ip>"
    echo "creates an ipsec tunnel between two machines"
    exit 1
fi

SRC="$1"
DST="$2"
LOCAL="$3"
LOCAL_IP="$4"
REMOTE="$5"
REMOTE_IP="$6"

# Generate reqid and AES key
ID=0x$(dd if=/dev/urandom count=4 bs=1 2>/dev/null | xxd -p -c 8)
KEY=0x$(dd if=/dev/urandom count=20 bs=1 2>/dev/null | xxd -p -c 40)

sudo ip xfrm state flush && sudo ip xfrm policy flush
sudo ip xfrm state add src $SRC dst $DST proto esp spi $ID reqid $ID mode tunnel aead 'rfc4106(gcm(aes))' $KEY 128
sudo ip xfrm state add src $DST dst $SRC proto esp spi $ID reqid $ID mode tunnel aead 'rfc4106(gcm(aes))' $KEY 128
sudo ip xfrm policy add src $LOCAL dst $REMOTE dir out tmpl src $SRC dst $DST proto esp reqid $ID mode tunnel
sudo ip xfrm policy add src $REMOTE dst $LOCAL dir in tmpl src $DST dst $SRC proto esp reqid $ID mode tunnel
sudo ip xfrm policy add src $REMOTE dst $LOCAL dir fwd tmpl src $DST dst $SRC proto esp reqid $ID mode tunnel
sudo ip route add $REMOTE dev eth0 src $LOCAL_IP

# Login to the peer machine and execute the relevant commands
ssh $DST /bin/bash << EOF
    sudo ip xfrm state flush && sudo ip xfrm policy flush
    sudo ip xfrm state add src $SRC dst $DST proto esp spi $ID reqid $ID mode tunnel aead 'rfc4106(gcm(aes))' $KEY 128
    sudo ip xfrm state add src $DST dst $SRC proto esp spi $ID reqid $ID mode tunnel aead 'rfc4106(gcm(aes))' $KEY 128
    sudo ip xfrm policy add src $REMOTE dst $LOCAL dir out tmpl src $DST dst $SRC proto esp reqid $ID mode tunnel
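    sudo ip xfrm policy add src $LOCAL dst $REMOTE dir in tmpl src $SRC dst $DST proto esp reqid $ID mode tunnel
    sudo ip xfrm policy add src $LOCAL dst $REMOTE dir fwd tmpl src $SRC dst $DST proto esp reqid $ID mode tunnel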
    #sudo ip route add $LOCAL dev eth0 src $REMOTE_IP
EOF

Then you can run the script on R1 as follows to configure the IPsec VPN.

./manual-ipsec.sh 1.1.1.1 2.2.2.2 10.0.1.0/24 10.0.1.1 10.0.2.0/24 10.0.2.1

That is all for this article. Compared with similar content on the Internet, it analyzes the role of each command and parameter in detail and gives a method for building an IPsec tunnel between two devices that both have public IP addresses. To keep the length manageable, key topics such as NAT traversal and rekeying are not covered; I will write a dedicated article on them later, so stay tuned.
