The Linux networking stack is feature-rich and performs well enough for most purposes. On high-speed networks, however, the overhead of traditional network programming takes up too large a share of the processing budget. In the previous article on syscall.Socket we introduced AF_PACKET sockets, whose performance is mediocre: every packet has to be copied between kernel space and user space, and heavy traffic triggers a large number of interrupts. eBPF XDP is a good fit for high-performance packet processing, and we covered XDP itself in an earlier article. In Linux 4.18, Björn Töpel added a new socket protocol family, AF_XDP, which allows high-performance network reads and writes through the socket interface on top of XDP.
In a 2019 talk, Intel's Björn Töpel (the main implementer of AF_XDP sockets) presented a performance comparison between AF_XDP and plain AF_PACKET in three scenarios.
In that comparison, AF_XDP performs far better than AF_PACKET.
Introduction to AF_XDP Socket
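AF_XDP is a socket address family in the Linux kernel built on XDP (eXpress Data Path). It is not a protocol stack in itself; it is a fast path that delivers raw frames between the NIC and user space, enabling zero-copy data transfer and reception without per-packet interrupt handling. An AF_XDP socket is simply a socket created with this address family.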
Compared with traditional sockets, AF_XDP socket has the following significant features:
- Zero-copy transfer: With an AF_XDP socket, packet data lives in memory shared between the kernel and the application, so it does not have to be copied between user space and kernel space. This reduces the number of memory copies per packet and improves transfer efficiency. True zero-copy requires driver support; without it the socket falls back to a copy mode.
- Zero-interrupt reception: When receiving with an AF_XDP socket, the application can poll for data coming from the NIC instead of being notified through an interrupt for every packet, which reduces interrupt handling and improves receive efficiency.
- Support for multiple queues: AF_XDP sockets are bound to individual NIC queues, so different traffic can be steered to different queues for better load balancing and multi-core utilization.
- Support for user-space protocol stacks: An AF_XDP socket can be combined with a protocol stack implemented in user space, which improves the performance and flexibility of network applications.
In summary, AF_XDP sockets are a data transfer mechanism for network applications that need to handle large volumes of traffic at high speed.
We use the normal socket() system call to create an AF_XDP socket (XSK). Each XSK has two rings: the RX ring and the TX ring. The socket receives packets on the RX ring and sends packets on the TX ring. The rings are registered and sized with the XDP_RX_RING and XDP_TX_RING options of setsockopt(), respectively. Each socket must have at least one of these rings. An RX or TX descriptor ring points to data buffers in a memory area called the UMEM. RX and TX can share the same UMEM, so there is no need to copy packets between RX and TX.
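As a rough illustration of the first two steps, the sketch below creates the socket and requests the two rings using only Go's standard syscall package. The numeric constants are copied from the Linux headers because syscall does not export them; the helper names openXSK and validRingSize are made up for this example. It is Linux-only and needs sufficient privileges, so the socket() call is expected to fail on other systems or for unprivileged users.

```go
package main

import (
	"fmt"
	"syscall"
)

// Values from <linux/socket.h> and <linux/if_xdp.h>. The standard syscall
// package does not export them, so they are spelled out here.
const (
	afXDP     = 44  // AF_XDP
	solXDP    = 283 // SOL_XDP
	xdpRxRing = 2   // XDP_RX_RING
	xdpTxRing = 3   // XDP_TX_RING
)

// validRingSize reports whether n is usable as a ring size.
// The kernel only accepts non-zero powers of two.
func validRingSize(n int) bool { return n > 0 && n&(n-1) == 0 }

// openXSK (hypothetical helper) creates an AF_XDP socket and asks the kernel
// for an RX ring and a TX ring with the given number of entries.
func openXSK(rxEntries, txEntries int) (int, error) {
	if !validRingSize(rxEntries) || !validRingSize(txEntries) {
		return -1, fmt.Errorf("ring sizes must be powers of two, got %d and %d", rxEntries, txEntries)
	}
	fd, err := syscall.Socket(afXDP, syscall.SOCK_RAW, 0)
	if err != nil {
		return -1, fmt.Errorf("socket(AF_XDP): %w", err)
	}
	if err := syscall.SetsockoptInt(fd, solXDP, xdpRxRing, rxEntries); err != nil {
		syscall.Close(fd)
		return -1, fmt.Errorf("setsockopt(XDP_RX_RING): %w", err)
	}
	if err := syscall.SetsockoptInt(fd, solXDP, xdpTxRing, txEntries); err != nil {
		syscall.Close(fd)
		return -1, fmt.Errorf("setsockopt(XDP_TX_RING): %w", err)
	}
	return fd, nil
}

func main() {
	fd, err := openXSK(2048, 2048)
	if err != nil {
		fmt.Println("could not create XSK:", err)
		return
	}
	defer syscall.Close(fd)
	fmt.Println("XSK created, fd =", fd)
}
```

This is only the beginning of the setup: a usable socket still needs a registered UMEM, its rings mapped into the process with mmap, and a bind() to a network interface and queue.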
The UMEM also has two rings: the FILL ring and the COMPLETION ring. The application uses the FILL ring to hand the kernel addresses (each addr refers to a chunk in the UMEM) that the kernel can fill with RX packet data. Whenever a packet is received, a reference to the chunk that holds it appears in the RX ring. The COMPLETION ring, on the other hand, contains the addresses of chunks that the kernel has finished transmitting and that user space can reuse for either TX or RX.
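The life cycle of a chunk can be easier to follow in a toy model. The sketch below is not the kernel's implementation: the rings are plain Go slices, the "kernel" is two ordinary methods, and names such as umem, kernelReceive, kernelTransmit and the 2048-byte chunkSize are assumptions chosen for illustration. It only shows how a chunk address travels from FILL to RX, and from TX to COMPLETION, without the payload ever being copied between chunks.

```go
package main

import "fmt"

const chunkSize = 2048 // hypothetical chunk size in bytes

// desc says where a packet lives in the UMEM and how long it is.
type desc struct {
	addr   uint64
	length uint32
}

// umem is a toy model of the shared memory area and its four rings.
type umem struct {
	area       []byte
	fill       []uint64 // user -> kernel: chunks free to receive into
	rx         []desc   // kernel -> user: received packets
	tx         []desc   // user -> kernel: packets to send
	completion []uint64 // kernel -> user: chunks that were sent
}

func newUmem(chunks int) *umem { return &umem{area: make([]byte, chunks*chunkSize)} }

// chunk returns the memory of the chunk that starts at addr.
func (u *umem) chunk(addr uint64) []byte { return u.area[addr : addr+chunkSize] }

// kernelReceive plays the kernel's part on receive: take an address from the
// FILL ring, store the packet there, and publish a descriptor on the RX ring.
// It returns false (packet dropped) when user space has supplied no chunk.
func (u *umem) kernelReceive(pkt []byte) bool {
	if len(u.fill) == 0 {
		return false
	}
	addr := u.fill[0]
	u.fill = u.fill[1:]
	n := copy(u.chunk(addr), pkt)
	u.rx = append(u.rx, desc{addr: addr, length: uint32(n)})
	return true
}

// kernelTransmit plays the kernel's part on send: consume every TX descriptor
// and report the chunk address on the COMPLETION ring.
func (u *umem) kernelTransmit() {
	for _, d := range u.tx {
		u.completion = append(u.completion, d.addr)
	}
	u.tx = u.tx[:0]
}

func main() {
	u := newUmem(4)
	u.fill = append(u.fill, 0, chunkSize) // hand two chunks to the kernel

	u.kernelReceive([]byte("ping"))
	d := u.rx[0]
	u.rx = u.rx[1:]
	fmt.Printf("rx addr=%d payload=%q\n", d.addr, u.chunk(d.addr)[:d.length])

	u.tx = append(u.tx, d) // forward the same chunk: no copy between RX and TX
	u.kernelTransmit()
	fmt.Println("completed:", u.completion)
}
```

After the last step the address in the COMPLETION ring can be put back on the FILL ring or reused for another transmission.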
So there are four rings in total. The entries in the RX and TX rings are descriptors (xdp_desc), while the entries in the FILL and COMPLETION rings are plain addresses (u64).
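To make the difference between the two entry types concrete, here is a small sketch of the descriptor layout. The field layout follows struct xdp_desc in the kernel's if_xdp.h header (a 64-bit address, a 32-bit length and 32 bits of options, 16 bytes in total). The type and function names are invented for this example, and the byte order is assumed to be little-endian, which holds on x86-64 and typical arm64 systems.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// xdpDesc mirrors the layout of struct xdp_desc: the entry type of the
// RX and TX rings. FILL and COMPLETION entries are just the 8-byte Addr.
type xdpDesc struct {
	Addr    uint64 // offset of the chunk inside the UMEM
	Len     uint32 // length of the packet in that chunk
	Options uint32
}

const descSize = 16 // 8 + 4 + 4 bytes

// marshal encodes d the way it would sit in ring memory
// (assumption: little-endian host).
func (d xdpDesc) marshal() [descSize]byte {
	var b [descSize]byte
	binary.LittleEndian.PutUint64(b[0:8], d.Addr)
	binary.LittleEndian.PutUint32(b[8:12], d.Len)
	binary.LittleEndian.PutUint32(b[12:16], d.Options)
	return b
}

// unmarshalDesc decodes one ring entry back into a descriptor.
func unmarshalDesc(b [descSize]byte) xdpDesc {
	return xdpDesc{
		Addr:    binary.LittleEndian.Uint64(b[0:8]),
		Len:     binary.LittleEndian.Uint32(b[8:12]),
		Options: binary.LittleEndian.Uint32(b[12:16]),
	}
}

func main() {
	d := xdpDesc{Addr: 4096, Len: 60}
	raw := d.marshal()
	fmt.Printf("%+v\n", unmarshalDesc(raw)) // prints {Addr:4096 Len:60 Options:0}
}
```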
- Rx Ring: The receive ring stores the descriptors of frames that have arrived and are waiting to be processed. A NIC usually has multiple receive queues, and each queue has its own Rx Ring. The producer of the Rx Ring is the XDP path in the kernel and the consumer is the user-space program. The kernel consumes an entry from the Fill Ring to obtain a UMEM frame that can hold the packet and writes the packet to that address. It then puts a descriptor into the Rx Ring and notifies the user-space program through the socket I/O mechanism that a packet can be read from the Rx Ring.
- Fill Ring: The fill ring is how the user-space program supplies fresh buffers to the receive path, so that there are always enough frames available for incoming packets. Each queue has its own Fill Ring. The producer of the Fill Ring is the user-space program and the consumer is the XDP path in the kernel. The user-space program passes the addresses of UMEM frames that can hold packets to the kernel through the Fill Ring; the kernel consumes an address and writes the received packet to it.
- Tx Ring: The transmit ring stores the descriptors of frames that are to be sent. Each queue has its own Tx Ring. The producer of the Tx Ring is the user-space program and the consumer is the kernel. The user-space program writes the packet to be sent into a UMEM frame and puts a descriptor for it into the Tx Ring; the kernel then consumes the descriptor, sends the packet, and reports the frame back to the user-space program through the Completion Ring.
- Completion Ring: The completion ring returns the frames whose transmission has finished. Each queue has its own Completion Ring. The producer of the Completion Ring is the kernel and the consumer is the user-space program. When the kernel finishes sending packets, it reports through the Completion Ring which frames have been sent, and the user-space program consumes those entries (simply advancing the consumer index serves as the acknowledgement).
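All four rings follow the same single-producer, single-consumer pattern: one side only advances a producer index, the other side only advances a consumer index. The sketch below models that pattern in plain Go. It is a single-threaded illustration with invented names (ring, produce, peek, release); real rings live in memory shared with the kernel and additionally need memory barriers, which are left out here.

```go
package main

import "fmt"

// ring is a toy single-producer/single-consumer queue of u64 entries,
// in the style of the FILL and COMPLETION rings.
type ring struct {
	entries  []uint64
	mask     uint32 // size-1, valid because size is a power of two
	producer uint32 // advanced only by the producer
	consumer uint32 // advanced only by the consumer
}

func newRing(size uint32) *ring {
	if size == 0 || size&(size-1) != 0 {
		panic("ring size must be a power of two")
	}
	return &ring{entries: make([]uint64, size), mask: size - 1}
}

// available is the number of entries the consumer may read.
func (r *ring) available() uint32 { return r.producer - r.consumer }

// free is the number of entries the producer may still write.
func (r *ring) free() uint32 { return uint32(len(r.entries)) - r.available() }

// produce writes one entry and publishes it by advancing the producer index.
func (r *ring) produce(v uint64) bool {
	if r.free() == 0 {
		return false
	}
	r.entries[r.producer&r.mask] = v
	r.producer++
	return true
}

// peek reads the i-th unconsumed entry without consuming it.
func (r *ring) peek(i uint32) uint64 { return r.entries[(r.consumer+i)&r.mask] }

// release hands n entries back to the producer. Advancing the consumer
// index is the whole acknowledgement.
func (r *ring) release(n uint32) {
	if n > r.available() {
		n = r.available()
	}
	r.consumer += n
}

func main() {
	completion := newRing(4)
	for _, addr := range []uint64{0, 2048, 4096} {
		completion.produce(addr) // the kernel's side in this model
	}
	n := completion.available()
	for i := uint32(0); i < n; i++ {
		fmt.Println("chunk reusable:", completion.peek(i))
	}
	completion.release(n)
	fmt.Println("left:", completion.available(), "free:", completion.free())
}
```

Because the indexes only ever grow and are masked on access, the ring wraps around without any extra bookkeeping.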
With these four rings working together, AF_XDP enables high-performance network data transfer as well as the implementation of a network protocol stack in user space. The user-space program supplies buffers for incoming data through the Fill Ring, reads received packets from the Rx Ring, and sends processed data out through the Tx Ring. It then fetches the addresses of transmitted frames from the Completion Ring so they can be reused. Together, the rings enable efficient data processing and load balancing across queues, which improves the performance and throughput of network applications.
AF_XDP sockets are used in high-performance networking scenarios such as DDoS defense, network traffic monitoring, and load balancing. In these scenarios, AF_XDP improves performance and security by letting applications process large volumes of traffic in real time, for example to identify malicious traffic quickly or to distribute load.
Go AF_XDP Practice
An AF_XDP socket is at least an order of magnitude more complex to use than a traditional AF_PACKET socket, and that complexity makes it error-prone. Fortunately, there is a third-party library, asavie/xdp, that wraps it and makes it much easier to use.
It encapsulates the XSK and provides convenient methods for reading and sending data.
- type Desc
- type Program
- type Socket
- func NewSocket(Ifindex int, QueueID int, options *SocketOptions) (xsk *Socket, err error)
- func (xsk *Socket) Close() error
- func (xsk *Socket) Complete(n int)
- func (xsk *Socket) FD() int
- func (xsk *Socket) Fill(descs []Desc) int
- func (xsk *Socket) GetDescs(n int) []Desc
- func (xsk *Socket) GetFrame(d Desc) []byte
- func (xsk *Socket) NumCompleted() int
- func (xsk *Socket) NumFilled() int
- func (xsk *Socket) NumFreeFillSlots() int
- func (xsk *Socket) NumFreeTxSlots() int
- func (xsk *Socket) NumReceived() int
- func (xsk *Socket) NumTransmitted() int
- func (xsk *Socket) Poll(timeout int) (numReceived int, numCompleted int, err error)
- func (xsk *Socket) Receive(num int) []Desc
- func (xsk *Socket) Stats() (Stats, error)
- func (xsk *Socket) Transmit(descs []Desc) (numSubmitted int)
- type SocketOptions
- type Stats
Two examples show how it is used.
Example of sending
The following example sends the same DNS query over and over:
- First, create an XSK for the NIC. Initializing the XSK hides a lot of low-level setup, which is one of the library's strengths.
- Build a DNS query packet, which is the data sent to the network later.
- Get all available Desc and initialize their frames with the DNS request data.
- Start a goroutine that prints the number of packets and bytes sent every second, so you can see how it performs.
- In an infinite loop, get the Desc that can be sent, then call Transmit to write them to the Tx ring.
- Then call Poll to wait until the kernel has sent or received data, and go on to the next round of sending.
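The send loop described in the last two steps can be sketched against a small interface. The method names follow the Socket methods listed earlier; everything else is an assumption made so the example runs with the standard library alone: the local Desc type, the txSocket interface, the sendRounds helper, and the in-memory fakeSocket that stands in for a real XSK. The sketch copies the payload in every round for simplicity, whereas the steps above fill the frames once up front.

```go
package main

import "fmt"

// Desc stands in for the library's descriptor type; only the fields used here.
type Desc struct {
	Addr uint64
	Len  uint32
}

// txSocket lists the Socket methods the send loop relies on.
type txSocket interface {
	NumFreeTxSlots() int
	GetDescs(n int) []Desc
	GetFrame(d Desc) []byte
	Transmit(descs []Desc) int
	Poll(timeout int) (numReceived, numCompleted int, err error)
}

// sendRounds (hypothetical helper) runs the loop a fixed number of times
// instead of forever and returns how many packets were submitted.
func sendRounds(s txSocket, payload []byte, rounds int) (packets int, err error) {
	for i := 0; i < rounds; i++ {
		descs := s.GetDescs(s.NumFreeTxSlots()) // descriptors we may send now
		for j := range descs {
			// Write the payload into the frame and record its length.
			descs[j].Len = uint32(copy(s.GetFrame(descs[j]), payload))
		}
		packets += s.Transmit(descs) // queue them on the Tx ring
		if _, _, err = s.Poll(-1); err != nil { // wait for the kernel
			return packets, err
		}
	}
	return packets, nil
}

// fakeSocket is an in-memory stand-in for a real XSK, for illustration only.
type fakeSocket struct {
	umem      []byte
	frameSize int
	inFlight  int
}

func (f *fakeSocket) NumFreeTxSlots() int { return len(f.umem)/f.frameSize - f.inFlight }

func (f *fakeSocket) GetDescs(n int) []Desc {
	if free := f.NumFreeTxSlots(); n > free {
		n = free
	}
	descs := make([]Desc, n)
	for i := range descs {
		descs[i].Addr = uint64(i * f.frameSize)
	}
	return descs
}

func (f *fakeSocket) GetFrame(d Desc) []byte {
	return f.umem[d.Addr : d.Addr+uint64(f.frameSize)]
}

func (f *fakeSocket) Transmit(descs []Desc) int {
	f.inFlight += len(descs)
	return len(descs)
}

// Poll pretends the kernel has sent everything that was in flight.
func (f *fakeSocket) Poll(timeout int) (int, int, error) {
	done := f.inFlight
	f.inFlight = 0
	return 0, done, nil
}

func main() {
	sock := &fakeSocket{umem: make([]byte, 4*64), frameSize: 64}
	n, err := sendRounds(sock, []byte("query"), 3)
	fmt.Println("packets submitted:", n, "err:", err)
}
```

With a real XSK the loop body would stay the same in shape, but the socket would be created with NewSocket for a specific interface and queue, which requires a suitable NIC and privileges.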
Thanks to the encapsulation of the XDP library, many troublesome details such as mmap creation, socket option setting, ring operation, etc., are hidden, providing an easy-to-use interface to the outside world.
Next, let’s look at an example of simultaneous reading and writing.
Example of broadcasting
The following example receives all packets and changes the destination MAC address to the broadcast address before sending them out.
The program works as follows:
- call Fill first to give the kernel frames to receive into
- call Poll to wait for incoming data
- call Receive to read the received data
- modify the MAC address in the data
- send it out again
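The step that modifies the MAC address is small enough to show on its own. The sketch below assumes a plain Ethernet II frame, where the destination MAC occupies the first six bytes; the helper name toBroadcast and the sample frame are invented for this example.

```go
package main

import (
	"fmt"
	"net"
)

// broadcast is the Ethernet broadcast address ff:ff:ff:ff:ff:ff.
var broadcast = net.HardwareAddr{0xff, 0xff, 0xff, 0xff, 0xff, 0xff}

// toBroadcast overwrites the destination MAC of an Ethernet frame in place.
// It returns false if the slice is too short to hold an Ethernet header.
func toBroadcast(frame []byte) bool {
	const ethHeaderLen = 14 // dst MAC (6) + src MAC (6) + EtherType (2)
	if len(frame) < ethHeaderLen {
		return false
	}
	copy(frame[0:6], broadcast)
	return true
}

func main() {
	frame := []byte{
		0x02, 0x00, 0x00, 0x00, 0x00, 0x01, // destination MAC
		0x02, 0x00, 0x00, 0x00, 0x00, 0x02, // source MAC
		0x08, 0x00, // EtherType: IPv4
		0xde, 0xad, // start of payload
	}
	toBroadcast(frame)
	fmt.Println("dst:", net.HardwareAddr(frame[0:6]), "src:", net.HardwareAddr(frame[6:12]))
	// prints: dst: ff:ff:ff:ff:ff:ff src: 02:00:00:00:00:02
}
```

In the forwarding loop, a function like this would be applied to the frame of each received Desc (the slice returned by GetFrame) before the descriptors are passed to Transmit.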
If you test this program yourself, do it on an isolated test network; otherwise the broadcast traffic may bring your network down.