The network protocol stack provides a wealth of features that make it easy to exchange data over the network, but sometimes we are not satisfied with its performance. In previous articles I introduced ways to process network data efficiently with XDP and similar technologies, but XDP is not yet widely used, and it is not simple to use either. If we read and write data through the standard library provided by the programming language, how else can we improve performance? Today we introduce a way to read and write packets in batches.
Intuitively, sending and receiving network packets in batches should be more efficient than handling them one at a time. In the ordinary one-by-one approach, every packet sent or received costs at least one system call (for example sendto or recvfrom), while in the batch approach a single system call can handle multiple packets. So in theory the batch method is more efficient. This is not specific to network processing: many message queues and data stores also get better performance from batching.
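The saving can be put into numbers with a small sketch. The helper name `syscallsNeeded` is hypothetical, and the figure it returns is a best case: it assumes every batch call is completely filled, which a real socket does not guarantee.

```go
package main

import "fmt"

// syscallsNeeded returns the minimum number of system calls required to move
// `packets` datagrams when a single call can carry up to `batch` of them.
// This is a best-case estimate: it assumes every call is completely filled.
func syscallsNeeded(packets, batch int) int {
	if packets <= 0 {
		return 0
	}
	if batch < 1 {
		batch = 1
	}
	return (packets + batch - 1) / batch // ceiling division
}

func main() {
	const packets = 1000
	for _, batch := range []int{1, 8, 64} {
		fmt.Printf("batch=%d -> %d syscalls\n", batch, syscallsNeeded(packets, batch))
	}
}
```

With a batch size of 1 this degenerates to the one-call-per-packet case described above.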
I did not benchmark batch processing against ordinary single-packet processing for this article. Someone did run a simple test and found no benefit from batching, although I suspect that test was too narrow or too simple. Cloudflare also ran a million-pps test, in which the batch approach performed very well. When you evaluate this technique, run a performance test that matches your own scenario to make sure it is worth adopting.
The batch sending and receiving technique discussed here is implemented by the system calls sendmmsg and recvmmsg, which are mainly a Linux facility. As the man pages describe, they send and receive multiple messages on a socket.
- sendmmsg - send multiple messages on a socket:
int sendmmsg(int sockfd, struct mmsghdr *msgvec, unsigned int vlen, int flags);
- recvmmsg - receive multiple messages on a socket:
int recvmmsg(int sockfd, struct mmsghdr *msgvec, unsigned int vlen, int flags, struct timespec *timeout);
recvmmsg was added in Linux 2.6.33 (with glibc support since version 2.12), and sendmmsg was added in Linux 3.0 (with glibc support since version 2.14). OpenBSD 7.2 also added these system calls.
recvmmsg is a blocking system call: on a blocking socket it does not return until it has received vlen messages or the timeout has expired.
These two system calls are extensions of sendmsg and recvmsg. If you have studied this area before, you probably know that there are several system calls, such as send and recv, for writing and reading network data.
- send: The send function sends data on a socket, whether over a TCP connection or as a UDP datagram. It is similar to write, except that write has no flags parameter.
- sendto: The sendto function is similar to send, but it can specify the recipient's address when sending a packet. For a connection-oriented protocol such as TCP, dest_addr and addrlen can be ignored; otherwise, as with UDP, these two parameters need to be specified.
- sendmsg: The sendmsg function can send data from multiple buffers in one call, and it can also carry one or more pieces of ancillary data (control messages). The message is described by a msghdr structure.
- sendmmsg: The sendmmsg function can send multiple messages in one call, and each message can have one or more buffers. This reduces the number of system calls and thus increases efficiency.
Similarly, the main system calls for receiving are the following:
- recv: recv is the most basic receive function. It receives data from a socket and returns the number of bytes received, without any additional information (such as the sender's address). It is similar to read, except that read has no flags parameter.
- recvfrom: recvfrom also receives data from a socket, but it also returns the sender’s address information, suitable for protocols with address information such as UDP.
- recvmsg: recvmsg can receive data along with other related data information (such as whether the received data is truncated, the IP address of the sender, etc.). It supports reception of multiple data buffers, as well as control messages (cmsg).
- recvmmsg: recvmmsg is a multi-message version of recvmsg, which can receive multiple messages at the same time and is suitable for high concurrency and high throughput scenarios.
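These system calls correspond to the methods of the Go standard library's connection types. Taking UDPConn as an example:
- conn.Write(_): uses the write system call
- conn.WriteTo(_, _): uses the sendto system call
- conn.WriteToUDP(_, _): uses the sendto system call
- conn.WriteMsgUDP(_, _, _): uses the sendmsg system call
- conn.WriteMsgUDPAddrPort(_, _, _): uses the sendmsg system call
- conn.Read(nil): uses the read system call
- conn.ReadFrom(nil): uses the recvfrom system call
- conn.ReadMsgUDP(nil, nil): uses the recvmsg system call
- conn.ReadMsgUDPAddrPort(nil, nil): uses the recvmsg system call
- conn.ReadFromUDP(nil): uses the recvfrom system call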
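For contrast with the batch approach, here is a minimal sketch of the ordinary one-by-one style using only the standard library. The helper name `echoOneByOne`, the loopback address, and the 1500-byte buffer size are assumptions made for illustration. Every datagram costs one call on each side: Write and Read on the connected client socket, ReadFromUDP and WriteToUDP on the server socket.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// echoOneByOne (hypothetical helper) sends count datagrams over loopback and
// reads each echo back, using one write call and one read call per datagram.
func echoOneByOne(count int) (int, error) {
	// Port 0 lets the kernel pick a free port on the loopback interface.
	srv, err := net.ListenUDP("udp4", &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1)})
	if err != nil {
		return 0, err
	}
	defer srv.Close()

	// Server: receive one datagram per call and echo it to its sender.
	go func() {
		buf := make([]byte, 1500)
		for {
			n, addr, err := srv.ReadFromUDP(buf)
			if err != nil {
				return // the socket was closed
			}
			if _, err := srv.WriteToUDP(buf[:n], addr); err != nil {
				return
			}
		}
	}()

	cli, err := net.DialUDP("udp4", nil, srv.LocalAddr().(*net.UDPAddr))
	if err != nil {
		return 0, err
	}
	defer cli.Close()
	// A deadline keeps the sketch from hanging if a datagram is lost.
	if err := cli.SetDeadline(time.Now().Add(3 * time.Second)); err != nil {
		return 0, err
	}

	buf := make([]byte, 1500)
	received := 0
	for i := 0; i < count; i++ {
		if _, err := cli.Write([]byte(fmt.Sprintf("msg-%d", i))); err != nil {
			return received, err
		}
		if _, err := cli.Read(buf); err != nil {
			return received, err
		}
		received++
	}
	return received, nil
}

func main() {
	n, err := echoOneByOne(10)
	fmt.Println(n, err)
}
```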
Unfortunately, the Go standard library does not provide a wrapper for the system calls sendmmsg and recvmmsg, which are not even defined in syscall, so there is no corresponding batch method on types like net.UDPConn. In 2021, in Go issue #45886, bradfitz proposed adding batch read/write of messages to *UDPConn. The proposal was accepted by Russ Cox on November 10, 2022, but no one has yet picked it up to implement it.
However, the Go extension library golang.org/x/net provides ReadBatch and WriteBatch methods for reading and writing messages in batches, which wrap the system calls recvmmsg and sendmmsg.
Of course you can also wrap the system call with a similar implementation.
But at present they all share a problem. ReadBatch, for example, is blocking: if not enough messages have been received, the current thread is blocked. This is fine when the number of threads is small, but with a large number of threads it leads to resource shortages and performance problems. In the proposal, bradfitz expects the feature to be integrated with the standard library's net poller, which would prevent threads from being blocked, so this is a feature to look forward to.
So let's look at the best way to read and write UDP messages in batches at the moment, which is to use the ipv4 package.
We can make use of ipv4.PacketConn, which provides the following methods:
- func (c *PacketConn) ReadBatch(ms []Message, flags int) (int, error): reads a batch of messages and returns the number of messages read, up to len(ms)
- func (c *PacketConn) WriteBatch(ms []Message, flags int) (int, error): writes a batch of messages and returns the number of messages written successfully
Next, we demonstrate the ability to read and write in batches with an example of a UDP client and server.
Here is the client code. It first creates an instance of *net.UDPConn using the standard library and converts it to an *ipv4.PacketConn with ipv4.NewPacketConn.
Next, 10 messages are prepared; each one must be of type ipv4.Message. In a production system it is better to pool these objects, for example with sync.Pool. This example is deliberately simple and does not consider performance; it only demonstrates batch reading and writing.
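The pooling idea can be sketched with the standard library alone. This is one possible approach, not the only one: it pools the byte slices that would back each message (each pooled slice would become the single entry of a message's Buffers field). The names `bufPool`, `getBuffers`, `putBuffers` and the 1500-byte size are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

const bufSize = 1500 // assumed maximum datagram size for this sketch

// bufPool hands out reusable byte slices. Pointers to slices are stored so
// that putting a buffer back does not allocate.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, bufSize)
		return &b
	},
}

// getBuffers borrows n buffers from the pool, enough for one batch.
func getBuffers(n int) []*[]byte {
	bufs := make([]*[]byte, n)
	for i := range bufs {
		bufs[i] = bufPool.Get().(*[]byte)
	}
	return bufs
}

// putBuffers returns the buffers to the pool once the batch has been handled.
func putBuffers(bufs []*[]byte) {
	for _, b := range bufs {
		*b = (*b)[:bufSize] // restore the full length before reuse
		bufPool.Put(b)
	}
}

func main() {
	bufs := getBuffers(10)
	n := copy(*bufs[0], "hello")
	fmt.Println(len(bufs), n)
	putBuffers(bufs)
}
```

Buffers must only be returned to the pool after the data in them is no longer needed, because a later batch may overwrite them.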
After preparing the data, call WriteBatch to write the batch. The example does not handle the case where some messages are not written; it assumes that all of them are written successfully.
Next, read the replies. The server is assumed to return every message, so if one batch read does not return everything, the client keeps reading until all the messages have come back.
The server-side code is similar: it receives the messages in batches and writes them back in batches unchanged.
Run the server and the client, and you can see in the client that all 10 messages are received.
You can also use ipv4.Conn to read and write in batches; the underlying layer is the same. The underlying type is golang.org/x/net/internal/socket.Conn, which contains the following methods for reading and writing messages in batches:
- SendMsgs(ms []Message, flags int) (int, error): writes a batch of messages by wrapping the system call sendmmsg, and returns the number of messages sent successfully
- RecvMsgs(ms []Message, flags int) (int, error): reads a batch of messages by wrapping the system call recvmmsg, and returns the number of messages received successfully
The usage is almost the same as above, but with different method names. Here is the client-side code.
Here is the server-side code.