In a nutshell: in skyzh/uring-positioned-io, I wrapped the low-level io_uring interface provided by Tokio and implemented io_uring-based asynchronous random file reads in Rust. You can use it like this.
This article introduces the basic use of io_uring, then describes the implementation of the asynchronous file-reading library I wrote, and finally benchmarks it against mmap.
io_uring is an asynchronous I/O interface provided by the Linux kernel. It was introduced in Linux 5.1 in May 2019 and is now used in various projects. For example:
- RocksDB’s MultiRead currently performs concurrent file reads via io_uring.
- Tokio wraps io_uring in an API layer. With the release of Tokio 1.0, the developers indicated that true asynchronous file operations will be provided via io_uring in the future (see Announcing Tokio 1.0). Currently, Tokio’s asynchronous file operations are implemented by calling the synchronous API on a separate I/O thread.
- QEMU 5.0 already uses io_uring.
Most current io_uring benchmarks compare its performance with Linux AIO under Direct I/O. In those tests, io_uring can typically achieve about twice the performance of AIO.
Random file reading scenarios
In database systems, we often need multiple threads to read the contents of a file at arbitrary locations (<fid>, <offset>, <size>). The commonly used read / write API cannot do this (because it has to seek first and needs an exclusive file handle). The following methods allow random reads of files.
- Map the file directly into memory via mmap. Reading the file becomes a direct memory read, and can be done concurrently from multiple threads.
- pread reads count bytes starting from a given offset, and also supports concurrent reads from multiple threads.
However, both of these options block the current thread. For example, if a page fault occurs while reading a block of memory mapped with mmap, the current thread blocks; pread itself is a blocking API. Asynchronous APIs (e.g. Linux AIO / io_uring) can reduce context switching and thus improve throughput in some scenarios.
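To make the pread option concrete, here is a minimal sketch in Rust using only the standard library. It assumes a Unix target, where FileExt::read_at wraps pread; the helper name read_blocks and the one-thread-per-request layout are illustrative choices and are not part of uring-positioned-io.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt; // read_at is the std wrapper around pread(2)
use std::sync::Arc;
use std::thread;

/// Serves each (offset, size) request on its own thread, sharing one file handle.
/// Each call still blocks its thread until the data is available.
fn read_blocks(file: Arc<File>, requests: Vec<(u64, usize)>) -> io::Result<Vec<Vec<u8>>> {
    let handles: Vec<_> = requests
        .into_iter()
        .map(|(offset, size)| {
            let file = Arc::clone(&file);
            thread::spawn(move || -> io::Result<Vec<u8>> {
                let mut buf = vec![0u8; size];
                // read_at does not touch the file cursor, so no seek or lock is needed.
                let n = file.read_at(&mut buf, offset)?;
                // A read may return fewer bytes than requested (for example at EOF).
                buf.truncate(n);
                Ok(buf)
            })
        })
        .collect();

    // Join in request order and stop at the first I/O error.
    handles
        .into_iter()
        .map(|h| h.join().expect("reader thread panicked"))
        .collect()
}
```

Calling read_blocks(file, vec![(0, 4096), (8192, 4096)]) would return the two blocks in request order. The point of the sketch is that the threads need no coordination, but every one of them is parked while its read is in flight.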
Basic usage of io_uring
The io_uring-related syscalls can be found here. liburing provides an easier-to-use API on top of them, and Tokio’s io_uring crate provides a similar io_uring API for the Rust language. Here is an example of how to use it.
To use io_uring, you first need to create a ring. Here we use the concurrent API provided by tokio-rs/io-uring, which allows multiple threads to share the same ring.
Each ring corresponds to a submission queue and a completion queue; here the queue is set to hold up to 256 entries.
An I/O operation via io_uring has three steps: adding the task to the submission queue, submitting the queued tasks to the kernel, and retrieving completed tasks from the completion queue. Here is an example of reading a file.
You can construct a file-read task with opcode::Read and then add it to the submission queue.
Once the task has been added, submit it to the kernel.
Finally, poll for completed tasks.
In this way, we implement random file reads based on io_uring.
io_uring currently has three execution modes: default mode, poll mode and kernel poll mode. In kernel poll mode, you do not necessarily need to call the function that submits tasks.
Implementing an asynchronous read file interface with io_uring
Our goal is to implement an interface like this, wrapping io_uring and exposing only a simple read function to the developer.
After studying how tokio-linux-aio wraps Linux AIO asynchronously, I used the following approach to implement io_uring-based asynchronous reads.
- The developer first needs to create a UringContext.
- When the UringContext is created, one (or more) UringPollFuture is started in the background to submit tasks and poll for completed ones (corresponding to the second and third steps of reading a file in the previous section).
- The developer can call the file-reading interface on ctx, creating a UringReadFuture with ctx.read. After ctx.read is called:
  - UringReadFuture creates a UringTask object that is pinned in memory, then puts the read task into the queue, using the address of the UringTask as the user data of the read operation. UringTask contains a channel.
  - UringPollFuture submits the task in the background.
  - UringPollFuture polls for completed tasks in the background.
  - UringPollFuture takes the user data, converts it back into a UringTask object, and notifies UringReadFuture through the channel that the I/O operation has completed.
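The user-data round trip in the steps above can be sketched with the standard library alone. This is an illustration, not the library's code: Task is a hypothetical stand-in for UringTask, a std mpsc channel stands in for whatever channel the library uses, and no ring is involved. The u64 is simply handed back by the caller at the point where a completion entry would return it.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for UringTask. It lives on the heap so that its
/// address stays stable while the request is in flight.
struct Task {
    /// Carries the result of the read back to the waiting side.
    done: Sender<i32>,
}

/// Allocates a task and returns its address as u64 user data,
/// together with the receiver the reading side would wait on.
fn task_into_user_data() -> (u64, Receiver<i32>) {
    let (tx, rx) = channel();
    let task = Box::new(Task { done: tx });
    // into_raw gives up ownership; the allocation lives until from_raw reclaims it.
    (Box::into_raw(task) as usize as u64, rx)
}

/// Rebuilds the task from the user data of a completion and notifies the waiter.
///
/// Safety: `user_data` must come from `task_into_user_data` and be passed here exactly once.
unsafe fn complete_task(user_data: u64, result: i32) {
    let task = unsafe { Box::from_raw(user_data as usize as *mut Task) };
    // The receiver may already be gone (for example, the reader gave up); ignore that.
    let _ = task.done.send(result);
    // `task` is dropped here, which frees the allocation.
}
```

The design choice this illustrates is that the kernel only echoes back an opaque 64-bit value, so the address of a heap object is a convenient way to find the right waiter without a lookup table. The cost is the unsafe contract: each address must be reclaimed exactly once.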
The whole process is shown in the following figure.
This makes it easy to call io_uring to read files asynchronously. It also has a side benefit: task submissions are batched automatically. Normally, each I/O operation would cost one syscall, but since a single Future both submits and polls tasks, several unsubmitted tasks may be waiting in the queue at submission time, and they can all be submitted at once. This reduces the context-switch overhead of syscalls (at the cost, of course, of higher latency). The benchmark results show that each submission packs about 20 read tasks.
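The batching effect can be illustrated without io_uring at all. In the sketch below, a std channel stands in for the queue of pending read tasks, and next_batch (a hypothetical helper, not part of the library) collects everything that is already waiting so that a single submit call could cover the whole batch.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

/// Waits for one pending request, then drains whatever else is already queued
/// (up to `max_batch`), so the caller can submit the whole batch in one go.
/// Returns None once every sender has been dropped and the queue is empty.
fn next_batch<T>(rx: &Receiver<T>, max_batch: usize) -> Option<Vec<T>> {
    // Block until there is at least one request to submit.
    let first = rx.recv().ok()?;
    let mut batch = vec![first];
    while batch.len() < max_batch {
        match rx.try_recv() {
            Ok(request) => batch.push(request),
            // Nothing else is waiting right now: submit what we have.
            Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => break,
        }
    }
    Some(batch)
}
```

A background loop would call next_batch, push every request in the returned vector onto the ring, and then submit once. The more requests arrive while the previous batch is being processed, the larger the next batch gets, which is where both the saved syscalls and the added latency come from.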
Next, I compare the performance of the wrapped io_uring with that of mmap. The workload is 128 files of 1 GB each, read randomly in aligned 4 KB blocks. My computer has 32 GB of RAM and a 1 TB NVMe SSD. Six cases were tested:
- 8-thread mmap. (mmap_8)
- 32-thread mmap. (mmap_32)
- 512-thread mmap. (mmap_512)
- 8 threads, 8 concurrent io_uring reads. (uring_8)
- 8 threads, 32 concurrent io_uring reads. That is, 8 worker threads with 32 futures reading simultaneously. (uring_32)
- 8 threads, 512 concurrent io_uring reads. (uring_512)
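For reference, a workload of this shape (random block-aligned reads over a set of equally sized files) could be generated roughly as follows. The xorshift generator and the random_request helper are assumptions made to keep the sketch dependency-free; the actual benchmark may pick files and offsets differently.

```rust
const BLOCK_SIZE: u64 = 4096;

/// Tiny xorshift64 generator, used only to avoid external crates.
/// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Picks a random (file index, block-aligned offset) pair.
/// Assumes n_files > 0 and file_size >= BLOCK_SIZE.
fn random_request(rng: &mut XorShift64, n_files: u64, file_size: u64) -> (u64, u64) {
    let fid = rng.next_u64() % n_files;
    // Choose a block index first, then scale it, so the offset is always aligned
    // and a full block fits before the end of the file.
    let blocks_per_file = file_size / BLOCK_SIZE;
    let offset = (rng.next_u64() % blocks_per_file) * BLOCK_SIZE;
    (fid, offset)
}
```

With n_files = 128 and file_size = 1 << 30, each request addresses one of 262144 blocks in one of the 128 files.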
I measured throughput (op/s) and latency (ns).
The results show that mmap far outperforms io_uring. Admittedly, this wrapper is not very good, but it is just about usable. Below are latency heatmaps over one minute; each pair shows mmap first and io_uring second.
mmap_8 / uring_8
mmap_32 / uring_32
mmap_512 / uring_512
Some possible improvements
- Right now, io_uring does not perform very well after being wrapped with Tokio. The overhead introduced by this wrapping could later be measured by comparing the performance of Rust and C.
- Test the performance of Direct I/O. Only Buffered I/O has been tested so far.
- Compare with Linux AIO; the performance should be no worse than Linux AIO.
- Use perf to find the current bottlenecks. At the moment, when cargo flamegraph is attached, io_uring cannot allocate memory.
- Currently, the user must ensure that &mut buf remains valid for the whole duration of the read. If the Future is aborted, there will be a memory leak. A similar issue in futures-rs is discussed at https://github.com/rust-lang/futures-rs/issues/1278 . Tokio’s current I/O solves this problem by copying twice (first into a buffer, then to the user).
- Perhaps also wrap file writes and other operations along the way.