Manually implement zero-overhead async trait in Rust with GAT

In this article, we will explain how to implement a zero-overhead async trait in Rust using GAT, using a series of RocksDB-like iterators as an example. The code in this article requires nightly Rust to compile.

We will implement two iterators.

TestIterator: Produces an ordered sequence of KVs.
ConcatIterator: stitch together sequences of multiple iterators.

Example: TestIterator::new(0, 5) will produce the following sequence.

key_00000 -> value_00000
key_00001 -> value_00001
key_00002 -> value_00002
key_00003 -> value_00003
key_00004 -> value_00004

ConcatIterator::new(vec![TestIterator::new(0, 5), TestIterator::new(5, 7)]) will generate:

key_00000 -> value_00000
key_00001 -> value_00001
key_00002 -> value_00002
key_00003 -> value_00003
key_00004 -> value_00004
key_00005 -> value_00005
key_00006 -> value_00006

Define async trait

KvIterator is a trait that will be given to all iterator implementations. the user can call .next() to move the iterator to the next position.

pub trait KvIterator {
    /// Get the next item from the iterator.
    async fn next(&mut self) -> Option<(&[u8], &[u8])>;
}

Executing the compilation will give an error.

error[E0706]: functions in traits cannot be declared `async`
 --> src/kv_iterator.rs:5:5
  |
5 |     async fn next(&mut self) -> Option<(&[u8], &[u8])>;
  |     -----^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |     |
  |     `async` because of this
  |
  = note: `async` trait functions are not currently supported
  = note: consider using the `async-trait` crate: https://crates.io/crates/async-trait

The Rust compiler does not support the async trait function by default, and the compiler prompts for the async-trait crate, which unfortunately is not zero overhead. After using async-trait, the trait will be rewritten in the following form.

#[async_trait]
pub trait KvIterator {
    /// Get the next item from the iterator.
    async fn next(&mut self) -> Option<(&[u8], &[u8])>;
}

/// ... will be rewritten to

pub trait KvIterator {
    /// Get the next item from the iterator.
    fn next(&mut self) -> Box<dyn Future<Output = Option<(&[u8], &[u8])>>>;
}

Here there are two layers of overhead.

The overhead of dynamic scheduling dyn Future. This means that the next function of all iterators is more difficult to do some compiler optimization.
Memory allocation overhead Box. In KV storage, next should be a function that is called often. trait has been rewritten in a new form by async-trait, and each call to .next requires a new object on the heap. This can have a significant impact on the performance of the program.

How can we implement async trait with zero overhead? That’s where GAT comes in.

Writing async trait with GAT

The compiler does not support async traits, essentially because the .next function of the impl KvIterator returns a different type of Future. This problem can be solved simply by using associated type.

pub trait KvIterator {
    type NextFuture: Future<Output = Option<(&[u8], &[u8])>>;

    /// Get the next item from the iterator.
    fn next(&mut self) -> Self::NextFuture;
}

This introduces a problem: &'lifetime [u8] needs to have a lifecycle, how should this lifecycle be written? Rationally, &'lifetime is the same as &mut self lifecycle of next, so it should be a generic of NextFuture itself. How do you express this in Rust? Obviously this requires generic associated type. With the GAT compile option turned on.

pub trait KvIterator {
    type NextFuture<'a>: Future<Output = Option<(&'a [u8], &'a [u8])>>;

    /// Get the next item from the iterator.
    fn next(&mut self) -> Self::NextFuture<'_>;
}

The compiler reported another error.

error: missing required bound on `NextFuture`
 --> src/kv_iterator.rs:4:5
  |
4 |     type NextFuture<'a>: Future<Output = Option<(&'a [u8], &'a [u8])>>;
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^-
  |                                                                       |
  |                                                                       help: add the required where clause: `where Self: 'a`
  |
  = note: this bound is currently required to ensure that impls have maximum flexibility
  = note: we are soliciting feedback, see issue #87479 <https://github.com/rust-lang/rust/issues/87479> for more information

Why?

NextFuture is returned by the next function, whereas a normal implementation of the next function can only return a reference with the same life cycle as &mut self. Rust’s trait can be implemented on a reference (e.g., impl <'a> Iterator for &'a mut SomeIterator ). If Self (in the above example, &'a mut SomeIterator ) itself has a shorter lifetime than this reference, it would not be possible to return such a NextFuture.

So here, we need to add a where Self: 'a to make sure that Self has a lifespan at least as long as 'a of NextFuture.

In older versions of the Rust compiler, this is not reported as an error without Self: 'a, but in some odd places. It’s a good thing that the compiler pointed this out directly here.

pub trait KvIterator {
    type NextFuture<'a>: Future<Output = Option<(&'a [u8], &'a [u8])>>
    where
        Self: 'a;

    /// Get the next item from the iterator.
    fn next(&mut self) -> Self::NextFuture<'_>;
}

That’ll get it to compile!

Implementing TestIterator

First, quickly write the framework for TestIterator.

pub struct TestIterator {
    idx: usize
}

impl KvIterator for TestIterator {
    type NextFuture = /* */;
    fn next(&mut self) -> Self::NextFuture<'_> {
        
    }
}

Two problems have been encountered here.

How should I write the logic inside next? next returns a Future, not the common async fn.
- The answer is simple: use async move to return a closure.
Since next returns a function, how do you write the type of NextFuture? As we all know, Rust functions can’t be written with types.
- Here we have to turn on a feature that lets the compiler automatically derive the specific type of Future.

#![feature(generic_associated_types)]
#![feature(type_alias_impl_trait)]

pub struct TestIterator {
    idx: usize,
}

impl KvIterator for TestIterator {
    type NextFuture<'a> = impl Future<Output = Option<(&'a [u8], &'a [u8])>>;
    fn next(&mut self) -> Self::NextFuture<'_> {
        async move { Some((b"key".as_slice(), b"value".as_slice())) }
    }
}

This way, it will pass compilation. Implement the logic inside TestIterator.

pub struct TestIterator {
    idx: usize,
    to_idx: usize,
    key: Vec<u8>,
    value: Vec<u8>,
}

impl TestIterator {
    pub fn new(from_idx: usize, to_idx: usize) -> Self {
        Self {
            idx: from_idx,
            to_idx,
            key: Vec::new(),
            value: Vec::new(),
        }
    }
}
impl KvIterator for TestIterator {
    type NextFuture<'a>
    where
        Self: 'a,
    = impl Future<Output = Option<(&'a [u8], &'a [u8])>>;

    fn next(&mut self) -> Self::NextFuture<'_> {
        async move {
            if self.idx >= self.to_idx {
                return None;
            }

            // Zero-allocation key value manipulation

            self.key.clear();
            write!(&mut self.key, "key_{:05}", self.idx).unwrap();

            self.value.clear();
            write!(&mut self.value, "value_{:05}", self.idx).unwrap();

            self.idx += 1;
            Some((&self.key[..], &self.value[..]))
        }
    }
}

To test that the TestIterator is working properly.

#[tokio::test]
async fn test_iterator() {
    let mut iter = TestIterator::new(0, 10);
    while let Some((key, value)) = iter.next().await {
        println!(
            "{:?} {:?}",
            Bytes::copy_from_slice(key),
            Bytes::copy_from_slice(value)
        );
    }
}

Run a test, as expected, success!

running 1 test
b"key_00000" b"value_00000"
b"key_00001" b"value_00001"
b"key_00002" b"value_00002"
b"key_00003" b"value_00003"
b"key_00004" b"value_00004"
b"key_00005" b"value_00005"
b"key_00006" b"value_00006"
b"key_00007" b"value_00007"
b"key_00008" b"value_00008"
b"key_00009" b"value_00009"
test test_iterator::tests::test_iterator ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Implementing ConcatIterator

The logic of ConcatIterator is also very simple: record which iterator is being accessed now, and just return the contents of the child iterator directly.

pub struct ConcatIterator<Iter: KvIterator> {
    iters: Vec<Iter>,
    current_idx: usize,
}

impl<Iter: KvIterator> ConcatIterator<Iter> {
    pub fn new(iters: Vec<Iter>) -> Self {
        Self {
            iters,
            current_idx: 0,
        }
    }
}

impl<Iter: KvIterator> KvIterator for ConcatIterator<Iter> {
    type NextFuture<'a>
    where
        Self: 'a,
    = impl Future<Output = Option<(&'a [u8], &'a [u8])>>;

    fn next(&mut self) -> Self::NextFuture<'_> {
        async move {
            loop {
                if self.current_idx >= self.iters.len() {
                    return None;
                }
                let iter = &mut self.iters[self.current_idx];
                match iter.next().await {
                    Some(item) => {
                        return Some(item);
                    }
                    None => {
                        self.current_idx += 1;
                    }
                }
            }
        }
    }
}

However, it’s not that simple. The compiler reported an error.

error[E0502]: cannot borrow `self.iters` as immutable because it is also borrowed as mutable
  --> src/concat_iterator.rs:28:40
   |
28 |                 if self.current_idx >= self.iters.len() {
   |                                        ^^^^^^^^^^^^^^^^ immutable borrow occurs here
...
31 |                 let iter = &mut self.iters[self.current_idx];
   |                                 ---------- mutable borrow occurs here
32 |                 match iter.next().await {
33 |                     Some(item) => return Some(item),
   |                                          ---------- returning this value requires that `self.iters` is borrowed for `'1`
...
37 |         }
   |         - return type of async block is Option<(&'1 [u8], &[u8])>

error[E0499]: cannot borrow `self.iters` as mutable more than once at a time
  --> src/concat_iterator.rs:31:33
   |
31 |                 let iter = &mut self.iters[self.current_idx];
   |                                 ^^^^^^^^^^ `self.iters` was mutably borrowed here in the previous iteration of the loop
32 |                 match iter.next().await {
33 |                     Some(item) => return Some(item),
   |                                          ---------- returning this value requires that `self.iters` is borrowed for `'1`
...
37 |         }
   |         - return type of async block is Option<(&'1 [u8], &[u8])>

What’s going on here? Unfortunately, this is a flaw in Rust’s current borrow checker NLL. Even if the code makes logical sense, the borrow checker doesn’t think it does.

What can we do about this? Let’s try to avoid it in two ways.

Option 1: Change the Borrow Checker

Polonius is a brand new borrow checker. Enable it directly with the flag.

`1`	`RUSTFLAGS="-Z polonius" cargo build`

The compilation passes.

Polonius uses a different algorithm than the current Rust borrow checker NLL, and can handle more cases where a race condition does not actually occur, but cannot currently be compiled. The Rust program that Polonius can compile is a superset of NLL.

Option 2: Storing the key value inside the structure

We cache the kv pair returned by the lower level iterator inside ConcatIterator, which also passes compilation. Unfortunately, this gives us an extra copy of .next(), which is a bit less “zero overhead”.

pub struct ConcatIterator<Iter: KvIterator> {
    iters: Vec<Iter>,
    key: Vec<u8>,
    value: Vec<u8>,
    current_idx: usize,
}

impl<Iter: KvIterator> ConcatIterator<Iter> {
    pub fn new(iters: Vec<Iter>) -> Self {
        Self {
            iters,
            current_idx: 0,
            key: Vec::new(),
            value: Vec::new(),
        }
    }
}

impl<Iter: KvIterator> KvIterator for ConcatIterator<Iter> {
    type NextFuture<'a>
    where
        Self: 'a,
    = impl Future<Output = Option<(&'a [u8], &'a [u8])>>;

    fn next(&mut self) -> Self::NextFuture<'_> {
        async move {
            loop {
                if self.current_idx >= self.iters.len() {
                    return None;
                }
                let iter = &mut self.iters[self.current_idx];
                match iter.next().await {
                    Some((key, value)) => {
                        self.key.clear();
                        self.key.extend_from_slice(key);
                        self.value.clear();
                        self.value.extend_from_slice(value);

                        break Some((self.key.as_slice(), self.value.as_slice()));
                    }
                    None => {
                        self.current_idx += 1;
                    }
                }
            }
        }
    }
}

Option 3: Refactor `KvIterator` trait

pub trait KvIterator {
    /// The return type of `next`.
    type KvIteratorNextFuture<'a>: Future<Output = ()>
    where
        Self: 'a;

    /// Move the iterator to the position of the next key.
    fn next(&mut self) -> Self::KvIteratorNextFuture<'_>;

    /// Get the current key.
    fn key(&self) -> &[u8];

    /// Get the current value.
    fn value(&self) -> &[u8];

    /// Check if the iterator is exhausted.
    fn is_valid(&self) -> bool;
}

We don’t use the Rust-style iterator implementation: .next moves only the iterator position; key , value returns the content; is_valid checks if we’re at the end. This also bypasses the lifecycle issue.

For simplicity of implementation, let’s run a unit test using option 2.

#[tokio::test]
async fn test_iterator() {
    let iter1 = TestIterator::new(0, 5);
    let iter2 = TestIterator::new(5, 10);
    let mut concat_iter = ConcatIterator::new(vec![iter1, iter2]);

    while let Some((key, value)) = concat_iter.next().await {
        println!(
            "{:?} {:?}",
            Bytes::copy_from_slice(key),
            Bytes::copy_from_slice(value)
        );
    }
}

The result is correct!

running 1 test
b"key_00000" b"value_00000"
b"key_00001" b"value_00001"
b"key_00002" b"value_00002"
b"key_00003" b"value_00003"
b"key_00004" b"value_00004"
b"key_00005" b"value_00005"
b"key_00006" b"value_00006"
b"key_00007" b"value_00007"
b"key_00008" b"value_00008"
b"key_00009" b"value_00009"
test concat_iterator::tests::test_iterator ... ok

Summary

Using the generic_associated_types and type_alias_impl_trait traits, we can easily implement the async trait manually and avoid the memory allocation and dynamic scheduling overhead of the async-trait crate. However, there are several problems with this.

requires nightly Rust
Cannot be recursive (consider async-recursion crate)
can’t be made dyn type directly (can be done manually with type gymnastics trick)

Table of Contents