Rust traits: usage and implementation principles

Zero-cost abstraction is one of Rust’s key design goals: it gives Rust the expressive power of a high-level language without a runtime performance penalty. Its cornerstones are generics and traits, which let high-level code be compiled into efficient low-level code at compile time. This article introduces traits, covering how they are used and analyzing three common problems, and explains the principles of their implementation along the way.

Usage

Basic Usage

The main purpose of a trait is to abstract behavior, similar to “interfaces” in other programming languages. Here is an example that illustrates the basic use of a trait.

trait Greeting {
    fn greeting(&self) -> &str;
}

struct Cat;
impl Greeting for Cat {
    fn greeting(&self) -> &str {
        "Meow!"
    }
}

struct Dog;
impl Greeting for Dog {
    fn greeting(&self) -> &str {
        "Woof!"
    }
}

The code above defines a trait, Greeting, and two structs that implement it. Depending on how a function that accepts a Greeting is called, there are two main ways to use the trait:

Static dispatch, based on generics
Dynamic dispatch, based on trait objects

Generics are a familiar concept, so here we focus on trait objects.

A trait object is an opaque value of another type that implements a set of traits. The set of traits is made up of an object safe base trait plus any number of auto traits.

More importantly, a trait object is a dynamically sized type (DST): its size is not known at compile time. It can therefore only be used behind a pointer, commonly in the form of Box<dyn Trait>, &dyn Trait, etc.

fn print_greeting_static<G: Greeting>(g: G) {
    println!("{}", g.greeting());
}
fn print_greeting_dynamic(g: Box<dyn Greeting>) {
    println!("{}", g.greeting());
}

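fn main() {
    // Static dispatch: the concrete type is known at compile time.
    print_greeting_static(Cat);
    print_greeting_static(Dog);

    // Dynamic dispatch: the method is looked up through the vtable at run time.
    print_greeting_dynamic(Box::new(Cat));
    print_greeting_dynamic(Box::new(Dog));
}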

Static dispatch

In Rust, generics are implemented through monomorphization: at compile time, the compiler generates a separate version of a generic function for each concrete type it is called with (which is why a generic parameter is also called a type parameter). The advantage is that there is no virtual-call overhead; the disadvantage is a larger final binary. In the example above, print_greeting_static is conceptually compiled into the following two versions.

print_greeting_static_cat(Cat);
print_greeting_static_dog(Dog);
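
As a rough illustration of what monomorphization amounts to, the sketch below writes out by hand the two specialized copies next to the generic function. The names greeting_static, greeting_static_cat and greeting_static_dog are made up for this example (the symbols the compiler really generates are mangled), and the functions return a String instead of printing so that the results can be compared.

```rust
trait Greeting {
    fn greeting(&self) -> &str;
}

struct Cat;
impl Greeting for Cat {
    fn greeting(&self) -> &str {
        "Meow!"
    }
}

struct Dog;
impl Greeting for Dog {
    fn greeting(&self) -> &str {
        "Woof!"
    }
}

// Generic version, as the programmer writes it.
fn greeting_static<G: Greeting>(g: G) -> String {
    g.greeting().to_string()
}

// Hand-written equivalents of the copies the compiler conceptually
// generates, one per concrete type (names are illustrative).
fn greeting_static_cat(g: Cat) -> String {
    g.greeting().to_string()
}

fn greeting_static_dog(g: Dog) -> String {
    g.greeting().to_string()
}

fn main() {
    // Each pair of calls behaves identically.
    println!("{} {}", greeting_static(Cat), greeting_static_cat(Cat));
    println!("{} {}", greeting_static(Dog), greeting_static_dog(Dog));
}
```

Every call site of the generic function is resolved to one of these copies at compile time, so no lookup is needed at run time.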

Dynamic dispatch

The caller's type cannot always be determined at compile time. A common scenario is event callbacks in GUI programming: an event may have more than one callback, and the set of callbacks is not known at compile time. Generics do not apply here, so dynamic dispatch is needed.

trait ClickCallback {
    fn on_click(&self, x: i64, y: i64);
}

struct Button {
    listeners: Vec<Box<dyn ClickCallback>>,
}
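
To make the callback scenario concrete, here is a minimal sketch of how such a Button might register and invoke listeners. The methods new, add_listener and click, the Logger and Shouter listeners, and the shared Log type are assumptions made for this example; they are not part of the original code.

```rust
use std::cell::RefCell;
use std::rc::Rc;

trait ClickCallback {
    fn on_click(&self, x: i64, y: i64);
}

struct Button {
    listeners: Vec<Box<dyn ClickCallback>>,
}

impl Button {
    fn new() -> Self {
        Button { listeners: Vec::new() }
    }

    // Listeners of different concrete types can be stored side by side.
    fn add_listener(&mut self, listener: Box<dyn ClickCallback>) {
        self.listeners.push(listener);
    }

    // Each call goes through the listener's vtable (dynamic dispatch).
    fn click(&self, x: i64, y: i64) {
        for listener in &self.listeners {
            listener.on_click(x, y);
        }
    }
}

// Shared log so the example can observe what the listeners did.
type Log = Rc<RefCell<Vec<String>>>;

// Two hypothetical listener types.
struct Logger {
    log: Log,
}
impl ClickCallback for Logger {
    fn on_click(&self, x: i64, y: i64) {
        self.log.borrow_mut().push(format!("logger: ({}, {})", x, y));
    }
}

struct Shouter {
    log: Log,
}
impl ClickCallback for Shouter {
    fn on_click(&self, x: i64, y: i64) {
        self.log.borrow_mut().push(format!("SHOUTER: ({}, {})", x, y));
    }
}

fn run_demo() -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let mut button = Button::new();
    button.add_listener(Box::new(Logger { log: Rc::clone(&log) }));
    button.add_listener(Box::new(Shouter { log: Rc::clone(&log) }));
    button.click(3, 4);
    let entries = log.borrow().clone();
    entries
}

fn main() {
    for line in run_demo() {
        println!("{}", line);
    }
}
```

The Vec can hold any mix of listener types because each element is a Box<dyn ClickCallback>, which is exactly what a generic parameter could not express here.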

impl Trait

Rust 1.26 introduced a new way to use traits, impl Trait, which can appear in two places: function arguments and return values. It mainly simplifies the use of complex traits and can be seen as a special case of generics, because impl Trait is also statically dispatched. Note in particular that when it is used as a return type, the function can return only one concrete type.

fn print_greeting_impl(g: impl Greeting) {
    println!("{}", g.greeting());
}
print_greeting_impl(Cat);
print_greeting_impl(Dog);

// The following code fails to compile
fn return_greeting_impl(i: i32) -> impl Greeting {
    if i > 10 {
        return Cat;
    }
    Dog
}

// | fn return_greeting_impl(i: i32) -> impl Greeting {
// |                                    ------------- expected because this return type...
// |     if i > 10 {
// |         return Cat;
// |                --- ...is found to be `Cat` here
// |     }
// |     Dog
// |     ^^^ expected struct `Cat`, found struct `Dog`
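
One common way around this error, sketched here as one option rather than the only fix, is to return a trait object so that both branches share the type Box<dyn Greeting>. The function name return_greeting_dyn is made up for this example.

```rust
trait Greeting {
    fn greeting(&self) -> &str;
}

struct Cat;
impl Greeting for Cat {
    fn greeting(&self) -> &str {
        "Meow!"
    }
}

struct Dog;
impl Greeting for Dog {
    fn greeting(&self) -> &str {
        "Woof!"
    }
}

// Both branches now have the same type, Box<dyn Greeting>,
// at the cost of a heap allocation and dynamic dispatch.
fn return_greeting_dyn(i: i32) -> Box<dyn Greeting> {
    if i > 10 {
        return Box::new(Cat);
    }
    Box::new(Dog)
}

fn main() {
    println!("{}", return_greeting_dyn(11).greeting());
    println!("{}", return_greeting_dyn(1).greeting());
}
```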

Advanced usage

Associated Types

In the basic usage described above, the parameter and return types of a trait's methods are fixed. Rust also provides a mechanism for deferring the choice of a type, called associated types, so that the type is decided when the trait is implemented. A common example is the standard library’s Iterator, whose next method returns Option<Self::Item>.

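// Simplified from the standard library (std::iter::Iterator). Shown for
// reference only: do not redefine it alongside the example below, which
// relies on the standard trait so that `for` loops work.
trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}

/// An example iterator that yields only even numbers.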
struct EvenNumbers {
    count: usize,
    limit: usize,
}
impl Iterator for EvenNumbers {
    type Item = usize;

    fn next(&mut self) -> Option<Self::Item> {
        if self.count > self.limit {
            return None;
        }
        let ret = self.count * 2;
        self.count += 1;
        Some(ret)
    }
}
fn main() {
    let nums = EvenNumbers { count: 1, limit: 5 };
    for n in nums {
        println!("{}", n);
    }
}
// Prints 2, 4, 6, 8, 10 in order, one per line

Associated types are used in much the same way as generics, and Iterator could also be defined with a generic parameter.

pub trait Iterator<T> {
    fn next(&mut self) -> Option<T>;
}

They differ mainly in the following ways:

A given type (such as Cat above) can implement a generic trait multiple times, once per type argument. For example, with From<T> there can be both impl From<&str> for Cat and impl From<String> for Cat.
A trait with associated types can be implemented only once per type. For example, with FromStr there can be only one impl FromStr for Cat. Iterator and Deref behave the same way.
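
The difference can be seen in a small sketch. NamedCat is a hypothetical type introduced for this example (the Cat above is a unit struct with no data to build from a string), and the empty-name error in from_str is likewise an assumption.

```rust
use std::str::FromStr;

// Hypothetical type used only for this illustration.
#[derive(Debug, PartialEq)]
struct NamedCat {
    name: String,
}

// A generic trait can be implemented several times, once per type argument.
impl From<&str> for NamedCat {
    fn from(s: &str) -> Self {
        NamedCat { name: s.to_string() }
    }
}

impl From<String> for NamedCat {
    fn from(s: String) -> Self {
        NamedCat { name: s }
    }
}

// A trait with an associated type can be implemented only once:
// a second `impl FromStr for NamedCat` would be rejected as conflicting.
impl FromStr for NamedCat {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if s.is_empty() {
            Err("empty name".to_string())
        } else {
            Ok(NamedCat { name: s.to_string() })
        }
    }
}

fn main() {
    let a = NamedCat::from("Tom");
    let b = NamedCat::from(String::from("Tom"));
    let c: NamedCat = "Tom".parse().unwrap();
    println!("{:?} {:?} {:?}", a, b, c);
}
```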

Derive

In Rust, you can use the derive attribute to implement some common traits automatically, such as Debug and Clone. For user-defined traits, you can also write a procedural macro to support derive. For details, see "How to write a custom derive macro?"; they are not repeated here.
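
For the built-in case, a minimal sketch looks like this. Point is a hypothetical type chosen for the example.

```rust
// The compiler generates the Debug, Clone and PartialEq implementations.
#[derive(Debug, Clone, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let p = Point { x: 1, y: 2 };
    let q = p.clone(); // from derive(Clone)
    println!("{:?}", p); // from derive(Debug)
    println!("{}", p == q); // from derive(PartialEq)
}
```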

Common problems

Upcast

For trait SubTrait: Base, versions of Rust before 1.86 cannot convert &dyn SubTrait to &dyn Base (trait upcasting was stabilized in Rust 1.86). This limitation is related to the memory layout of trait objects.

In Exploring Rust fat pointers, the author transmutes a reference to a trait object into two usize values and verifies that they are the pointer to the data and the pointer to the vtable.

use std::mem::transmute;
use std::fmt::Debug;

fn main() {
    let v = vec![1, 2, 3, 4];
    let a: &Vec<u64> = &v;
    // Coerce to a trait object
    let b: &dyn Debug = &v;
    println!("a: {}", a as *const _ as usize);
    // The layout of a fat pointer is not guaranteed; this is for exploration only.
    println!("b: {:?}", unsafe { transmute::<_, (usize, usize)>(b) });
}

// a: 140735227204568
// b: (140735227204568, 94484672107880)

As you can see, Rust represents a reference to a trait object as a fat pointer that holds a pointer to the data and a pointer to the vtable, much like an interface value in Go.
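
The two-word representation can also be observed without transmute by comparing pointer sizes. This is a small sketch; the sizes are printed rather than hard-coded because they depend on the target's pointer width.

```rust
use std::fmt::Debug;
use std::mem::size_of;

fn main() {
    // A reference to a sized type is a single machine word.
    println!("&Vec<u64>:      {} bytes", size_of::<&Vec<u64>>());
    // References and boxes to trait objects carry a data pointer and a
    // vtable pointer, so they are two words wide.
    println!("&dyn Debug:     {} bytes", size_of::<&dyn Debug>());
    println!("Box<dyn Debug>: {} bytes", size_of::<Box<dyn Debug>>());
}
```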

[Figure: trait object reference]

pub struct TraitObjectReference {
    pub data: *mut (),
    pub vtable: *mut (),
}

struct Vtable {
    destructor: fn(*mut ()),
    size: usize,
    align: usize,
    method: fn(*const ()) -> String,
}

Although a fat pointer makes the pointer larger (so it cannot be used with atomic instructions), the benefits outweigh this cost:

Traits can be implemented for existing types (e.g. through blanket implementations).
Calling a function through the vtable needs only one dereference, whereas in C++ the vtable pointer lives inside the object, so each virtual call needs two dereferences.

How does a vtable store the methods of different traits when the traits are related by inheritance? In the current implementation, they are stored sequentially in a single vtable, as follows.

[Figure: multi-trait vtable schematic]

As you can see, all the trait methods are stored together in order, with no record of which trait each method belongs to, and this is what prevented upcasting. RFC 2765 tracks this problem in the community, and interested readers can refer to it. Rather than discuss that solution here, we introduce a more general workaround: adding an AsBase trait.

trait Base {
    fn base(&self) {
        println!("base...");
    }
}

trait AsBase {
    fn as_base(&self) -> &dyn Base;
}

// blanket implementation
impl<T: Base> AsBase for T {
    fn as_base(&self) -> &dyn Base {
        self
    }
}

trait Foo: AsBase {
    fn foo(&self) {
        println!("foo..");
    }
}

#[derive(Debug)]
struct MyStruct;

impl Foo for MyStruct {}
impl Base for MyStruct {}

fn main() {
    let s = MyStruct;
    let foo: &dyn Foo = &s;
    foo.foo();
    let base: &dyn Base = foo.as_base();
    base.base();
}

Downcast

Downcasting converts a trait object back to its original concrete type. Rust provides the Any trait for this.

pub trait Any: 'static {
    fn type_id(&self) -> TypeId;
}

Most types implement Any; only those that contain non-'static references do not. The concrete type can be checked at run time through type_id, as shown in the following example.

use std::any::Any;
trait Greeting {
    fn greeting(&self) -> &str;
    fn as_any(&self) -> &dyn Any;
}

struct Cat;
impl Greeting for Cat {
    fn greeting(&self) -> &str {
        "Meow!"
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

fn main() {
    let cat = Cat;
    let g: &dyn Greeting = &cat;
    println!("greeting {}", g.greeting());

    // downcast_cat has type &Cat
    let downcast_cat = g.as_any().downcast_ref::<Cat>().unwrap();
    println!("greeting {}", downcast_cat.greeting());
}

The key part of the code above is downcast_ref, which is implemented as follows.

pub fn downcast_ref<T: Any>(&self) -> Option<&T> {
    if self.is::<T>() {
        unsafe { Some(&*(self as *const dyn Any as *const T)) }
    } else {
        None
    }
}

You can see that when the types match, unsafe code converts the first pointer of the trait object reference (i.e., the data pointer) into a reference to the concrete type.
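
Because downcast_ref returns an Option, a mismatched type simply yields None instead of undefined behavior. The sketch below shows this; the describe helper is a name made up for this example.

```rust
use std::any::Any;

// Hypothetical helper: try each candidate type in turn.
fn describe(value: &dyn Any) -> String {
    if let Some(n) = value.downcast_ref::<i32>() {
        format!("i32: {}", n)
    } else if let Some(s) = value.downcast_ref::<String>() {
        format!("String: {}", s)
    } else {
        // Neither type matched, so both downcasts returned None.
        "unknown type".to_string()
    }
}

fn main() {
    println!("{}", describe(&42_i32));
    println!("{}", describe(&String::from("hi")));
    println!("{}", describe(&1.5_f64));
}
```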

Object safety

In Rust, not all traits can be used as trait objects; they must satisfy certain conditions, collectively called object safety. The main points are as follows.

A method's return type cannot be Self (i.e., the implementing type). After an object is converted to a trait object, the original type information is lost, so Self can no longer be determined.

Methods are not allowed to have generic type parameters. The main reason is that monomorphization would generate a large number of functions, which would easily bloat the set of methods in the trait's vtable. For example:

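// Illustrative only: this does not compile, because `foo` is generic.
trait Trait {
    fn foo<T>(&self, on: T);
    // more methods
}

// Suppose there are 10 implementations of Trait.
fn call_foo(thing: Box<dyn Trait>) {
    thing.foo(true); // `thing` could be any one of the 10 types above
    thing.foo(1);
    thing.foo("hello");
}

// That would require 10 * 3 = 30 implementations in total.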

A trait cannot have Sized as a supertrait, because Rust implicitly implements the trait for its own trait object type, generating code like the following.

trait Foo {
    fn method1(&self);
    fn method2(&mut self, x: i32, y: String) -> usize;
}

// Autogenerated impl (pseudocode: TraitObject stands for the dyn Foo type)
impl Foo for TraitObject {
    fn method1(&self) {
        // `self` is an `&dyn Foo` trait object.

        // Load the right function pointer and call it with the opaque data pointer.
        (self.vtable.method1)(self.data)
    }
    fn method2(&mut self, x: i32, y: String) -> usize {
        // `self` is an `&mut dyn Foo` trait object.

        // As above, passing along the other arguments.
        (self.vtable.method2)(self.data, x, y)
    }
}

If Foo had Sized as a supertrait, the trait object would also have to be Sized. But a trait object is a DST, which is ?Sized, so the trait cannot require Sized.

For traits that are not object safe, it is best to modify them so that they are. If that is not possible, you can try a generic approach instead.
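
Both options can be sketched briefly. One way to make a trait object safe, shown here as an assumption about what such a modification might look like, is to restrict the offending method with where Self: Sized, which excludes it from the trait object while keeping it available on concrete types. The Shape and Square types and the function names are hypothetical.

```rust
trait Shape {
    fn area(&self) -> f64;

    // Returning Self would normally break object safety. Restricting the
    // method to sized types keeps `dyn Shape` usable; the method simply
    // cannot be called on a trait object.
    fn scaled(&self, factor: f64) -> Self
    where
        Self: Sized;
}

struct Square {
    side: f64,
}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.side * self.side
    }

    fn scaled(&self, factor: f64) -> Self {
        Square { side: self.side * factor }
    }
}

// Dynamic dispatch works because the trait is object safe.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// The generic approach: the concrete type is known, so `scaled` is callable.
fn doubled_area<S: Shape>(shape: &S) -> f64 {
    shape.scaled(2.0).area()
}

fn main() {
    let shapes: Vec<Box<dyn Shape>> = vec![
        Box::new(Square { side: 2.0 }),
        Box::new(Square { side: 3.0 }),
    ];
    println!("{}", total_area(&shapes));
    println!("{}", doubled_area(&Square { side: 1.5 }));
}
```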

Summary

This article began by introducing traits as a foundation of zero-cost abstraction. Traits make it possible to add new methods to existing types (which effectively solves the expression problem), to overload operators, and to program against interfaces. We hope the analysis in this article helps readers use traits with more confidence and stay comfortable when facing compiler errors.
