Optimization of Rust Enum Layout

I learned a little bit about Rust Enum today, and before you start reading, you can guess what the output of the following Rust code is on a common 64 bit machine.

struct A (i64, i8);
struct B (i64, i8, bool);
fn main() {
    dbg!(std::mem::size_of::<A>());
    dbg!(std::mem::size_of::<Option<A>>());
    dbg!(std::mem::size_of::<B>());
    dbg!(std::mem::size_of::<Option<B>>());
}

In this Rust Playground you can see the result.

Rust enum is essentially a tagged union, which corresponds to the sum type of the algebraic data type, and is not expanded upon here. In Rust enum implementations, a byte is usually used to store the type tag, which means that ideally the following two structures are equivalent.

enum Attr {
    Color(u8, u8, u8),
    Shape(u16, u16),
}

struct Color { uint8_t r, g, b };
struct Shape { uint16_t w, h };
struct Attr {
    uint8_t tag;
    union {
        Color color;
        Shape shape;
    }
}

Under this implementation, enum is not zero overhead in many scenarios. Fortunately, Rust has never defined an ABI, and enum with data cannot even be represented by repr(C), which gives Rust plenty of room for fine-grained optimizations of enum memory layout. This article will cover some of the relevant optimizations under rustc 1.60.0-nightly.

Before we start exploring the details, we need to prepare a helper function that will help us look at the memory structure of our variables.

fn print_memory<T>(v: &T) {
    println!("{:?}", unsafe {
        core::slice::from_raw_parts(v as *const _ as *const u8, std::mem::size_of_val(v))
    })
}

Take the above Attr as an example.

print_memory(&Attr::Color(1, 2, 3));
// [0, 1, 2, 3, 0, 0]
//  ^  ^  ^  ^  ^^^^
//  |  |  |  |    |
//  |  |  |  |    --- padding
//  |  |  |  -------- b
//  |  |  |---------- g
//  |  |------------- r
//  ----------------- tag

print_memory(&Attr::Shape(257, 258));
// [1, 0, 1, 1, 2, 1]
//  ^  ^  ^^^^  ^^^^
//  |  |    |     |
//  |  |    |     --- h, 256 + 2
//  |  |    --------- w, 256 + 1
//  |  |------------- padding
//  ----------------- tag

`Option<P<T>>`

P is a common smart pointer type, including & / &mut / Box. This is probably the best known example of enum layout optimization. rust recommends using Option<P<T>> for nullable pointers, which implements null safety.

Option<T> is represented in rust as an enum.

pub enum Option<T> {
    None,
    Some(T),
}

Without any optimization, it is clear that there is an unnecessary overhead, and that a null pointer can fully represent the semantics of None. Since this is too common, rustc not only makes targeted optimizations, but also standardizes them.

If T is an FFI-safe non-nullable pointer type, Option is guaranteed to have the same layout and ABI as T and is therefore also FFI-safe. As of this writing, this covers &, &mut, and function pointers, all of which can never be null.

let p = Box::new(0u64);
print_memory(&p);
// [208, 185, 38, 40, 162, 85, 0, 0]
print_memory(&Some(p));
// [208, 185, 38, 40, 162, 85, 0, 0]

One tip is that this hack is not specific to Option, but to pointer types. Any custom enum that satisfies the condition can achieve the same effect.

enum MyOption<T> {
    MySome(T),
    MyNone,
}

let p = Box::new(0u64);
print_memory(&MyOption::MySome(p));
// [208, 185, 38, 40, 162, 85, 0, 0]
// The address of `p`.
print_memory(&MyOption::<Box<u64>>::MyNone);
// [0, 0, 0, 0, 0, 0, 0, 0]
// Use nullptr to represent `MyNone`.

The fundamental reason why Option<P<T>> can be optimized is that there is an always illegal value under the in-memory representation of P, and the corresponding enum only needs to represent an extra value to express the redundant type. Exceeding this constraint causes this optimization to fail.

enum MyOption2<T> {
    MySome(T),
    MyNone,
    MyNone2,
}

let p = Box::new(0u64);
print_memory(&MyOption2::MySome(p));
// [0, 0, 0, 0, 0, 0, 0, 0, 208, 185, 38, 40, 162, 85, 0, 0]
print_memory(&MyOption2::<Box<u64>>::MyNone2);
// [2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

`bool` , `Ordering`

bool in rust occupies one byte and has only two legal values, True and False, which correspond to the memory representations 1u08 and 0u08. We can interpret this to mean that bool has 254 illegal values available for the type tag.

print_memory(&Some(false));
// [0]
print_memory(&Some(true));
// [1]
print_memory(&(None as Option<bool>));
// [2]

To take it a step further, we can be a little more forceful.

print_memory(&Some(Some(false)));
// [0]
print_memory(&Some(Some(true)));
// [1]
print_memory(&(Some(None) as Option<Option<bool>>));
// [2]
print_memory(&(None as Option<Option<bool>>));
// [3]

Correspondingly, Ordering has three legal values that also apply to this optimization.

print_memory(&Some(std::cmp::Ordering::Less));
// [255]
print_memory(&Some(std::cmp::Ordering::Greater));
// [1]
print_memory(&Some(std::cmp::Ordering::Equal));
// [0]
print_memory(&(None as Option<std::cmp::Ordering>));
// [2]

Enum

In fact, the compiler does not make special judgments for bool, Ordering, and any enum of less than 256 kinds itself satisfies the condition to be optimized, i.e., there will be (256 - kinds) empty bits in the type tag.

enum ShapeKind { Square, Circle }
print_memory(&Some(ShapeKind::Square));
// [0]

print_memory(&MyOption::MySome(MyOption::MySome(1u8)));
// [0, 1]
print_memory(&(MyOption::MyNone as MyOption<MyOption<u8>>));
// [2, 0]
// 尽管 u8 本身没有空位，但是 MyOption 的 type tag 有 254 个空位，因此外层的 MyOption 的 typetag 被优化掉了。

Struct, Tuple

Struct, Tuple, etc. are all Product types, and are often implemented by storing all the fields in order and making extra padding, so it is a logical optimization that if one of the fields of the struct has an empty space, then the enum tag can be stuffed into it.

struct A1 (i8, bool);
print_memory(&Some(A1(1, false)));
// [1, 0]
print_memory(&Some(A1(1, true)));
// [1, 1]
print_memory(&(None as Option<A1>));
// [0, 2]

Let’s go back to the example at the beginning of the article.

struct A (i64, i8);
struct B (i64, i8, bool);

dbg!(std::mem::size_of::<A>());
// 16
dbg!(std::mem::size_of::<Option<A>>());
// 24
dbg!(std::mem::size_of::<B>());
// 16
dbg!(std::mem::size_of::<Option<B>>());
// 16

print_memory(&Some(A(1, 1)));
// [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
//  ^  ^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^^^^ ^  ^^^^^^^^^^^^^^^^^^^
//  |           |                      |            |           |
//  |           |                      |            |           ------------ padding
//  |           |                      |            ------------------------ .1
//  |           |                      ------------------------------------- .0
//  |           ------------------------------------------------------------ padding
//  ------------------------------------------------------------------------ tag

A and B both take 16 bytes due to padding, while Option<B> only takes 16 bytes due to the presence of a bool field .2, which is optimized into the bool, and Option<A> uses 24 bytes.

An easy optimization to think of is to use undefined memory in the padding to store the type tag. unfortunately, the layout of A is determined when compiling A itself, and the data stored in Option<A> in the padding of A is undefined behavior. This also leads to the funny result that by storing an extra field, Option takes up less space.

Of course, it is perfectly possible to use padding to store data, but only if it does not affect the memory layout of the child data structures. if we put A in Option manually expand.

use std::mem::size_of;
struct A(i64, i8);
enum OptionA { Some(i64, i8), None }
dbg!(size_of::<A>()); // 16
dbg!(size_of::<Option<A>>()); // 24
dbg!(size_of::<OptionA>()); // 16

In this case, since the data of OptionA is stored directly inside Some, it is perfectly possible to use padding to store the type tag when instantiating without causing potentially undefined behavior.

Optimization possibilities for custom structs

Currently, all enum layout-related optimizations are suitable for type-specific hacking by the compiler, and we have no control over the layout of our custom structs in the enum. To optimize some common scenarios, rust also provides NonZero* to represent non-zero integers, while the compiler makes size_of::<Option<NonZeroU8>> == size_of::<u8>. But this can only be handled by the standard library case by case, and real requirements are very complex, e.g. sometimes we may need NonMaxU64, or for example use a third-party library Option<ordered_float::NotNan<f64>> will not be optimized.

Optimizing for custom interfaces requires introducing very complex mechanisms to tell the compiler at compile time what memory structures are illegal for a type. I currently feel that a possible implementation would be const trait + const iterator, giving the compiler an iterator over potentially illegal values. However, I don’t see an RFC for this at the moment.

All the examples in this article can be found at Rust Playground.

Table of Contents

Option<P<T>>

bool , Ordering

Enum

Struct, Tuple

Optimization possibilities for custom structs

`Option<P<T>>`

`bool` , `Ordering`