If you know C++, you have certainly met shared_ptr and unique_ptr. Rust has its own smart pointers: Box, Rc, Arc, RefCell, and others. This article looks at how Box is implemented under the hood.

Box<T> allocates space on the heap, moves the T value into it, and returns a pointer that owns the allocation. Box also implements the Deref trait for dereferencing and the Drop trait as its destructor, so the heap space is freed automatically when the Box goes out of scope.
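The behaviour described above can be sketched in a few lines. This is only a minimal illustration; the function name boxed_increment is made up for this example and does not come from the standard library.

```rust
/// Boxes an integer, mutates it through the box, and returns the final value.
/// (`boxed_increment` is a hypothetical helper used only for illustration.)
fn boxed_increment(start: i32) -> i32 {
    // `Box::new` moves `start` into a fresh heap allocation.
    let mut b: Box<i32> = Box::new(start);
    // Dereferencing the box gives mutable access to the heap value.
    *b += 1;
    // `i32` is `Copy`, so `*b` copies the value out of the box.
    *b
    // `b` goes out of scope here and its heap allocation is freed.
}

fn main() {
    let result = boxed_increment(41);
    println!("result = {result}");
}
```

No explicit free call appears anywhere: releasing the memory is tied to the scope of b.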

Getting Started Example

The example comes from the Rust Book, with the print statement removed to keep the demonstration minimal.

fn main() {
    let _ = Box::new(0x11223344);
}

This places the value 0x11223344 on the heap, the so-called boxing that Java programmers will recognize. Let's start a Docker container and use rust-gdb to look at the generated assembly.

Dump of assembler code for function hello_cargo::main:
   0x000055555555bdb0 <+0>:	sub    $0x18,%rsp
   0x000055555555bdb4 <+4>:	movl   $0x11223344,0x14(%rsp)
=> 0x000055555555bdbc <+12>:	mov    $0x4,%esi
   0x000055555555bdc1 <+17>:	mov    %rsi,%rdi
   0x000055555555bdc4 <+20>:	callq  0x55555555b5b0 <alloc::alloc::exchange_malloc>
   0x000055555555bdc9 <+25>:	mov    %rax,%rcx
   0x000055555555bdcc <+28>:	mov    %rcx,%rax
   0x000055555555bdcf <+31>:	movl   $0x11223344,(%rcx)
   0x000055555555bdd5 <+37>:	mov    %rax,0x8(%rsp)
   0x000055555555bdda <+42>:	lea    0x8(%rsp),%rdi
   0x000055555555bddf <+47>:	callq  0x55555555bd20 <core::ptr::drop_in_place<alloc::boxed::Box<i32>>>
   0x000055555555bde4 <+52>:	add    $0x18,%rsp
   0x000055555555bde8 <+56>:	retq
End of assembler dump.

There are two key points. First, alloc::alloc::exchange_malloc allocates memory on the heap (here both arguments, size and alignment, are 4), and 0x11223344 is then stored at the address it returns.

Second, at the end of the function a pointer to the Box is passed to core::ptr::drop_in_place so the memory can be freed. Because the compiler knows the type is alloc::boxed::Box<i32>, it calls the drop code generated for that Box type.

Judging from this example alone, Box is not mysterious: in the generated assembly it is no different from an ordinary pointer, and all of its constraints are enforced at compile time.
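One way to check the "just a pointer" claim without reading assembly is to compare sizes and to round-trip a Box through a raw pointer. The following is a rough sketch; the helper name round_trip is an assumption made for this example.

```rust
use std::mem::size_of;

/// Round-trips a value through a raw pointer to show that a `Box` is just an address.
/// (`round_trip` is a hypothetical helper used only for illustration.)
fn round_trip(value: i32) -> i32 {
    let boxed = Box::new(value);
    // Give up ownership and get the plain heap address back.
    let raw: *mut i32 = Box::into_raw(boxed);
    // SAFETY: `raw` came from `Box::into_raw` just above and is converted back exactly once.
    let boxed_again = unsafe { Box::from_raw(raw) };
    *boxed_again
}

fn main() {
    // A `Box<i32>` occupies exactly one machine pointer.
    assert_eq!(size_of::<Box<i32>>(), size_of::<*const i32>());
    // The pointer is never null, so `Option<Box<i32>>` needs no extra tag.
    assert_eq!(size_of::<Option<Box<i32>>>(), size_of::<Box<i32>>());
    println!("round trip: {:#x}", round_trip(0x11223344));
}
```

Ownership and automatic freeing exist only in the type system; at run time nothing is stored besides the address.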

Ownership

fn main() {
    let x = Box::new(String::from("Rust"));
    let y = *x;
    println!("x is {}", x);
}

This example boxes a String, which is unnecessary in practice because String is, broadly speaking, already a smart pointer to heap data. The example fails to compile:

3 |     let y = *x;
  |             -- value moved here
4 |     println!("x is {}", x);
  |                         ^ value borrowed here after move

Dereferencing x with *x yields the String. Assigning it to y moves the String out of the box, so x no longer owns a value and the subsequent println! cannot use it.

let y = &*x;

Taking an immutable reference to the String, instead of moving it, fixes the error.
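For comparison, here is a small runnable sketch of two ways to get past the error: borrow through the box, or move out and keep using the new owner. The function names are invented for this example.

```rust
/// Borrows the boxed string instead of moving it out of the box.
/// (Hypothetical helper, for illustration only.)
fn borrow_then_use() -> (usize, String) {
    let x = Box::new(String::from("Rust"));
    // `&*x` borrows the `String` inside the box; ownership stays with `x`.
    let y: &String = &*x;
    let len = y.len();
    // `x` is still valid, so it can be formatted afterwards.
    (len, format!("x is {}", x))
}

/// Moves the string out of the box and keeps using the new owner.
/// (Hypothetical helper, for illustration only.)
fn move_then_use_new_owner() -> String {
    let x = Box::new(String::from("Rust"));
    // This moves the `String` out of the box; `x` can no longer be used.
    let y: String = *x;
    format!("y is {}", y)
}

fn main() {
    println!("{:?}", borrow_then_use());
    println!("{}", move_then_use_new_owner());
}
```

Which variant fits depends on whether the rest of the code still needs the box itself.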

Underlying implementation

pub struct Box<
    T: ?Sized,
    #[unstable(feature = "allocator_api", issue = "32838")] A: Allocator = Global,
>(Unique<T>, A);

Above is the definition of Box. It is a tuple struct with two generic parameters: T for an arbitrary type and A for a memory allocator. In the standard library, A defaults to Global. T carries the bound ?Sized, which means the size of the type may or may not be known at compile time.
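The effect of the ?Sized bound can be observed from safe code. The sketch below boxes a slice and a trait object, two types without a compile-time size; the helper name unsized_boxes is an assumption made for this example.

```rust
use std::fmt::Display;
use std::mem::size_of;

/// Builds boxes around unsized types and returns what can be observed through them.
/// (`unsized_boxes` is a hypothetical helper used only for illustration.)
fn unsized_boxes() -> (usize, String) {
    // A boxed slice: the length travels with the pointer, not with the type.
    let numbers: Box<[i32]> = vec![1, 2, 3].into_boxed_slice();
    // A boxed trait object: the concrete type is erased behind `dyn Display`.
    let shown: Box<dyn Display> = Box::new(42);
    (numbers.len(), shown.to_string())
}

fn main() {
    let word = size_of::<usize>();
    // Sized target: a thin pointer.
    assert_eq!(size_of::<Box<i32>>(), word);
    // Slice target: pointer plus length.
    assert_eq!(size_of::<Box<[i32]>>(), 2 * word);
    // Trait-object target: pointer plus vtable pointer.
    assert_eq!(size_of::<Box<dyn Display>>(), 2 * word);
    println!("{:?}", unsized_boxes());
}
```

For unsized targets the Box becomes a fat pointer, which is how a value of unknown size can still be held in a variable of known size.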

#[stable(feature = "rust1", since = "1.0.0")]
unsafe impl<#[may_dangle] T: ?Sized, A: Allocator> Drop for Box<T, A> {
    fn drop(&mut self) {
        // FIXME: Do nothing, drop is currently performed by compiler.
    }
}

This is the Drop implementation. As the comment in the source says, the body is empty because the drop is performed by the compiler itself.
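Even though the drop body is empty, the effect is easy to observe from the outside. The sketch below boxes a type that logs its own destruction; Noisy and drop_events are hypothetical names introduced only for this example.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// A hypothetical type that records its own destruction in a shared log.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// Returns the events recorded while a boxed value goes out of scope.
fn drop_events() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _boxed = Box::new(Noisy { name: "inner", log: Rc::clone(&log) });
        log.borrow_mut().push(String::from("end of scope"));
        // `_boxed` is dropped here: the compiler runs `Noisy::drop` first,
        // then releases the heap allocation.
    }
    log.borrow_mut().push(String::from("after scope"));
    let events = log.borrow().clone();
    events
}

fn main() {
    for event in drop_events() {
        println!("{event}");
    }
}
```

The destructor of the boxed value runs at the closing brace of the inner block, without any call written by hand.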

#[stable(feature = "rust1", since = "1.0.0")]
impl<T: ?Sized, A: Allocator> Deref for Box<T, A> {
    type Target = T;

    fn deref(&self) -> &T {
        &**self
    }
}

#[stable(feature = "rust1", since = "1.0.0")]
impl<T: ?Sized, A: Allocator> DerefMut for Box<T, A> {
    fn deref_mut(&mut self) -> &mut T {
        &mut **self
    }
}

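Box implements Deref to define dereferencing and DerefMut for mutable dereferencing. For an ordinary type that implements Deref, *x corresponds to *(x.deref()). Box is a special case: the compiler dereferences a Box natively, which is why the bodies above can be written as &**self without recursing, and why let y = *x was able to move the String out of the box in the earlier example.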
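To see the Deref mechanism without the compiler's special handling of Box, it helps to write a tiny wrapper by hand, in the spirit of the MyBox example from the Rust Book. MyBox and greeting below are illustrative names, and this wrapper stores its value inline rather than on the heap.

```rust
use std::ops::Deref;

/// A hypothetical stand-in for `Box` that stores its value inline (no heap).
struct MyBox<T>(T);

impl<T> Deref for MyBox<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

/// Takes a `&str`; used to show deref coercion.
fn greeting(name: &str) -> String {
    format!("Hello, {name}!")
}

fn main() {
    let x = MyBox(5);
    // For a user-defined type, `*x` is sugar for `*Deref::deref(&x)`.
    assert_eq!(*x, *(x.deref()));

    let name = MyBox(String::from("Rust"));
    // Deref coercion: &MyBox<String> -> &String -> &str.
    println!("{}", greeting(&name));
}
```

Unlike Box, this wrapper cannot be moved out of with *x, because deref only ever hands back a reference.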

Applicable scenarios

The Rust Book lists the following three scenarios. In essence Box is not very different from an ordinary pointer, so it offers fewer extra capabilities than Rc, Arc, or RefCell.

  • When the size of a type cannot be known at compile time, but the value is used in a context that requires an exact size.
  • When you have a large amount of data and want to transfer ownership without copying the data.
  • Trait objects, that is, dyn dynamic dispatch, commonly used to store values of different types in one collection or to accept parameters of different types.

The Rust Book illustrates the first scenario with a linked list:

enum List {
    Cons(i32, List),
    Nil,
}

The code above does not compile, and the reason is simple: the type is defined recursively without any indirection, so its size would be infinite. The same is true in C, where the next field is usually declared as a pointer.

enum List {
    Cons(i32, Box<List>),
    Nil,
}

use crate::List::{Cons, Nil};

fn main() {
    let list = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
}

The solution given in the Rust Book is to turn the next element into a pointer, Box<List>. A Box has a known size, so the enum does too, which is the same fix one would reach for in C.
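Once the list compiles, walking it is a matter of following the boxed pointers. The sketch below repeats the enum so that it is self-contained; the sum function is an addition for illustration and is not part of the Rust Book example.

```rust
enum List {
    Cons(i32, Box<List>),
    Nil,
}

use List::{Cons, Nil};

/// Sums the list by following the boxed `next` pointers until `Nil`.
/// (`sum` is a hypothetical helper used only for illustration.)
fn sum(list: &List) -> i32 {
    let mut total = 0;
    let mut current = list;
    // Each `Cons` holds a value and a pointer to the rest of the list.
    while let Cons(value, next) = current {
        total += *value;
        // `next` is `&Box<List>`; `&**next` reborrows the `List` it points to.
        current = &**next;
    }
    total
}

fn main() {
    let list = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
    println!("sum = {}", sum(&list));
}
```

When list goes out of scope, each Box frees the node it owns in turn, so no manual cleanup loop is needed.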