Rust has two main string types, String and &str. A String is heap-allocated, growable, and mutable; internally it can be thought of as a Vec<u8>. A &str is a borrowed string slice, which can be thought of as a &[u8]. Both types may only hold valid UTF-8.
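As a rough illustration of the difference, the sketch below builds an owned String and then borrows part of it as a &str. The helper name with_suffix is hypothetical and exists only for this example.

```rust
// Appends `suffix` to `base` and returns a new owned String.
// `with_suffix` is a hypothetical helper used only for illustration.
fn with_suffix(base: &str, suffix: &str) -> String {
    let mut owned = String::from(base); // heap-allocated and growable
    owned.push_str(suffix);             // modification needs an owned String
    owned
}

fn main() {
    let owned = with_suffix("foo", "bar");
    // A &str borrows a UTF-8 slice of the String without copying it.
    let view: &str = &owned[0..3];

    assert_eq!(owned, "foobar");
    assert_eq!(view, "foo");
    // Both types expose their underlying UTF-8 bytes.
    assert_eq!(view.as_bytes(), &[102u8, 111, 111]);
    println!("{} / {}", owned, view);
}
```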

For data that is not guaranteed to be valid UTF-8, consider the following types.

  • For file paths, use the dedicated Path and PathBuf types.
  • Use Vec<u8> and &[u8] for arbitrary bytes.
  • Use OsString and &OsStr to interact with the operating system.
  • Use CString and &CStr to interact with C libraries.
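The path, OS string, and C string types from the list above can be sketched as follows. This is a minimal example under assumptions: the file names are made up, and to_c_string is a hypothetical helper, not a standard function.

```rust
use std::ffi::{CString, OsString};
use std::path::PathBuf;

// Builds a C-compatible string; returns None if `text` contains an
// interior NUL byte. `to_c_string` is a hypothetical helper.
fn to_c_string(text: &str) -> Option<CString> {
    CString::new(text).ok()
}

fn main() {
    // File paths: PathBuf is the owned form, &Path the borrowed form.
    let mut path = PathBuf::from("logs");
    path.push("app.txt");
    assert!(path.ends_with("app.txt"));

    // OS strings: to_str() succeeds only when the content is valid UTF-8.
    let os = OsString::from("héllo");
    assert_eq!(os.to_str(), Some("héllo"));

    // C strings: NUL-terminated, with no NUL bytes in the middle.
    let c = to_c_string("hi").expect("no interior NUL");
    assert_eq!(c.as_bytes_with_nul(), &[104u8, 105, 0]);
    assert!(to_c_string("h\0i").is_none());
}
```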

The second option is the usual way to handle byte streams that are not UTF-8: Vec<u8> for owned data and &[u8] for borrowed data. Byte data can also be written as a literal, called a byte string literal, which has type &[u8; N] and coerces to &[u8].
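A small sketch of holding bytes that are not valid UTF-8 in &[u8] and Vec<u8>. The helper is_utf8 is hypothetical, and the sample bytes are chosen only because 0xFF never occurs in valid UTF-8.

```rust
// Returns true when `bytes` is valid UTF-8 (hypothetical helper).
fn is_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    // 0xFF can never appear in valid UTF-8.
    let borrowed: &[u8] = b"caf\xFF";
    let mut owned: Vec<u8> = borrowed.to_vec();
    assert!(!is_utf8(&owned));

    // A lossy conversion replaces invalid sequences with U+FFFD.
    assert_eq!(String::from_utf8_lossy(&owned), "caf\u{FFFD}");

    // A Vec<u8> can be modified like any other vector.
    owned.pop();
    owned.push(b'e');
    assert!(is_utf8(&owned));
    assert_eq!(String::from_utf8(owned).unwrap(), "cafe");
}
```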

String literals

Let’s look at string literals.

As in most languages, a string literal is enclosed in double quotes. One feature of Rust is that a string literal can span multiple lines: a line break in the middle does not cause a compile-time or runtime error, and the resulting string contains the newline character.
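A minimal sketch of a literal that spans two lines. The helper line_count is hypothetical; note that the second line is deliberately not indented, because leading whitespace would become part of the string.

```rust
// Counts the lines in `text` (hypothetical helper for illustration).
fn line_count(text: &str) -> usize {
    text.lines().count()
}

fn main() {
    // The literal continues across the line break; the newline is kept.
    let poem = "roses are red
violets are blue";

    assert_eq!(poem, "roses are red\nviolets are blue");
    assert_eq!(line_count(poem), 2);
}
```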

String literals also support escapes, for example \" to put a double quote inside the string. A line break can be escaped as well: if a \ appears immediately before a line break, then the backslash, the line break, and all whitespace at the beginning of the next line are ignored.

let a = "foobar";
let b = "foo\
         bar";

assert_eq!(a, b);

In addition to the common backslash escapes for bytes and characters, string literals support Unicode escapes:

  • \xHH: two hexadecimal digits giving a 7-bit value (00 to 7F), which denotes the corresponding ASCII character.
  • \u{XXXXXX}: one to six hexadecimal digits (a value of up to 24 bits), which denotes the corresponding Unicode character.
  • \n, \r, and \t denote U+000A (LF), U+000D (CR), and U+0009 (HT).
  • \\ is used to escape \ itself.
  • \0 denotes U+0000 (NUL).
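The escapes listed above can be checked with a short sketch. The helper code_points is hypothetical; it simply returns the Unicode code point of every character so the escapes can be compared with the values in the list.

```rust
// Returns the Unicode code point of each character in `text`
// (hypothetical helper for illustration).
fn code_points(text: &str) -> Vec<u32> {
    text.chars().map(|c| c as u32).collect()
}

fn main() {
    assert_eq!(code_points("\x52"), vec![0x52]);       // ASCII 'R'
    assert_eq!(code_points("\u{211D}"), vec![0x211D]); // one Unicode character
    assert_eq!("\u{211D}".len(), 3);                   // stored as 3 UTF-8 bytes
    assert_eq!(code_points("\n\r\t"), vec![0x0A, 0x0D, 0x09]);
    assert_eq!(code_points("\\"), vec![0x5C]);         // a single backslash
    assert_eq!(code_points("\0"), vec![0x00]);         // NUL
}
```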

Raw string literals do not process escapes: the value of the string is exactly what the literal says. A raw string literal starts with r followed by zero or more # characters and an opening double quote, and ends with a closing double quote followed by the same number of # characters.

This is shown below.

"foo"; r"foo";                     // foo
"\"foo\""; r#""foo""#;             // "foo"

"foo #\"# bar";
r##"foo #"# bar"##;                // foo #"# bar

"\x52"; "R"; r"R";                 // R
"\\x52"; r"\x52";                  // \x52

What if a raw string contains double quotes? Because escapes are not available in raw strings, Rust uses # characters in the delimiters to mark the bounds of the string, which serves the same purpose as escaping. The closing delimiter is a double quote followed by the same number of #s as the opening one, so the delimiter needs more #s than any run of #s that directly follows a double quote inside the string. For example, if the string contains a double quote followed by four #s, it can be enclosed with five: r#####"abc"####def"#####.
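The delimiter rule can be sketched as a small function. hashes_needed is a hypothetical helper, not part of the standard library: it returns one more than the longest run of #s that directly follows a double quote, or zero when the content has no double quote.

```rust
// Returns how many #s a raw string delimiter needs to hold `content`
// (hypothetical helper for illustration).
fn hashes_needed(content: &str) -> usize {
    let mut needed = 0;
    let mut rest = content;
    while let Some(pos) = rest.find('"') {
        rest = &rest[pos + 1..];
        // Count the #s that directly follow this double quote.
        let run = rest.chars().take_while(|&c| c == '#').count();
        needed = needed.max(run + 1);
    }
    needed
}

fn main() {
    assert_eq!(r#""foo""#, "\"foo\"");
    assert_eq!(hashes_needed(r#""foo""#), 1);

    assert_eq!(r##"foo #"# bar"##, "foo #\"# bar");
    assert_eq!(hashes_needed(r##"foo #"# bar"##), 2);

    // #s that do not follow a double quote need no extra delimiters.
    assert_eq!(hashes_needed(r"abc####def"), 0);
    assert_eq!(hashes_needed(r#####"abc"####def"#####), 5);
}
```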

Byte string literals

Byte string literals are written as b"..." or one of its derived forms. Their type is &[u8; N], a reference to a byte array that coerces to &[u8]. This is a completely different type from &str, so some methods that work on &str won’t work on it.

For example:

// &[u8; 5]: [119, 111, 114, 108, 100]
let world = b"world";
println!("Hello, {}!", world);

The compiler reports an error because [u8; 5] does not implement std::fmt::Display.

29 |     println!("Hello, {}!", world);
   |                            ^^^^^ `[u8; 5]` cannot be formatted with the default formatter
   |

   = help: the trait `std::fmt::Display` is not implemented for `[u8; 5]`
   = note: in format strings you may be able to use `{:?}` (or {:#?} for pretty-print) instead
   = note: this error originates in the macro `$crate::format_args_nl` (in Nightly builds, run with -Z macro-backtrace for more info)
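Following the compiler's hint, a byte string can be printed with {:?}, or converted to &str first when it is valid UTF-8. The sketch below shows both; describe is a hypothetical helper written for this example.

```rust
use std::str;

// Renders `bytes` as text, falling back to Debug output when the bytes
// are not valid UTF-8 (hypothetical helper for illustration).
fn describe(bytes: &[u8]) -> String {
    match str::from_utf8(bytes) {
        Ok(text) => format!("Hello, {}!", text),
        Err(_) => format!("Hello, {:?}!", bytes),
    }
}

fn main() {
    let world = b"world";

    // Debug formatting is implemented for byte arrays and slices.
    assert_eq!(format!("{:?}", world), "[119, 111, 114, 108, 100]");

    assert_eq!(describe(world), "Hello, world!");
    assert_eq!(describe(b"\xFF"), "Hello, [255]!");
}
```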

Byte string literals also support escapes, but only byte escapes, not Unicode escapes.

// Byte escapes are supported: this is the same as b"Rust as bytes"
let escaped = b"\x52\x75\x73\x74 as bytes";

// Unicode escapes are not supported; uncommenting the last line below is a compile error:
// = help: unicode escape sequences cannot be used as a byte or in a byte string
// let escaped = b"\u{211D} is not allowed";

// Needed for str::from_utf8 below
use std::str;

// Raw byte strings work just like raw strings
let raw_bytestring = br"\u{211D} is not escaped here";
println!("{:?}", raw_bytestring);

// Converting a byte array to `str` can fail
if let Ok(my_str) = str::from_utf8(raw_bytestring) {
    println!("And the same as text: '{}'", my_str);
}

Byte strings also support raw definitions: like ordinary strings, a raw byte string literal is written with the br prefix.

For example, in the block below, characters such as " and \ must be escaped in a normal byte string, but not in a raw byte string.

b"foo"; br"foo";                     // foo
b"\"foo\""; br#""foo""#;             // "foo"

b"foo #\"# bar";
br##"foo #"# bar"##;                 // foo #"# bar

b"\x52"; b"R"; br"R";                // R
b"\\x52"; br"\x52";                  // \x52

Summary

The following summarizes the literal forms introduced above, listing each syntax and its meaning.

Symbol                                   Meaning
"..."                                    String literal
r"...", r#"..."#, r##"..."##, etc.       Raw string literal, no escape processing
b"..."                                   Byte string literal, type &[u8; N]
br"...", br#"..."#, br##"..."##, etc.    Raw byte string literal, no escape processing
'...'                                    Character literal
b'...'                                   ASCII byte literal
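As a quick check of the summary, the sketch below writes the same content with each literal form and compares the results. The helper literal_lengths is hypothetical and exists only to make the differences measurable.

```rust
// Returns the lengths of four literals that all spell a\tb in source
// (hypothetical helper for illustration).
fn literal_lengths() -> (usize, usize, usize, usize) {
    let text: &str = "a\tb";            // escape processed: 3 bytes
    let raw_text: &str = r"a\tb";       // no escapes: 4 bytes
    let bytes: &[u8; 3] = b"a\tb";      // byte escape processed: 3 bytes
    let raw_bytes: &[u8; 4] = br"a\tb"; // no escapes: 4 bytes
    (text.len(), raw_text.len(), bytes.len(), raw_bytes.len())
}

fn main() {
    assert_eq!(literal_lengths(), (3, 4, 3, 4));

    // Character and byte literals hold a single value.
    let ch: char = 'R';
    let byte: u8 = b'R';
    assert_eq!(ch as u32, 0x52);
    assert_eq!(byte, 0x52);
}
```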