std::string is probably one of the most common standard library data structures used by C++ programmers, how is std::string implemented internally after so long? What exactly is SSO (Small String Optimization)? I’ve recently become interested in the code of libc++, so I’ll use its implementation as an example to briefly answer the above questions. (Note that I have trimmed down the code shown a bit for ease of narration!)

Internal structure

My version of libc++ here is 37db2833. You can see that libcxx internally defines a macro (_LIBCPP_ABI_ALTERNATE_STRING_LAYOUT) which is defined by a66a7b30, which simply means that it can control some features that cause ABI-changing, such as our internal layout here for std::string. The following code simply shows what this macro can do.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
#ifdef _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT
    struct __long
    {
        pointer   __data_;
        size_type __size_;
        size_type __cap_ : sizeof(size_type) * CHAR_BIT - 1;
        size_type __is_long_ : 1;
    };
    enum {__min_cap = (sizeof(__long) - 1)/sizeof(value_type) > 2 ?
                      (sizeof(__long) - 1)/sizeof(value_type) : 2};
    struct __short
    {
        value_type __data_[__min_cap];
        unsigned char __padding_[sizeof(value_type) - 1];
        unsigned char __size_ : 7;
        unsigned char __is_long_ : 1;
    };
#else
    // Attribute 'packed' is used to keep the layout compatible with the
    // previous definition that did not use bit fields. This is because on
    // some platforms bit fields have a default size rather than the actual
    // size used, e.g., it is 4 bytes on AIX. See D128285 for details.
    struct __long
    {
        struct _LIBCPP_PACKED {
            size_type __is_long_ : 1;
            size_type __cap_ : sizeof(size_type) * CHAR_BIT - 1;
        };
        size_type __size_;
        pointer   __data_;
    };
    enum {__min_cap = (sizeof(__long) - 1)/sizeof(value_type) > 2 ?
                      (sizeof(__long) - 1)/sizeof(value_type) : 2};
    struct __short
    {
        struct _LIBCPP_PACKED {
            unsigned char __is_long_ : 1;
            unsigned char __size_ : 7;
        };
        char __padding_[sizeof(value_type) - 1];
        value_type __data_[__min_cap];
    };
#endif // _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT

You can see that whichever layout is selected, we have defined two structures. The structure __long is responsible for storing long strings and maintains a pointer to the heap __data_, which is responsible for storing the actual string, while the structure __short is responsible for storing short strings, also known as SSO. Since heap allocation is expensive, we store the shorter strings directly on the stack in an array of characters, in this case __data__[__min_cap]. GDB debugging shows that __min_cap has a size of 23, which means that characters smaller than 23 are inlined in std::string, and strings larger than that are are stored in the heap.

Next, the following code defines the total internal storage structure.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
union __ulx{__long __lx; __short __lxx;};

enum {__n_words = sizeof(__ulx) / sizeof(size_type)};

struct __raw
{
    size_type __words[__n_words];
};

struct __rep
{
    union
    {
        __long  __l;
        __short __s;
        __raw   __r;
    };
};

__compressed_pair<__rep, allocator_type> __r_;

You can see that __r_ is the member that actually maintains the string, and it is of type _compressed_pair, which we can simply interpret as a std::pair. The first member of __r_ is __rep, and the second is allocator, the structure responsible for actually manipulating and allocating memory. There is only one union inside __rep, which may have the long string __long, the short string __short, or __raw. The __raw doesn’t seem to be useful, and I don’t know what it does.

String construction

Here is a simple code to see how string is constructed.

1
std::string str{};

It chooses the following constructor.

1
2
3
4
5
6
template <class _CharT, class _Traits, class _Allocator>
inline basic_string<_CharT, _Traits, _Allocator>::basic_string()
     : __r_(__default_init_tag(), __default_init_tag())
{
    __default_init();
}

where __default_init_tag is an empty structure defined in compressed_pair.h.

1
 struct __default_init_tag {}

The concrete implementation of __default__init is as follows.

1
2
3
4
5
6
void __default_init() {
  __zero();
  if (__libcpp_is_constant_evaluated()) {
   // ...
   }
}

The code in __libcpp_is_constant_evaluated() is related to the constexpr string, so we don’t need to look into it. And __zero() is the code that we actually initialize.

1
2
3
void __zero() {
   __r_.first() = __rep();
}

It’s really just the first member of __r_, __rep, initialized by default, no big deal.

Look at a more common and meaningful piece of code.

1
std::string str("Hello, world!");

By hooking up gdb for dynamic debugging we can see that it calls the following constructor.

1
2
3
4
template <class = __enable_if_t<__is_allocator<_Allocator>::value, nullptr_t> >
basic_string(const _CharT* __s) : __r_(__default_init_tag(), __default_init_tag()) {
   __init(__s, traits_type::length(__s));
}

The only real thing to notice about this code is the function call __init(__s, traits_type::length(__s));. And it is implemented as follows.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
template <class _CharT, class _Traits, class _Allocator>
void
basic_string<_CharT, _Traits, _Allocator>::__init(const value_type* __s, size_type __sz)
{
    pointer __p;
    if (__fits_in_sso(__sz))
    {
        __set_short_size(__sz);
        __p = __get_short_pointer();
    }
    else
    {
        auto __allocation = std::__allocate_at_least(__alloc(), __recommend(__sz) + 1);
        __p = __allocation.ptr;
        __begin_lifetime(__p, __allocation.count);
        __set_long_pointer(__p);
        __set_long_cap(__allocation.count);
        __set_long_size(__sz);
    }
    traits_type::copy(std::__to_address(__p), __s, __sz);
    traits_type::assign(__p[__sz], value_type());
}

This is more important, so we’ll go through it line by line. First we determine if the length can fit into a short string (__fits_in_sso:__sz < __min_cap), and if so we call __set_short_size, which sets the relevant member of the structure __short.

1
2
3
4
void __set_short_size(size_type __s) {
  __r_.first().__s.__size_ = __s;
  __r_.first().__s.__is_long_ = false;
}

Finally, we get the pointer to the short string array __data_ mentioned earlier.

If the string is too long to be inlined in string, we need to do a dynamic allocation to store the string on the heap. Of particular note is the call auto __allocation = std::__allocate_at_least(__alloc(), __recommend(__sz) + 1);. It allocates space of size __recommend(__sz) + 1, which is responsible for storing the string (followed by a plus one because there is a NULL terminator at the end). And what about the __recommend call? Wouldn’t it be enough to allocate only __sz + 1 size space? It turns out it’s much more complicated than we thought.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
template <size_type __a> static
size_type __align_it(size_type __s)
{return (__s + (__a-1)) & ~(__a-1);}

enum {__alignment = 16};

static size_type __recommend(size_type __s) {
   if (__s < __min_cap) {
     return static_cast<size_type>(__min_cap) - 1;
    }
    size_type __guess = __align_it<sizeof(value_type) < __alignment ?
                        __alignment/sizeof(value_type) : 1 > (__s+1) - 1;
    if (__guess == __min_cap) ++__guess;
      return __guess;
}

This code is too complicated for me to write, but it just takes alignment into account anyway.

Finally, we copy our string into __p, which is a pointer to the actual stored string. Of course, don’t forget to set the last character to the NULL terminator.

Dynamic expansion

In the above discussion, we only considered the case where the string is either a short string or a long string, consider the following code.

1
2
std::string str("1234567890");
str.append("1234567890abcd");

When we try to append another string into std::string, what happens if the total length exceeds __min_cap (that is, 23)?

Again, with the help of dynamic debugging, we can see that we have entered this function.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
template <class _CharT, class _Traits, class _Allocator>
basic_string<_CharT, _Traits, _Allocator>&
basic_string<_CharT, _Traits, _Allocator>::append(const value_type* __s, size_type __n)
{
    size_type __cap = capacity();
    size_type __sz = size();
    if (__cap - __sz >= __n)
    {
        if (__n)
        {
            value_type* __p = std::__to_address(__get_pointer());
            traits_type::copy(__p + __sz, __s, __n);
            __sz += __n;
            __set_size(__sz);
            traits_type::assign(__p[__sz], value_type());
        }
    }
    else
        __grow_by_and_replace(__cap, __sz + __n - __cap, __sz, __sz, 0, __n, __s);
    return *this;
}

As you can see, we call the function __grow_by_and_replace when we don’t have enough space left to store the string we want to add.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
template <class _CharT, class _Traits, class _Allocator>
void
basic_string<_CharT, _Traits, _Allocator>::__grow_by_and_replace
    (size_type __old_cap, size_type __delta_cap, size_type __old_sz,
     size_type __n_copy,  size_type __n_del,     size_type __n_add, const value_type* __p_new_stuff)
{
    size_type __ms = max_size();
    pointer __old_p = __get_pointer();
    size_type __cap = __old_cap < __ms / 2 - __alignment ?
                          __recommend(std::max(__old_cap + __delta_cap, 2 * __old_cap)) :
                          __ms - 1;
    auto __allocation = std::__allocate_at_least(__alloc(), __cap + 1);
    pointer __p = __allocation.ptr;
    __begin_lifetime(__p, __allocation.count);
    if (__n_copy != 0)
        traits_type::copy(std::__to_address(__p),
                          std::__to_address(__old_p), __n_copy);
    if (__n_add != 0)
        traits_type::copy(std::__to_address(__p) + __n_copy, __p_new_stuff, __n_add);
    size_type __sec_cp_sz = __old_sz - __n_del - __n_copy;
    if (__sec_cp_sz != 0)
        traits_type::copy(std::__to_address(__p) + __n_copy + __n_add,
                          std::__to_address(__old_p) + __n_copy + __n_del, __sec_cp_sz);
    if (__old_cap+1 != __min_cap || __libcpp_is_constant_evaluated())
        __alloc_traits::deallocate(__alloc(), __old_p, __old_cap+1);
    __set_long_pointer(__p);
    __set_long_cap(__allocation.count);
    __old_sz = __n_copy + __n_add + __sec_cp_sz;
    __set_long_size(__old_sz);
    traits_type::assign(__p[__old_sz], value_type());
}

This code is really complicated, I’ll translate it slightly simpler as follows.

  1. get the new __cap and allocate it by computing that __p is a pointer to the new heap memory.
  2. copy the first old string into the newly allocated __p.
  3. Put the second string that we want to append into after __p.
  4. If the size of our old string is larger than __min_cap, it is a long string and free the memory.
  5. Set the new relevant members in __long, such as the pointer to the stored string, its length and capacity.
  6. Finally, add the NULL terminator to the last bit of the string.

Summary

As you can see, the libcxx code is actually quite comfortable and easy to read, and I will probably write about interpreting the code of other standard library facilities later.

Finally, I’ll talk about how to compile C++ code under Linux using Clang and my own compiled libcxx.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
# Compiling libcxx
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo   \
      -DCMAKE_EXPORT_COMPILE_COMMANDS=YES \
      -DCMAKE_C_COMPILER=clang            \
      -DCMAKE_CXX_COMPILER=clang++        \
      -DLLVM_USE_LINKER=lld               \
      -S runtimes                         \
      -B build_libcxx                     \
      -G Ninja                            \
      -DLIBCXX_ENABLE_ASSERTIONS=ON       \
      -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind" \
ninja -C build_libcxx
      

# Using libcxx
clang++ -nostdinc++ -nostdlib++                         \
          -isystem /path/to/build_libcxx/include/c++/v1 \
          -L /path/to/build_libcxx/lib                  \
          -Wl,-rpath, /path/to/build_libcxx/lib         \
          -lc++                                         \
          -std=c++20                                    \
          test.cc -g -O0