In everyday development StringBuilder must be used by everyone, even a lot. After all, we all know the unwritten norm that StringBuilder’s performance is higher than direct string splicing when a large number of strings need to be constructed at high frequencies, because using + or += directly will generate a new String instance, because String objects are immutable objects, which means that each time the string This means that each time the content of a string is manipulated, a new instance of the string is created, which is very unfriendly to the large number of scenarios where string splicing is performed. Hence StringBuilder was born. Note that this does not mean that StringBuilder can be used for all string stitching scenarios, but rather that it is used frequently for stitching the same string object. Today we will look at the clever implementation of StringBuilder in c# and experience the way the underlying class library solves the problem.

Note that immutability here means that the content of the string object itself is immutable, but the reference to the string variable can be changed.

Simple example

Let’s take a simple example of the operation, in fact, the core operation is mainly Append method and ToString method, and the constructor of StringBuilder from the point of view of the source code. The first is the most commonly used way to get the results of various Append directly and then finally.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
StringBuilder builder = new StringBuilder();
builder.Append("我和我的祖国");
builder.Append(',');
builder.Append("一刻也不能分割");
builder.Append('。');
builder.Append("无论我走到哪里,都留下一首赞歌。");
builder.Append("我歌唱每一座高山,我歌唱每一条河。");
builder.Append("袅袅炊烟,小小村落,路上一道辙。");
builder.Append("我永远紧依着你的心窝,你用你那母亲的脉搏,和我诉说。");
string result = builder.ToString();
Console.WriteLine(result);

StringBuilder also supports initializing some data through the constructor, and whether or not the initialization data is passed in the constructor means different initialization logic. For example, the following operations

1
2
3
StringBuilder builder = new StringBuilder("我和我的祖国");
//或者是指定StringBuilder的容量,这样的话StringBuilder初始可承载字符串的长度是16
builder = new StringBuilder(16);

Because StringBuilder is a basic class library, it is simple to look at, simple to use, and everyone uses these operations all the time.

Source Code Exploration

Above we simply demonstrate the use of the StringBuilder, general similar StringBuilder or List, although I did not use the process can not pay attention to the length of the container itself has been to add elements, in fact, the internal logic of the implementation of these containers themselves contain some expansion-related logic. We mentioned above that the core of StringBuilder is mainly three operations, that is, through these three functions can present the way StringBuilder works and the principle.

  • One is the Constructor, because the constructor contains some logic for initialization.
  • Next is the Append method, which is the core operation of StringBuilder for string splicing.
  • And finally, the ToString method, which converts the StringBuilder into a string, which is the operation we use to get the stitched string.

Let’s start with these three related methods to see the core implementation of StringBuilder, here I refer to the .net version v6.0.2.

Start with the construction

As we mentioned above, the constructor of StringBuilder represents the initialization logic, which is roughly the default constructor, i.e. the default initialization logic and a part of the custom constructor logic, mainly the logic that determines the length of the StringBuilder container that can hold the string.

Parameterless constructors

First, let’s take a look at the implementation of the default unparametric constructor [Click to view source code 👈]

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
//可承载字符的最大容量,即可以拼接的字符串的长度
internal int m_MaxCapacity;
//承载【拼接字符串的char数组
internal char[] m_ChunkChars;
//默认的容量,即默认初始化m_ChunkChars的长度,也就是首次扩容触发的长度
internal const int DefaultCapacity = 16;
public StringBuilder()
{
    m_MaxCapacity = int.MaxValue;
    m_ChunkChars = new char[DefaultCapacity];
}

With the default parameterless constructor, we can learn two things

  • The first is that the core StringBuilder container for storing strings is a char[] array.
  • The default container for char[] character arrays is declared to be 16, i.e. the expansion mechanism is triggered if the first StringBuilder holds more than 16 characters.

Constructors with parameters

StringBuilder has several constructors with parameters, as follows

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
//声明初始化容量,即首次扩容触发的长度条件
public StringBuilder(int capacity)
//声明初始化容量,和最大容量即可以动态构建字符串的总长度
public StringBuilder(int capacity, int maxCapacity)
//用给定字符串初始化
public StringBuilder(string? value)
//用给定字符串初始化,并声明容量
public StringBuilder(string? value, int capacity)
//用一个字符串截取指定长度初始化,并声明最大容量
public StringBuilder(string? value, int startIndex, int length, int capacity)

Although there are many constructors, but most of them are calling their own overloaded methods, the core constructor with parameters is actually two, let’s look at them separately, the first is to specify the capacity of the initialization constructor [Click to view source code 👈]

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
//可承载字符的最大容量,即可以拼接的字符串的长度
internal int m_MaxCapacity;
//承载【拼接字符串的char数组
internal char[] m_ChunkChars;
//默认的容量,即默认初始化m_ChunkChars的长度,也就是首次扩容触发的长度
internal const int DefaultCapacity = 16;
public StringBuilder(int capacity, int maxCapacity)
{
    //指定容量不能大于最大容量
    if (capacity > maxCapacity)
    {
        throw new ArgumentOutOfRangeException(nameof(capacity), SR.ArgumentOutOfRange_Capacity);
    }
    //最大容量不能小于1
    if (maxCapacity < 1)
    {
        throw new ArgumentOutOfRangeException(nameof(maxCapacity), SR.ArgumentOutOfRange_SmallMaxCapacity);
    }
    //初始化容量不能小于0
    if (capacity < 0)
    {
        throw new ArgumentOutOfRangeException(nameof(capacity), SR.Format(SR.ArgumentOutOfRange_MustBePositive, nameof(capacity)));
    }
    //如果指定容量等于0,则使用默认的容量
    if (capacity == 0)
    {
        capacity = Math.Min(DefaultCapacity, maxCapacity);
    }
    //最大容量赋值
    m_MaxCapacity = maxCapacity;
    //分配指定容量的数组
    m_ChunkChars = GC.AllocateUninitializedArray<char>(capacity);
}

The main thing is to judge and assign values to the maximum and initialized capacities, and if the initial and maximum capacities are formulated then the passed-in ones are the main ones. Next, let’s look at the main operation to initialize StringBuilder according to the specified string [Click to view source code 👈]

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
//可承载字符的最大容量,即可以拼接的字符串的长度
internal int m_MaxCapacity;
//承载【拼接字符串的char数组
internal char[] m_ChunkChars;
//默认的容量,即默认初始化m_ChunkChars的长度,也就是首次扩容触发的长度
internal const int DefaultCapacity = 16;
//当前m_ChunkChars字符数组中已经使用的长度
internal int m_ChunkLength;
public StringBuilder(string? value, int startIndex, int length, int capacity)
{
    if (capacity < 0)
    {
        throw new ArgumentOutOfRangeException();
    }
    if (length < 0)
    {
        throw new ArgumentOutOfRangeException();
    }
    if (startIndex < 0)
    {
        throw new ArgumentOutOfRangeException();
    }
    //初始化的字符串可以为null,如果为null则只用空字符串即""
    if (value == null)
    {
        value = string.Empty;
    }
    //基础长度判断,这个逻辑其实已经包含了针对字符串截取的起始位置和接要截取的长度进行判断了
    if (startIndex > value.Length - length)
    {
        throw new ArgumentOutOfRangeException();
    }
    //最大容量是int的最大值,即2^31-1
    m_MaxCapacity = int.MaxValue;
    if (capacity == 0)
    {
        capacity = DefaultCapacity;
    }
    //虽然传递了默认容量,但是这里依然做了判断,在传递的默认容量和需要存储的字符串容量总取最大值
    capacity = Math.Max(capacity, length);
    //分配指定容量的数组
    m_ChunkChars = GC.AllocateUninitializedArray<char>(capacity);
    //这里记录了m_ChunkChars固定长度的快中已经被使用的长度
    m_ChunkLength = length;
    //把传递的字符串指定位置指定长度(即截取操作)copy到m_ChunkChars中
    value.AsSpan(startIndex, length).CopyTo(m_ChunkChars);
}

This initialization operation mainly intercepts the specified length of the given string and stores it in ChunkChars for initializing StringBuilder, where the initialized capacity depends on whether the length that can be intercepted is greater than the specified capacity, and the essence is to be able to store the intercepted length of the string.

Summary of Constructors

Through the logic of StringBuilder’s constructor we can see that StringBuilder is essentially stored in char[], the initialized length of this character array is 16, the main role of this length is the expansion mechanism, that is, the first time you need to expand the capacity is when the length of m_ChunkChars exceeds 16, at this time the original m_ChunkChars can no longer carry the string to be built when the expansion is triggered.

Core Methods

We have seen the initialization code of StringBuilder above, through the initialization operation, we can understand the data structure of StringBuilder itself, but if we want to understand the expansion mechanism of StringBuilder, we need to start from its Append method, because only when Append has the opportunity to judge the original The length of the m_ChunkChars array is sufficient to store the appended string. There are many overloads for the Append method of StringBuilder, so we won’t list them all here, but the essence is the same. So let’s choose the most familiar and commonly used Append(string? value) method to explain, directly find the source code location [Click to view source code 👈]

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
//承载【拼接字符串的char数组
internal char[] m_ChunkChars;
//当前m_ChunkChars字符数组中已经使用的长度
internal int m_ChunkLength;
public StringBuilder Append(string? value)
{
    if (value != null)
    {
        // 获取当前存储块
        char[] chunkChars = m_ChunkChars;
        // 获取当前块已使用的长度
        int chunkLength = m_ChunkLength;
        // 获取传进来的字符的长度
        int valueLen = value.Length;

        //当前使用的长度 + 需要Append的长度 < 当前块的长度 则不需要扩容
        if (((uint)chunkLength + (uint)valueLen) < (uint)chunkChars.Length)
        {
            //判断传进来的字符串长度是否<=2
            //如果小于2则只用直接访问位置的方式操作
            if (valueLen <= 2)
            {
                //判断字符串长度>0的场景
                if (valueLen > 0)
                {
                    //m_ChunkChars的已使用长度其实就是可以Append新元素的起始位置
                    //直接取value得第0个元素放入m_ChunkChars[可存储的起始位置]
                    chunkChars[chunkLength] = value[0];
                }
                //其实是判断字符串长度==2的场景
                if (valueLen > 1)
                {
                    //因为上面已经取了value第0个元素放入了m_ChunkChars中
                    //现在则取value得第1个元素继续放入chunkLength的下一位置
                    chunkChars[chunkLength + 1] = value[1];
                }
            }
            else
            {
                //如果value的长度大于2则通过操作内存去追加value
                //获取m_ChunkChars的引用位置,偏移到m_ChunkLength的位置追加value
                Buffer.Memmove(
                    ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(chunkChars), chunkLength),
                    ref value.GetRawStringData(),
                    (nuint)valueLen);
            }
            //更新以使用长度的值,新的使用长度是当前已使用长度+追加进来的字符串长度
            m_ChunkLength = chunkLength + valueLen;
        }
        else
        {
            //走到这里说明进入了扩容逻辑
            AppendHelper(value);
        }
    }
    return this;
}

This part of the logic mainly shows the logic when the expansion condition is not reached, which is essentially appending the Append string to the m_ChunkChars array, where m_ChunkLength represents the length of the current m_ChunkChars already used, and another meaning is that it represents the starting position of the next Append element is stored to the starting position of m_ChunkLength. And the logic needed for expansion goes to the AppendHelper method, let’s look at the implementation of the AppendHelper method [Click to view source 👈]

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
private void AppendHelper(string value)
{
    unsafe
    {
        //防止垃圾收集器重新定位value变量。
        //指针操作,string本身是不可变的char数组,所以它的指针是char* 
        fixed (char* valueChars = value)
        {
            //调用了另一个append
            Append(valueChars, value.Length);
        }
    }
}

Here we get the value pointer passed in and call another overloaded Append method, but we can get a message from this code that this operation is non-thread safe. We continue to find another Append method [click to view source code 👈]

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
public unsafe StringBuilder Append(char* value, int valueCount)
{
    // value必须有值
    if (valueCount < 0)
    {
        throw new ArgumentOutOfRangeException();
    }

    //新的长度=StringBuilder的长度+需要追加的字符串长度
    int newLength = Length + valueCount;
    //新的长度不能大于最大容量
    if (newLength > m_MaxCapacity || newLength < valueCount)
    {
        throw new ArgumentOutOfRangeException();
    }

    // 新的起始位置=需要追加的长度+当前使用的长度
    int newIndex = valueCount + m_ChunkLength;
    // 判断当前m_ChunkChars的容量是否够用
    if (newIndex <= m_ChunkChars.Length)
    {
        //够用的话则直接将追加的元素添加到m_ChunkChars中去
        new ReadOnlySpan<char>(value, valueCount).CopyTo(m_ChunkChars.AsSpan(m_ChunkLength));
        //更新已使用的长度为新的长度
        m_ChunkLength = newIndex;
    }
    //当前m_ChunkChars不满足存储则需要扩容
    else
    {
        // 判断当前存储块m_ChunkChars还有多少未存储的位置
        int firstLength = m_ChunkChars.Length - m_ChunkLength;
        if (firstLength > 0)
        {
            //把需要追加的value中的前firstLength位字符copy到m_ChunkChars中剩余的位置
            //合理的利用存储空间,截取需要追加的value到m_ChunkChars剩余的位置
            new ReadOnlySpan<char>(value, firstLength).CopyTo(m_ChunkChars.AsSpan(m_ChunkLength));
            //更新已使用的位置,这个时候当前存块m_ChunkChars已经存储满了
            m_ChunkLength = m_ChunkChars.Length;
        }

        // 获取value中未放入到m_ChunkChars(因为当前块已经放满)剩余部分起始位置
        int restLength = valueCount - firstLength;
        //扩展当前存储块即扩容操作
        ExpandByABlock(restLength);
        //判断新的存储块是否创建成功
        Debug.Assert(m_ChunkLength == 0, "A new block was not created.");
        // 将value中未放入到m_ChunkChars的剩余部放入扩容后的m_ChunkChars中去
        new ReadOnlySpan<char>(value + firstLength, restLength).CopyTo(m_ChunkChars);
        // 更新当前已使用长度
        m_ChunkLength = restLength;
    }
    //一些针对当前StringBuilder的校验操作,和相关逻辑无关不做详细介绍
    //类似的Debug.Assert(m_ChunkOffset + m_ChunkChars.Length >= m_ChunkOffset, "The length of the string is greater than int.MaxValue.");
    AssertInvariants();
    return this;
}

The source code here deals with the length of a StringBuilder. Length represents the actual length of characters stored in the current StringBuilder object, which is defined as follows

1
2
3
4
5
6
7
8
9
public int Length
{
    //StringBuilder已存储的长度=块的偏移量+当前块使用的长度
    get => m_ChunkOffset + m_ChunkLength;
    set
    {
        //注意这里是有代码的只是我们暂时省略set逻辑
    }
}

The Append method in the source code above is actually another overloaded method, only Append(string? value) calls this logic, here you can clearly see that if the current storage block meets the storage, it is directly used. If the current storage location does not meet the storage, then the storage space will not be wasted, according to the available storage length of the current storage block to intercept the length of the string that needs to Append, into the remaining location of this storage block, the remaining characters that can not be stored will be stored in the expanded new storage block m_ChunkChars, this practice is to not waste storage space.

This is very well thought out, even if the expansion happens, then the storage block of my current node must be filled, ensuring the maximum use of storage space.

Through the Append source code above we can naturally see that the logic of expansion is naturally in the ExpandByABlock method [Click to view source code 👈]

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
//当前StringBuilder实际存储的总长度
public int Length
{
    //StringBuilder已存储的长度=块的偏移量+当前块使用的长度
    get => m_ChunkOffset + m_ChunkLength;
    set
    {
        //注意这里是有代码的只是我们暂时省略set逻辑
    }
}
//当前StringBuilder的总容量
public int Capacity
{
    get => m_ChunkChars.Length + m_ChunkOffset;
    set
    {
        //注意这里是有代码的只是我们暂时省略set逻辑
    }
}

//可承载字符的最大容量,即可以拼接的字符串的长度
internal int m_MaxCapacity;
//承载【拼接字符串的char数组
internal char[] m_ChunkChars;
//当前块的最大长度
internal const int MaxChunkSize = 8000;
//当前m_ChunkChars字符数组中已经使用的长度
internal int m_ChunkLength;
//存储块的偏移量,用于计算总长度
internal int m_ChunkOffset;
//前一个存储块
internal StringBuilder? m_ChunkPrevious;
private void ExpandByABlock(int minBlockCharCount)
{
    //当前块m_ChunkChars存储满才进行扩容操作
    Debug.Assert(Capacity == Length, nameof(ExpandByABlock) + " should only be called when there is no space left.");
    //minBlockCharCount指的是剩下的需要存储的长度
    Debug.Assert(minBlockCharCount > 0);
    AssertInvariants();

    //StringBuilder的总长度不能大于StringBuilder的m_MaxCapacity
    if ((minBlockCharCount + Length) > m_MaxCapacity || minBlockCharCount + Length < minBlockCharCount)
    {
        throw new ArgumentOutOfRangeException();
    }

    //!!!需要扩容块的新长度=max(当前追加字符的剩余长度,min(当前StringBuilder长度,8000))
    int newBlockLength = Math.Max(minBlockCharCount, Math.Min(Length, MaxChunkSize));

    //判断长度是否越界
    if (m_ChunkOffset + m_ChunkLength + newBlockLength < newBlockLength)
    {
        throw new OutOfMemoryException();
    }

    // 申请一个新的存块长度为newBlockLength
    char[] chunkChars = GC.AllocateUninitializedArray<char>(newBlockLength);

    //!!!把当前StringBuilder中的存储块存放到一个新的StringBuilder实例中,当前实例的m_ChunkPrevious指向上一个StringBuilder
    //这里可以看出来扩容的本质是构建节点为StringBuilder的链表
    m_ChunkPrevious = new StringBuilder(this);
    //偏移量是每次扩容的时候去修改,它的长度就是记录了已使用块的长度,但是不包含当前StringBuilder的存储块
    //可以理解为偏移量=长度-已经存放扩容块的长度
    m_ChunkOffset += m_ChunkLength;
    //因为已经扩容了新的容器所以重置已使用长度
    m_ChunkLength = 0;
    //把新的块重新赋值给当前存储块m_ChunkChars数组
    m_ChunkChars = chunkChars;
    AssertInvariants();
}

This code is the core operation of expansion, through this we can clearly understand the storage nature of StringBuilder

  • First of all, the data of StringBuilder is stored in m_ChunkChars array, but the nature of expansion is a one way chain operation, StringBuilder itself contains m_ChunkPrevious which points to the data saved in the last expansion.
  • The actual length of the expansion is max(the remaining length of the current appended characters, min(the current StringBuilder length, 8000)), so we can know that the maximum size of a block of m_ChunkChars is 8000.

StringBuilder also contains a method to build an instance through StringBuilder. This constructor is used to build one way chains when expanding the capacity, and its implementation is also very simple

1
2
3
4
5
6
7
8
9
private StringBuilder(StringBuilder from)
{
    m_ChunkLength = from.m_ChunkLength;
    m_ChunkOffset = from.m_ChunkOffset;
    m_ChunkChars = from.m_ChunkChars;
    m_ChunkPrevious = from.m_ChunkPrevious;
    m_MaxCapacity = from.m_MaxCapacity;
    AssertInvariants();
}

The purpose is to pass the various data related to the storage before the expansion to the new StringBuilder instance. Well, so far the core logic of Append is finished, let’s roughly run through the core logic of Append Let’s roughly list it, for example

  1. default m_ChunkChars[16], m_ChunkOffset=0, m_ChunkPrevious=null, Length=0
  2. First expansion m_ChunkChars[16], m_ChunkOffset=16, m_ChunkPrevious=points to the original StringBuilder, m_ChunkLength=16
  3. Second expansion of m_ChunkChars[32], m_ChunkOffset=32, m_ChunkPrevious=StringBuilder of m_ChunkChars[16] before expansion, m_ChunkLength=32
  4. third expansion of m_ChunkChars[64], m_ChunkOffset=64, m_ChunkPrevious=StringBuilder of m_ChunkChars[64] before expansion, m_ChunkLength=64

I have tried to draw a diagram. I wonder if I can help understand the data structure of StringBuilder. The chain table structure of StringBuilder is the current node pointing to the last StringBuilder, that is, the current instance of StringBuilder before expansion

c# stringbuilder

c# StringBuilder overall data structure is a one-way chain table, but each node of the chain table storage block is m_ChunkChars is char[] . The essence of expansion is to add a new node to the chain table, and the capacity of the new node storage block will increase with each expansion. Most of the cases encountered when using it are 16 for the first time, 16 for the second time, 32 for the third time, 64 for the fourth time, and so on.

Convert to String

From the above data structure of StringBuilder, we understand that StringBuilder is essentially a unidirectional chain table, which contains m_ChunkPrevious pointing to the previous StringBuilder instance, i.e. a chain table in reverse order. We finally get the result of the construction of the StringBuilder through the ToString() method of the StringBuilder to get a final result string, next we will look at the implementation of ToString [Click to view source code 👈]

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
//当前StringBuilder实际存储的总长度
public int Length
{
    //StringBuilder已存储的长度=块的偏移量+当前块使用的长度
    get => m_ChunkOffset + m_ChunkLength;
    set
    {
        //注意这里是有代码的只是我们暂时省略set逻辑
    }
}
public override string ToString()
{
    AssertInvariants();
    //当前StringBuilder长度为0则直接返回空字符串
    if (Length == 0)
    {
        return string.Empty;
    }
    //FastAllocateString函数负责分配长度为StringBuilder长度的字符串
    //这个字符串就是ToString最终返回的结果,所以长度等于StringBuilder的长度
    string result = string.FastAllocateString(Length);
    //当前StringBuilder是遍历的第一个链表节点
    StringBuilder? chunk = this;
    do
    {
        //当前使用长度必须大于0,也就是说当前块的m_ChunkChars必须使用过,才需要遍历当前节点
        if (chunk.m_ChunkLength > 0)
        {
            // 取出当前遍历的StringBuilder的相关数据
            // 当前遍历StringBuilder的m_ChunkChars
            char[] sourceArray = chunk.m_ChunkChars;
            int chunkOffset = chunk.m_ChunkOffset;
            int chunkLength = chunk.m_ChunkLength;

            // 检查是否越界
            if ((uint)(chunkLength + chunkOffset) > (uint)result.Length || (uint)chunkLength > (uint)sourceArray.Length)
            {
                throw new ArgumentOutOfRangeException();
            }
            //把当前遍历项StringBuilder的m_ChunkChars逐步添加到result中当前结果的前端
            Buffer.Memmove(
                ref Unsafe.Add(ref result.GetRawStringData(), chunkOffset),
                ref MemoryMarshal.GetArrayDataReference(sourceArray),
                (nuint)chunkLength);
        }
        //获取当前StringBuilder的前一个节点,循环遍历链表操作
        chunk = chunk.m_ChunkPrevious;
    }
    //如果m_ChunkPrevious==null则代表是第一个节点
    while (chunk != null);

    return result;
}

The essence of this ToString operation is an inverted-link traversal operation, each traversal gets the current StringBuilder’s m_ChunkPrevious character array and after the data stitching is completed, it gets the last StringBuilder node of the current StringBuilder, i.e. m_ ChunkPrevious, the end condition is m_ChunkPrevious==null which means the node is the first node, and finally stitching into a string is returned. For example, our StringBuilder stores ``I can’t be separated from my country, wherever I go, I leave a hymn. ‘, then the traversal process for the ToString traversal StringBuilder is roughly as follows

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
//初始化一个等于StringBuilder长度的字符串
string result = "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0";
//第一次遍历后
result = "\0\0\0\0\0\0\0\0\0\0\0\0\0\0无论我走到哪里都留下一首赞歌。";
//第二次遍历后
result = "\0\0\0\0\0\0\0一刻也不能分割,无论我走到哪里都留下一首赞歌。";
//第三次遍历后
result = "\0\0\0我的祖国一刻也不能分割,无论我走到哪里都留下一首赞歌。";
//第三次遍历后
result = "我和我的祖国一刻也不能分割,无论我走到哪里都留下一首赞歌。";

After all, StringBuilder can only record the data of the last StringBuilder, so this is an operation that traverses the chain of StringBuilder in reverse order, each traversal is to add the data recorded in m_ChunkPrevious until m_ChunkPrevious==null then the traversal is completed and the result is returned directly.

c# StringBuilder class ToString is essentially a reverse-order traversal of a one-way chain, each node of the chain is a StringBuilder instance, get the storage block inside m_ChunkChars character array to assemble, loop through all the nodes after the results are assembled into a string to return.

Compare to java implementation

We can see that the implementation of StringBuilder on C# is essentially a chain table. So and C# language similar to the Java implementation of the idea of whether the same, let’s look at the general idea of the implementation of StringBuilder in Java how my local jdk version for 1.8.0_191, the first is also the initialization logic

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
//存储块也就是承载Append数据的容器
char[] value;
//StringBuilder的总长度
int count;
public StringBuilder() {
    //默认的容量也是16
    super(16);
}

public StringBuilder(String str) {
    //这个地方有差异如果通过指定字符串初始化StringBuilder
    //则初始化的长度则是当前传递的str的长度+16
    super(str.length() + 16);
    append(str);
}

// AbstractStringBuilder.java
AbstractStringBuilder(int capacity) {
    value = new char[capacity];
}

Here you can see that the logic of the initialization capacity of java is a little different from c#, the default initialization length of c# depends on the length of the initialization string that can be stored mainly, while the implementation of java is in the current length +16 length, that is, the length of this initialization 16 must be available anyway. So let’s look at the source code for the implementation of append again.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
// AbstractStringBuilder.java
public AbstractStringBuilder append(String str) {
    if (str == null)
        return appendNull();
    int len = str.length();
    // 这里是扩容操作
    ensureCapacityInternal(count + len);
    str.getChars(0, len, value, count);
    //每次append之后重新设置长度
    count += len;
    return this;
} 

The core is the method to expand ensureCapacityInternal, let’s simply look at its implementation

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
private void ensureCapacityInternal(int minimumCapacity) {
    //当前需要的长度>char[]的长度则需要扩容
    if (minimumCapacity - value.length > 0)
        expandCapacity(minimumCapacity);
}

void expandCapacity(int minimumCapacity) {
    //新扩容的长度是当前块char[]的长度的2倍+2
    int newCapacity = value.length * 2 + 2;
    if (newCapacity - minimumCapacity < 0)
        newCapacity = minimumCapacity;
    if (newCapacity < 0) {
        if (minimumCapacity < 0)
            throw new OutOfMemoryError();
        newCapacity = Integer.MAX_VALUE;
    }
    //把当前的char[]复制到新扩容的字符数组中
    value = Arrays.copyOf(value, newCapacity);
}

// Arrays.java copy的逻辑
public static char[] copyOf(char[] original, int newLength) {
    //声明一个新的数组,把original的数据copy到新的char数组中
    char[] copy = new char[newLength];
    System.arraycopy(original, 0, copy, 0,
                     Math.min(original.length, newLength));
    return copy;
}

The last thing to show is the operation to get the StringBuilder result, again the toString method, let’s look at the implementation of this logic in java.

1
2
3
4
5
@Override
public String toString() {
    // 这里创建了一个新的String对象返回,通过当前char[]初始化这个字符串
    return new String(value, 0, count);
}

By now the logic of StringBuilder in java is very clear to all of us.

  • The real data of StringBuilder in c# is stored in m_ChunkChars character array, but the overall data structure is one way chain table, while in java it is completely char[] character array.
  • The initial length of StringBuilder in c# is the length that can hold the current initialized string, while the initialized length of java is the length of the current passed string + 16.
  • The expansion of StringBuilder in c# is to generate a new instance of StringBuilder with a capacity related to the length of the previous StringBuilder. java generates a new array of the length of the original char[] array * 2 + 2 length.
  • The ToString implementation in c# iterates through the inverted chain to assemble a new string to return, while java initializes a new string with the char[] of the current StringBuilder to return.

About the c# and java StringBuilder implementation is so different, in the end which implementation is better? There is no way to evaluate this, after all, each language’s underlying class library implementation is well thought out and integrates many people’s ideas. From the owner’s point of view, the core function of StringBuilder itself lies in the construction process, so the performance of the construction process is very important, so the logic of similar array expansion and then copy is not as efficient as the way of a chain table. However, when it comes to the final ToString result, the advantage of the array is very obvious, after all, string is essentially a char[] array.

For StringBuilder append is a frequent operation and most of the cases may be append operation many times, while ToString operation for StringBuilder is basically only once, that is when you get the result of StringBuilder construction. So the owner feels that improving the performance of append is the key.

Summary

In this article we have explained the general implementation of c# StringBuilder, and also compared the differences between c# and java regarding the implementation of StringBuilder, the main difference is that the underlying data structure implemented in c# is a unidirectional chain table, but the data of each node is stored in char[], while the overall implementation of java is an array. is an array. This also provides us with a different way of thinking, and here we also summarize its implementation again.

  • c# StringBuilder is essentially a one way chain operation, StringBuilder itself contains m_ChunkPrevious pointing to the data saved in the last expansion, the essence of the expansion is to add a new node to the chain.
  • The actual length of the expansion is max(the remaining length of the current appended character, min(the current StringBuilder length, 8000)), and the capacity of the new node storage block will be increased each time the expansion is done. Most of the cases encountered when using is the first time for 16, the second time for 16, three times for 32, four times for 64 and so on.
  • c# StringBuilder class ToString is the essence of inverted traversal of the one-way chain table, each traversal to obtain the current StringBuilder m_ChunkPrevious array of characters after the completion of data stitching, and then get m_ChunkPrevious pointed to the last StringBuilder instance, and finally the results are assembled into a string to return.
  • The main difference between the c# and java implementations of StringBuilder is that the overall underlying data structure of the c# implementation is a unidirectional chain table, but the data itself is stored in char[] in each StringBuilder instance, which is a bit like the redis quicklist. The overall approach is a char[] array of characters.