Exploring String Implementation Through Redis Source Code

Overview

I’ve been learning C through Redis recently, and I have to say that Redis code is really neatly written. This article is a comprehensive and in-depth explanation of the source code implementation of the Redis data structure, so I hope you can learn something from it.

The source code for Redis strings is placed in two files, sds.c and sds.h. The implementation has been stripped out into a separate library: https://github.com/antirez/sds.

The dynamic string structure of Redis is shown in the following figure.

redis string

SDS consists of two parts: header and data segment. header also contains three fields: len, alloc, and flags. len indicates the length of data. alloc indicates the length of allocated memory, and flags indicates the data type of sds.

In previous versions, the header of sds actually occupied a fixed size of 8 bytes of memory, so if all the strings stored in redis were small, the header of sds would take up a lot of memory space.

However, as the version of sds has changed, there have been some optimizations in memory usage.

before sds 2.0 the header size was fixed int type, after 2.0 the header len and alloc types are adjusted according to the incoming character size to save memory.
The header structure is modified with __attribute__ to prevent the compiler from automatically aligning the memory, which reduces the amount of memory used by the compiler due to the number of padding caused by memory alignment;

There are five types of sds header defined in the current version, of which sdshdr5 is useless, so it is not drawn.

sds header

Source code analysis

Creation of sds

The sds is created with the following functions.

sds sdsnewlen(const void *init, size_t initlen);
sds sdsnew(const char *init);
sds sdsempty(void);
sds sdsdup(const sds s);

sdsnewlen passes in a string of C to create a string of sds and needs to pass in the size; * sdsnewlen passes in a string of C to create a string of sds and needs to pass in the size.
sdsnew passes in a string of C to create a string of sds, which actually calls strlen(init) inside to calculate the length and then calls sdsnewlen.
sdsempty will create an empty string “”.
sdsdup calls sdsnewlen, which copies an already existing string.

So you can tell by the creation above that eventually sdsnewlen will be called to create the string, so let’s see exactly how this function is implemented.

sdsnewlen

In fact, for the creation of a string object, there are actually three things done: requesting memory, constructing the structure, and copying the string into the string memory area. Let’s look at the specific implementation of Redis.

Request Memory

sds sdsnewlen(const void *init, size_t initlen) {
    void *sh; //指向SDS结构体的指针
    sds s; //sds类型变量，即char*字符数组
    char type = sdsReqType(initlen); //根据数据大小获取sds header 类型
    if (type == SDS_TYPE_5 && initlen == 0) type = SDS_TYPE_8;
    int hdrlen = sdsHdrSize(type); // 根据类型获取sds header大小
    unsigned char *fp; /* flags pointer. */

    assert(hdrlen+initlen+1 > initlen); /* Catch size_t overflow */
    sh = s_malloc(hdrlen+initlen+1); //新建SDS结构，并分配内存空间，这里加1是因为需要在最后加上\0
    ...
    return s;
}

Before the memory request, there are several things that need to be done.

since sds will design the type of header based on the size passed in, the sdsReqType function needs to be called to get the header type based on initlen.
then call sdsHdrSize to get the number of bytes occupied by the header according to the header type.
finally call s_malloc to allocate memory according to header length and string length, here we need to add 1 because in c, the string ends with \0, in order to be compatible with c’s string format.

Now that we are talking about header type, let’s look at the type definition of header.

struct __attribute__ ((__packed__)) sdshdr8 { // 占用 3 byte
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr16 { // 占用 5 byte
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 { // 占用 9 byte
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 { // 占用 17 byte
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};

Here __attribute__ is used to prevent memory alignment, which would otherwise also waste some storage space.

After reading the above definition, it is natural to assume that Redis will determine the type of sds header generated based on the size passed in.

static inline char sdsReqType(size_t string_size) {
    if (string_size < 1<<5) // 小于 32
        return SDS_TYPE_5;
    if (string_size < 1<<8) // 小于 256
        return SDS_TYPE_8;
    if (string_size < 1<<16) // 小于 65,536
        return SDS_TYPE_16;
#if (LONG_MAX == LLONG_MAX)
    if (string_size < 1ll<<32) 
        return SDS_TYPE_32;
    return SDS_TYPE_64;
#else
    return SDS_TYPE_32;
#endif
}

You can see that sdsReqType determines the character type based on the length of the string passed in.

Constructing Structs

For those who have never used C, Redis is a bit of a hack in the way it constructs structures, first by requesting a block of memory based on the size of memory needed, then by initializing the location of the pointer to the header structure, and finally by setting the value of the pointer to the header structure.

#define SDS_HDR_VAR(T,s) struct sdshdr##T *sh = (void*)((s)-(sizeof(struct sdshdr##T)));

sds sdsnewlen(const void *init, size_t initlen) {
    ...
    sh = s_malloc(hdrlen+initlen+1); // 1.申请内存，这里长度加1是为了在最后面存放一个 \0
    if (sh == NULL) return NULL;
    if (init==SDS_NOINIT)
        init = NULL;
    else if (!init)
        memset(sh, 0, hdrlen+initlen+1);// 2.将内存的值都设置为0
    s = (char*)sh+hdrlen;               //3.将s指针指向数据起始位置
    fp = ((unsigned char*)s)-1;         //4.将fp指针指向sds header的flags字段
    switch(type) {
        case SDS_TYPE_5: {
            *fp = type | (initlen << SDS_TYPE_BITS);
            break;
        } 
        case SDS_TYPE_8: {
            SDS_HDR_VAR(8,s); //5.构造sds结构体header
            sh->len = initlen; // 初始化 header len字段
            sh->alloc = initlen; // 初始化 header alloc字段
            *fp = type; // 初始化 header flag字段
            break;
        }
        ...
    }
    ...
    return s;
}

I have marked the above process clearly, it may be difficult to understand the process of constructing the header structure here by looking at the code directly, I will use a diagram below to show the location of the pointer.

header

String Copy

sds sdsnewlen(const void *init, size_t initlen) {
    ...
    if (initlen && init)  
        memcpy(s, init, initlen); //将要传入的字符串拷贝给sds变量s
    s[initlen] = '\0'; //变量s末尾增加\0，表示字符串结束
    return s;
}

The memcpy function will copy the string byte by byte into the memory area corresponding to s.

sdscatlen string append

sds sdscatlen(sds s, const void *t, size_t len) {
    size_t curlen = sdslen(s); // 获取字符串 len 大小
    //根据要追加的长度len和目标字符串s的现有长度，判断是否要增加新的空间
    //返回的还是字符串起始内存地址
    s = sdsMakeRoomFor(s,len);
    if (s == NULL) return NULL;
    // 将新追加的字符串拷贝到末尾
    memcpy(s+curlen, t, len);
    // 重新设置字符串长度
    sdssetlen(s, curlen+len);
    s[curlen+len] = '\0';
    return s;
}

In this string appending method, the space check and expansion are actually encapsulated in the sdsMakeRoomFor function, which has the following logic.

if there is space left, return it directly if there is space left.
if there is no space left, then it needs to be expanded and by how much?
whether the string can be appended to the original location, or whether a new memory area needs to be requested.

So I will divide the sdsMakeRoomFor function into two parts: expansion and memory request.

expansion

sds sdsMakeRoomFor(sds s, size_t addlen) {
    void *sh, *newsh;
    size_t avail = sdsavail(s); //这里是用 alloc-len,表示可用资源
    size_t len, newlen;
    char type, oldtype = s[-1] & SDS_TYPE_MASK;
    int hdrlen;

    if (avail >= addlen) return s; // 如果有空间剩余，那么直接返回

    len = sdslen(s); // 获取字符串 len 长度
    sh = (char*)s-sdsHdrSize(oldtype); //获取到header的指针
    newlen = (len+addlen); // 新的内存空间
    if (newlen < SDS_MAX_PREALLOC) //如果小于 1m， 那么存储空间直接翻倍
        newlen *= 2;
    else
        newlen += SDS_MAX_PREALLOC; //超过了1m，那么只会多增加1m空间
    ...
    return s;
}

For scaling, the first step is to see if there is enough space, which is determined by alloc-len, and return directly if there is space left.

Then Redis will expand the capacity depending on the size of the sds. If len+addlen is less than 1m, then the new space is doubled; if len+addlen is greater than 1m, then the new space is only increased by 1m.

Redis expand

memory request

sds sdsMakeRoomFor(sds s, size_t addlen) {
    ...

    type = sdsReqType(newlen); // 根据新的空间占用计算 sds 类型 

    hdrlen = sdsHdrSize(type); // header 长度
    if (oldtype==type) { // 和原来header一样，那么可以复用原来的空间
        newsh = s_realloc(sh, hdrlen+newlen+1); // 申请一块内存，并追加大小
        if (newsh == NULL) return NULL;
        s = (char*)newsh+hdrlen;
    } else { 
        //如果header 类型变了，表示内存头变了，那么需要重新申请内存
        //因为如果使用s_realloc只会向后追加内存
        newsh = s_malloc(hdrlen+newlen+1);
        if (newsh == NULL) return NULL;
        memcpy((char*)newsh+hdrlen, s, len+1);
        s_free(sh); // 释放掉原内存
        s = (char*)newsh+hdrlen;
        s[-1] = type;
        sdssetlen(s, len);
    }
    sdssetalloc(s, newlen);//重新设置alloc字段
    return s;
}

In terms of memory requests, Redis has two cases: one is that the sds header type has not changed, so you can just call realloc to append a new memory area behind the original memory.

The other case is when the sds header type has changed, which generally means that the header is taking up more space, because realloc cannot append the memory area forward, so you can only call malloc to reapply a memory area and then copy the string to the new address via memcpy.

Summary

In this article, we learned a lot about how Redis implements strings, and what changes have been made through versioning. You can compare sds with your own familiar language’s string implementation to see what the differences are and which is better.

Table of Contents