Redis' AOF persistence principle

Redis provides two separate persistence mechanisms, RDB and AOF. This article first describes how the AOF feature works, how commands are stored in the AOF file, and how the different AOF sync modes affect data safety and Redis performance. It then covers how the database state is restored from the AOF file and the mechanism behind that process, with some pseudo-code to aid understanding. The material is based on the book Redis Design and Implementation. Redis persistence is important enough that I read the book directly to avoid detours, and I am recording my notes here.

Basic Introduction

AOF persistence records the database state by saving the write commands executed by the Redis server. For example, suppose the server executes the following commands:

set key1 value1
sadd fruits "apple" "banana"
rpush numbers 128 125

RDB persistence saves the key-value pairs key1, fruits, and numbers to the RDB file, whereas AOF persistence saves the SET, SADD, and RPUSH commands executed by the server to the AOF file. All commands written to the AOF file are saved in the Redis command request protocol format.
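To make the "command request protocol format" concrete, here is a minimal sketch of how one command could be encoded as a protocol array of bulk strings. The function name encode_command is hypothetical and chosen only for illustration; it is not part of Redis.

```python
def encode_command(*args: str) -> bytes:
    """Encode one command as an array of bulk strings in the Redis protocol."""
    # "*<argc>" announces how many arguments follow.
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        # Each argument is "$<byte length>", then the raw bytes.
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


# What an AOF entry for `set key1 value1` would look like under this encoding.
aof_entry = encode_command("SET", "key1", "value1")
print(aof_entry)
```

Because every argument carries its own byte length, values containing spaces or newlines can be stored without any escaping.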

Persistence Implementation

The implementation of the AOF persistence function can be divided into three steps: command append, file write, and file synchronization (sync).

Command append

When AOF persistence is turned on, the server, after executing a write command, appends that command in protocol format to the end of the aof_buf buffer in the server state.

Writing and Synchronization

The Redis server process is an event loop. File events are responsible for receiving command requests from clients and sending command replies to them, and time events are responsible for running functions such as serverCron that must execute at regular intervals. Because the server may execute write commands while processing file events, causing content to be appended to the aof_buf buffer, it calls the flushAppendOnlyFile function before the end of every event loop iteration to decide whether the contents of aof_buf need to be written and saved to the AOF file. The following pseudo-code shows the loop.

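# Event loop function
def eventLoop():
    while True:
        # Process file events: receive command requests and send command replies.
        # Handling a command request may append new content to the aof_buf buffer.
        processFileEvents()

        # Process time events
        processTimeEvents()

        # Decide whether to write and sync the contents of aof_buf to the AOF file
        flushAppendOnlyFile()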

The behavior of the flushAppendOnlyFile function is determined by the value of the appendfsync option configured by the server, and the different values produce the following behavior.

Option value	Behavior of flushAppendOnlyFile
always	Writes all the contents of the aof_buf buffer to the AOF file and synchronizes the file (safest, but poorest performance)
everysec	Writes all the contents of the aof_buf buffer to the AOF file; if more than one second has passed since the AOF file was last synchronized, synchronizes it again (safe, better performance)
no	Writes all the contents of the aof_buf buffer to the AOF file but does not synchronize it; the operating system decides when to synchronize (typically about every 30 seconds; unsafe, best performance)

If the user does not actively set a value for the appendfsync option, then the appendfsync option defaults to everysec. For more information on the appendfsync option, see the sample configuration file redis.conf that comes with the Redis project.

To improve the efficiency of file writes, when a program calls the write function on a modern operating system, the operating system usually keeps the written data in a memory buffer and only writes it to disk once the buffer is full or a time limit has passed. This improves efficiency but creates a safety problem: if the machine goes down, the data still held in the memory buffer is lost. For this reason the system provides two synchronization functions, fsync and fdatasync, which force the operating system to write the buffered data to disk immediately, thereby ensuring the safety of the written data.
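The following is a rough sketch of how the three appendfsync behaviors relate to write and fsync. The AofWriter class and its fsync_calls counter are hypothetical names introduced only for this illustration, and unlike the real server, which hands the everysec sync to a separate thread, this sketch calls fsync inline.

```python
import os
import time


class AofWriter:
    """Toy model of aof_buf plus a flushAppendOnlyFile-style flush (hypothetical)."""

    def __init__(self, path, appendfsync="everysec"):
        if appendfsync not in ("always", "everysec", "no"):
            raise ValueError(f"unknown appendfsync value: {appendfsync}")
        self.appendfsync = appendfsync
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.aof_buf = bytearray()
        self.last_fsync = time.monotonic()
        self.fsync_calls = 0  # kept only so the behavior can be observed

    def append(self, encoded_command: bytes) -> None:
        # Command append: nothing touches the file yet.
        self.aof_buf += encoded_command

    def _fsync(self, now: float) -> None:
        os.fsync(self.fd)  # force the OS to push its cached data to disk
        self.last_fsync = now
        self.fsync_calls += 1

    def flush(self, now=None) -> None:
        """Called once per event loop iteration in this model."""
        now = time.monotonic() if now is None else now
        pending = bytes(self.aof_buf)
        self.aof_buf.clear()
        wrote = bool(pending)
        while pending:
            # write() only hands the data to the OS cache; it may be partial.
            written = os.write(self.fd, pending)
            pending = pending[written:]
        if self.appendfsync == "always" and wrote:
            self._fsync(now)
        elif self.appendfsync == "everysec" and now - self.last_fsync > 1.0:
            self._fsync(now)
        # appendfsync == "no": never sync here; the OS decides when.

    def close(self) -> None:
        os.close(self.fd)
```

The optional now parameter exists only so the one-second rule can be exercised without sleeping.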

AOF persistence and efficiency

The value of the server's appendfsync option directly determines the efficiency and safety of the AOF persistence feature.

When the value of appendfsync is always, the server writes all the contents of the aof_buf buffer to the AOF file in each event loop and synchronizes the AOF file. This makes always the slowest of the three values, but also the safest: even if the server crashes, AOF persistence loses only the command data generated in a single event loop.
When the value of appendfsync is everysec, the server writes all the contents of the aof_buf buffer to the AOF file in each event loop and synchronizes the AOF file once per second in a separate thread. In terms of efficiency, everysec mode is fast enough, and even if the server crashes, the database loses only about one second of command data.
When the value of appendfsync is no, the server writes all the contents of the aof_buf buffer to the AOF file in each event loop and leaves it to the operating system to decide when the AOF file is synchronized. Because the flushAppendOnlyFile call in no mode performs no synchronization, writes to the AOF file are fastest in this mode. However, because written data accumulates in the system cache for some time, a single synchronization in this mode usually takes the longest of the three. Amortized over time, no mode is similar in efficiency to everysec mode. If the server crashes, a server using no mode loses all write command data since the AOF file was last synchronized.

AOF file loading and data restoration

Since the AOF file contains all the write commands needed to rebuild the database state, the server simply reads and re-executes the write commands stored in the AOF file to restore the database state before the server is shut down. The detailed steps for Redis to read the AOF file and restore the database state are as follows:

1. Create a pseudo-client without a network connection (fake client). Redis commands can only be executed in a client context, and the commands used when loading come directly from the AOF file rather than from a network connection, so the server uses a pseudo-client without a network connection to execute the write commands saved in the AOF file. A command executed by the pseudo-client has exactly the same effect as one executed by a client with a network connection.
2. Parse and read one write command from the AOF file.
3. Execute the write command just read using the pseudo-client.
4. Repeat steps 2 and 3 until all write commands in the AOF file have been processed.

When the above steps are completed, the database state stored in the AOF file is restored in its entirety.
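The loading steps above can be sketched roughly as follows. This is a simplified illustration, not the server's actual loader: parse_aof, FakeClient, and load_aof are hypothetical names, the database is a plain dict, and only SET, RPUSH, and SADD are handled.

```python
def parse_aof(data: bytes):
    """Yield each command in `data` as a list of strings (arrays of bulk strings only)."""
    pos = 0
    while pos < len(data):
        end = data.index(b"\r\n", pos)
        if data[pos:pos + 1] != b"*":
            raise ValueError("expected '*' at offset %d" % pos)
        argc = int(data[pos + 1:end])
        pos = end + 2
        args = []
        for _ in range(argc):
            end = data.index(b"\r\n", pos)
            if data[pos:pos + 1] != b"$":
                raise ValueError("expected '$' at offset %d" % pos)
            length = int(data[pos + 1:end])
            start = end + 2
            args.append(data[start:start + length].decode("utf-8"))
            pos = start + length + 2  # skip the argument and its trailing \r\n
        yield args


class FakeClient:
    """Stand-in for the pseudo-client: executes commands with no network connection."""

    def __init__(self, db: dict):
        self.db = db

    def execute(self, args):
        name, key, rest = args[0].upper(), args[1], args[2:]
        if name == "SET":
            self.db[key] = rest[0]
        elif name == "RPUSH":
            self.db.setdefault(key, []).extend(rest)
        elif name == "SADD":
            self.db.setdefault(key, set()).update(rest)
        else:
            raise NotImplementedError(name)


def load_aof(data: bytes) -> dict:
    db = {}
    client = FakeClient(db)          # step 1: create the pseudo-client
    for args in parse_aof(data):     # steps 2 and 4: read commands until none remain
        client.execute(args)         # step 3: execute the command just read
    return db
```

Feeding load_aof the bytes of an AOF file containing the earlier set and rpush commands would rebuild a dict holding key1 and numbers.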

AOF rewriting

Because AOF persistence records the database state by storing executed write commands, the AOF file keeps growing as the server runs. Left unchecked, an oversized AOF file is likely to affect the Redis server or even the entire host machine, and the larger the AOF file, the longer it takes to restore data from it.

For example:

redis> RPUSH list "A" "B"  // ["A","B"]
(integer) 2

redis> RPUSH list "C"       // ["A","B", "C"]
(integer) 3

redis> RPUSH list "D" "E" // ["A","B", "C", "D", "E"]
(integer) 5

redis> LPOP list // ["B", "C", "D", "E"]
"A"

redis> LPOP list // ["C", "D", "E"]
"B"

redis> RPUSH list "F" "G" // ["C", "D", "E", "F", "G"]
(integer) 5

The AOF file then needs to store six commands just to record the state of the list key. In real-world applications, write commands are far more numerous and frequent than in this simple example, so the problem is much more serious. To solve the problem of AOF file growth, Redis provides the AOF rewrite feature. With it, the Redis server can create a new AOF file to replace the existing one. The old and new AOF files hold the same database state, but the new file contains no redundant commands that waste space, so it is usually much smaller than the old one.

In the next section, we will describe how AOF file rewriting is implemented, and how the BGREWRITEAOF command is implemented.

Implementation of AOF file rewriting

Although Redis calls the feature that generates a new AOF file to replace the old one "AOF file rewriting", it does not actually read, parse, or write the existing AOF file at all; it is implemented by reading the server's current database state.

Consider a situation where the server executes the following commands on the list key.

redis> RPUSH list "A" "B"  // ["A","B"]
(integer) 2

redis> RPUSH list "C"       // ["A","B", "C"]
(integer) 3

redis> RPUSH list "D" "E" // ["A","B", "C", "D", "E"]
(integer) 5

redis> LPOP list // ["B", "C", "D", "E"]
"A"

redis> LPOP list // ["C", "D", "E"]
"B"

redis> RPUSH list "F" "G" // ["C", "D", "E", "F", "G"]
(integer) 5

The server must then write six commands to the AOF file to record the current state of the list key. If the server wants to record that state with as few commands as possible, the simplest and most efficient approach is not to read and analyze the existing AOF file, but to read the value of the list key directly from the database and replace the six commands saved in the AOF file with a single RPUSH list "C" "D" "E" "F" "G" command. This reduces the number of commands needed to save the list key from six to one.
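As a small illustration of the idea, the sketch below replays the six commands against an in-memory list and then produces one equivalent command from the final value alone. The helper names replay and rewrite_list_key are hypothetical, and only RPUSH and LPOP are modeled.

```python
from collections import deque


def replay(commands):
    """Apply RPUSH/LPOP commands to an in-memory database, as the server would."""
    db = {}
    for name, key, *args in commands:
        items = db.setdefault(key, deque())
        if name == "RPUSH":
            items.extend(args)
        elif name == "LPOP":
            items.popleft()
        else:
            raise NotImplementedError(name)
    return db


def rewrite_list_key(key, items):
    """Build one RPUSH command from the current value, ignoring command history."""
    return ["RPUSH", key, *items]


history = [
    ("RPUSH", "list", "A", "B"),
    ("RPUSH", "list", "C"),
    ("RPUSH", "list", "D", "E"),
    ("LPOP", "list"),
    ("LPOP", "list"),
    ("RPUSH", "list", "F", "G"),
]

db = replay(history)
rewritten = rewrite_list_key("list", db["list"])
print(len(history), "commands ->", rewritten)
```

The rewrite step never looks at history; it only needs the current value of the key, which is why the existing AOF file does not have to be read.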

The pseudo-code of the whole process can be represented as follows.

def aof_rewrite(new_aof_file_name):
    # Create the new AOF file
    f = create_file(new_aof_file_name)

    # Iterate over the databases
    for db in redisserver.db:
        # Skip empty databases
        if db.is_empty(): continue

        # Write a SELECT command to specify the database number
        f.write_command("SELECT", db.id)

        # Iterate over all keys in the database
        for key in db:
            # Skip keys that have already expired
            if key.is_expired(): continue

            # Rewrite the key according to its type
            if key.type == String:
                rewrite_string(key)
            elif key.type == List:
                rewrite_list(key)
            elif key.type == Hash:
                rewrite_hash(key)
            elif key.type == Set:
                rewrite_set(key)
            elif key.type == SortedSet:
                rewrite_sorted_set(key)

            # If the key has an expiration time, rewrite that as well
            if key.have_expire_time():
                rewrite_expire_time(key)

    # Writing is finished; close the file
    f.close()

def rewrite_string(key):
    # Get the value of the string key with the GET command
    value = GET(key)

    # Rewrite the string key with the SET command
    f.write_command(SET, key, value)

def rewrite_list(key):
    # Get all elements of the list key with the LRANGE command
    item1, item2, ..., itemN = LRANGE(key, 0, -1)

    # Rewrite the list key with the RPUSH command
    f.write_command(RPUSH, key, item1, item2, ..., itemN)

def rewrite_hash(key):
    # Get all field-value pairs of the hash key with the HGETALL command
    field1, value1, field2, value2, ..., fieldN, valueN = HGETALL(key)

    # Rewrite the hash key with the HMSET command
    f.write_command(HMSET, key, field1, value1, field2, value2, ..., fieldN, valueN)

def rewrite_set(key):
    # Get all elements of the set key with the SMEMBERS command
    elem1, elem2, ..., elemN = SMEMBERS(key)

    # Rewrite the set key with the SADD command
    f.write_command(SADD, key, elem1, elem2, ..., elemN)

def rewrite_sorted_set(key):
    # Get all members and scores of the sorted set key with the ZRANGE command
    member1, score1, member2, score2, ..., memberN, scoreN = ZRANGE(key, 0, -1, "WITHSCORES")

    # Rewrite the sorted set key with the ZADD command
    f.write_command(ZADD, key, score1, member1, score2, member2, ..., scoreN, memberN)

def rewrite_expire_time(key):
    # Get the key's expiration time as a millisecond-precision UNIX timestamp
    timestamp = get_expire_time_in_unixstamp(key)

    # Rewrite the expiration time with the PEXPIREAT command
    f.write_command(PEXPIREAT, key, timestamp)

Because the new AOF file generated by the aof_rewrite function contains only the commands necessary to restore the current database state, the new AOF file does not waste any hard disk space.

Note: In practice, to avoid overflowing the client input buffer when the commands are executed, the rewriter first checks the number of elements contained in the key. If that number exceeds the value of the redis.h/REDIS_AOF_REWRITE_ITEMS_PER_CMD constant, the rewriter uses multiple commands to record the value of the key instead of just one. In version 3.0, the value of REDIS_AOF_REWRITE_ITEMS_PER_CMD is 64, which means that if a set key contains more than 64 elements, the rewriter uses multiple SADD commands to record the set, with each command adding at most 64 elements.
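The batching described in the note can be sketched as below. The function rewrite_set_chunked is a hypothetical helper written for this illustration; only the limit of 64 comes from the text above.

```python
ITEMS_PER_CMD = 64  # the value quoted above for REDIS_AOF_REWRITE_ITEMS_PER_CMD


def rewrite_set_chunked(key, members, items_per_cmd=ITEMS_PER_CMD):
    """Return one SADD command per batch of at most `items_per_cmd` members."""
    members = list(members)
    return [
        ["SADD", key, *members[i:i + items_per_cmd]]
        for i in range(0, len(members), items_per_cmd)
    ]


# A set with 150 members needs three commands: 64 + 64 + 22 members.
commands = rewrite_set_chunked("fruits", [f"item{i}" for i in range(150)])
print([len(cmd) - 2 for cmd in commands])
```

A set with 64 or fewer members still produces a single command, so small keys are unaffected by the limit.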

AOF background rewrites

The AOF rewriter aof_rewrite function described above does a good job of building a new AOF file, but because this function does a lot of writing, the thread calling it will be blocked for a long time. Because the Redis server uses a single thread to handle command requests, if the aof_rewrite function were called directly by the server, the server would not be able to handle command requests from the client during the rewrite of the AOF file.

Because AOF rewriting is only an auxiliary maintenance operation, Redis does not want it to prevent the server from processing requests. Redis therefore runs the AOF rewrite in a child process, which achieves two goals at once.

The server process (the parent process) can continue to process command requests while the child process performs the AOF rewrite.
The child process carries its own copy of the server process's data, so using a child process instead of a thread ensures data safety while avoiding the use of locks.

However, using a child process introduces a problem that must be solved. The server process keeps handling command requests during the AOF rewrite, and new commands may modify the existing database state, so the server's current database state and the state saved in the rewritten AOF file can become inconsistent.

For example:

Time	Server process (parent)	Child process
t1	Execute command SET k1 v1
t2	Execute command SET k1 v2
t3	Execute command SET k1 v3
t4	Create child process to perform the AOF file rewrite	Start AOF file rewrite
t5	Execute command SET k2 100	Perform the rewrite operation
t6	Execute command SET k3 101	Perform the rewrite operation
t7	Execute command SET k4 102	Complete AOF file rewrite

The above shows an example of AOF file rewrite. When the child process starts the file rewrite, there is only one key k1 in the database, but when the child process finishes the AOF file rewrite, there are three new keys k2, k3, and k4 in the database of the server process, so the rewritten AOF file and the current database state of the server are not consistent. The new AOF file only holds data for one key, k1, while the server database now has four keys, k1, k2, k3, and k4.

To solve this data inconsistency problem, the Redis server sets up an AOF rewrite buffer that is used after the server creates a child process, and when the Redis server finishes executing a write command, it sends the write command to both the AOF buffer and the AOF rewrite buffer. This means that during the execution of the AOF rewrite by the child process, the server process needs to perform the following three tasks.

Execute the command from the client.
Append the executed write command to the AOF buffer
Append the executed write command to the AOF rewrite buffer

This ensures that:

The contents of the AOF buffer are written and synchronized to the AOF file on a regular basis, and processing of existing AOF files is performed as usual.
All write commands executed by the server since the creation of the child process are recorded in the AOF rewrite buffer.

When the child process finishes the AOF rewrite, it sends a signal to the parent process, and after receiving the signal, the parent process will call a signal handler and do the following:

Write all the contents of the AOF rewrite buffer to the new AOF file, at which point the database state saved in the new AOF file is the same as the server's current database state.
Rename the new AOF file so that it atomically overwrites the existing AOF file, completing the replacement of the old file with the new one.

After this signal handler function is executed, the parent process can continue to receive command requests as usual. During the entire AOF background rewrite process, only the signal handler function will block the server process (the parent process). At other times, AOF background rewrites do not block the parent process, which minimizes the impact of AOF rewrites on server performance.

The complete rewriting process is as follows.

Time	Server process (parent)	Child process
t1	Execute command SET k1 v1
t2	Execute command SET k1 v2
t3	Execute command SET k1 v3
t4	Create child process to perform the AOF file rewrite	Start AOF file rewrite
t5	Execute command SET k2 100	Perform the rewrite operation
t6	Execute command SET k3 101	Perform the rewrite operation
t7	Execute command SET k4 102	Complete AOF file rewrite, send signal to parent process
t8	Receive the signal from the child process and append the commands SET k2 100, SET k3 101, SET k4 102 to the end of the new AOF file
t9	Overwrite the old AOF file with the new AOF file
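The timeline above can be simulated in a few lines. This is only a toy model under stated assumptions: the Server class is hypothetical, a dict copy stands in for the child process's copy of the data, commands are kept as tuples rather than written to files, and only SET is supported.

```python
class Server:
    """Toy model of the parent process during a background AOF rewrite."""

    def __init__(self):
        self.db = {}
        self.aof_buf = []       # commands destined for the existing AOF file
        self.rewrite_buf = []   # commands executed while a rewrite is running
        self.rewriting = False
        self.snapshot = None

    def set(self, key, value):
        self.db[key] = value
        cmd = ("SET", key, value)
        self.aof_buf.append(cmd)          # always goes to the AOF buffer
        if self.rewriting:
            self.rewrite_buf.append(cmd)  # also recorded for the new AOF file

    def start_rewrite(self):
        # The child sees the data as it was at creation time (t4).
        self.rewriting = True
        self.rewrite_buf = []
        self.snapshot = dict(self.db)

    def finish_rewrite(self):
        # Child's work: one command per key in its copy of the data.
        new_aof = [("SET", k, v) for k, v in self.snapshot.items()]
        # Parent's signal handler (t8): append everything in the rewrite buffer.
        new_aof.extend(self.rewrite_buf)
        self.rewriting = False
        self.rewrite_buf = []
        return new_aof


server = Server()
for value in ("v1", "v2", "v3"):      # t1 to t3
    server.set("k1", value)
server.start_rewrite()                # t4
server.set("k2", "100")               # t5
server.set("k3", "101")               # t6
server.set("k4", "102")               # t7
new_aof = server.finish_rewrite()     # t8
print(new_aof)
```

Replaying new_aof from an empty database yields the same four keys the server holds, even though the child's copy only contained k1.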

The above is the principle of the AOF background rewrite, that is, the implementation of the BGREWRITEAOF command.

Summary

The AOF file records the database state of the server by saving all write command requests that modify the database.
All commands in the AOF file are saved in the format of the Redis command request protocol.
The different values of the appendfsync option have a significant impact on both the safety of the AOF persistence feature and the performance of the Redis server.
The server can restore the original state of the database by simply loading and re-executing the commands saved in the AOF file.
An AOF rewrite produces a new AOF file that holds the same database state as the original AOF file, but in a smaller size.
AOF rewrite is a somewhat misleading name: the feature is implemented by reading key-value pairs from the database, and the program does not need to read, parse, or write the existing AOF file at all.
When the BGREWRITEAOF command is executed, the Redis server maintains an AOF rewrite buffer that records all write commands executed by the server while the child process creates the new AOF file. When the child process finishes creating the new AOF file, the server appends everything in the rewrite buffer to the end of the new AOF file, making the database state stored in the old and new AOF files identical. Finally, the server replaces the old AOF file with the new one, completing the AOF file rewrite operation.
