AOF and RDB persistence process review

The previous two articles covered the details of AOF and RDB persistence. This article compares the two and asks which persistence method suits which environment. First, let's review how each one works.

AOF persistence process

AOF persistence is similar to MySQL’s binlog: it records every operation that modifies data. Each write command sent by a client is appended to the file in the Redis command protocol format. To keep the file at a reasonable size, Redis also rewrites the AOF file in the background using a child process, so that the file is no larger than what is needed to represent the current state of the dataset. Note that when the server starts with AOF enabled, Redis prefers the AOF file and restores the data by re-executing the commands in it, because the AOF generally holds a more complete dataset than the RDB file does. AOF files are also generally larger than RDB files.
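To make the "Redis command protocol format" concrete, here is a minimal Python sketch that encodes one command as a RESP array of bulk strings, which is the shape of a command record in an AOF file. The function name encode_command and the sample key and value are made up for illustration; this is not Redis's own code.

```python
def encode_command(*args):
    """Encode one command as a RESP array of bulk strings."""
    # "*<n>\r\n" announces an array with n elements.
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        # Each element is "$<byte length>\r\n<bytes>\r\n".
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


if __name__ == "__main__":
    record = encode_command("SET", "greeting", "hello")
    print(record)
    # b'*3\r\n$3\r\nSET\r\n$8\r\ngreeting\r\n$5\r\nhello\r\n'
```

Because every record carries its own element count and byte lengths, the file can be replayed command by command at startup.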

RDB persistence process

RDB persistence generates a point-in-time snapshot of the dataset at configured intervals. When the save conditions in the configuration file are met, the only thing the parent process has to do is fork a child process; the child then handles all the remaining work of writing the RDB file. RDB restores large datasets faster than AOF because it loads the data directly and does not need to execute commands one by one.
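The division of work between parent and child can be sketched with os.fork on a Unix-like system. This is only an illustration of the idea, not how Redis is implemented: background_snapshot, the JSON file format, and the file names are hypothetical stand-ins for bgsave and the real RDB format.

```python
import json
import os
import tempfile


def background_snapshot(dataset, path):
    """Fork a child that writes the dataset to disk; return the child's pid."""
    pid = os.fork()
    if pid == 0:
        # Child: it sees the dataset exactly as it was at the moment of fork.
        status = 1
        try:
            tmp_path = path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(dataset, f)
            # Rename only after a complete write, so readers never see a partial file.
            os.replace(tmp_path, path)
            status = 0
        finally:
            # Leave immediately without running the parent's cleanup code.
            os._exit(status)
    # Parent: returns right away and keeps serving requests.
    return pid


def demo():
    dataset = {"counter": 1}
    path = os.path.join(tempfile.mkdtemp(), "dump.json")
    pid = background_snapshot(dataset, path)
    dataset["counter"] = 2      # a write that arrives after the fork
    os.waitpid(pid, 0)          # wait for the child to finish
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__":
    print(demo())  # the snapshot still holds {'counter': 1}
```

The write made by the parent after the fork does not appear in the file, which is what makes the result a point-in-time snapshot.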

Analysis

RDB and AOF write in different ways: AOF appends each write, while RDB saves the whole dataset. Given the amount of data involved in each case, RDB snapshots cannot be taken too often, so their frequency has to be controlled. Each snapshot also forks a child process, and the fork call itself blocks command execution. The child process created by fork does not need to copy the parent's physical memory, but it does copy the parent's memory page tables. For example, for a 10GB Redis process, about 20MB of page tables need to be copied, so fork time is closely related to the total memory of the process. For a high-traffic Redis instance handling 50,000 or more operations per second, a fork that takes seconds will delay tens of thousands of Redis commands, which has a significant impact on the latency of the online application. Under normal circumstances fork should take about 20 milliseconds per GB. You can check the latest_fork_usec metric in the output of info stats to see how long the most recent fork took, in microseconds.
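The figures in this analysis can be checked with some quick arithmetic. The sketch below assumes 4 KiB memory pages and 8-byte page-table entries (typical for x86-64 Linux, but an assumption here) and counts only the lowest-level entries. All function names are made up for this example, and the sample INFO text is invented.

```python
PAGE_SIZE = 4 * 1024     # assumption: 4 KiB pages
PTE_SIZE = 8             # assumption: 8 bytes per page-table entry
FORK_MS_PER_GB = 20      # rule of thumb quoted in the text


def page_table_bytes(process_bytes):
    """Approximate size of the page tables that fork has to copy."""
    return (process_bytes // PAGE_SIZE) * PTE_SIZE


def estimated_fork_ms(process_gb):
    """Expected fork time under the 20 ms per GB rule of thumb."""
    return process_gb * FORK_MS_PER_GB


def commands_delayed(ops_per_second, fork_seconds):
    """Commands that arrive while the parent is blocked inside fork."""
    return int(ops_per_second * fork_seconds)


def latest_fork_usec(info_text):
    """Extract latest_fork_usec from the text returned by INFO stats."""
    for line in info_text.splitlines():
        if line.startswith("latest_fork_usec:"):
            return int(line.split(":", 1)[1])
    return None


if __name__ == "__main__":
    ten_gb = 10 * 1024 ** 3
    print(page_table_bytes(ten_gb) // 1024 ** 2)   # 20 (MiB of page tables)
    print(estimated_fork_ms(10))                   # 200 (ms for a 10GB process)
    print(commands_delayed(50_000, 2))             # 100000 commands for a 2 s fork
    print(latest_fork_usec("# Stats\r\nlatest_fork_usec:315\r\n"))  # 315
```

If latest_fork_usec on a real instance is far above the estimate for its memory size, the fork is slower than the rule of thumb suggests and snapshots deserve a closer look.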

Summary and analysis of advantages and disadvantages

Advantages and disadvantages of RDB

Pros

  • RDB is a compact, compressed binary file representing a snapshot of Redis data at a point in time. It is ideal for backups, full replication, and similar scenarios. For example, you can run bgsave every 6 hours and copy the RDB file to a remote machine or file system (such as HDFS) for disaster recovery.
  • Redis restores data from an RDB file much faster than from an AOF file.

Disadvantages

  • RDB cannot persist data in real time or at second-level granularity. Every bgsave has to fork a child process, which is a heavy operation and too expensive to run frequently.
  • RDB files are saved in a specific binary format, and that format has gone through multiple versions as Redis has evolved, so older versions of Redis cannot load RDB files written in a newer format.
  • To address the fact that RDB is not suitable for real-time persistence, Redis provides AOF persistence.

Pros and Cons of AOF

Pros

  • Choosing a suitable fsync policy better protects the data against loss
  • Rewriting compacts the AOF file and keeps its size down
  • After an accidental flushall, as long as the AOF file has not been rewritten, you can remove the trailing flushall command from the file and restart Redis to restore the previous state
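The last two points can be illustrated with a toy model in which the log is a list of command tuples. It supports only SET, INCR, DEL and FLUSHALL, and the function names are made up for this example. Real Redis builds the rewritten file from the in-memory dataset in a child process rather than by replaying the old log, so treat this as a sketch of the effect, not of the mechanism.

```python
def rewrite(commands):
    """Return a minimal command list that recreates the same final state."""
    state = {}
    for cmd in commands:
        name = cmd[0].upper()
        if name == "SET":
            state[cmd[1]] = cmd[2]
        elif name == "INCR":
            state[cmd[1]] = str(int(state.get(cmd[1], "0")) + 1)
        elif name == "DEL":
            for key in cmd[1:]:
                state.pop(key, None)
        elif name == "FLUSHALL":
            state.clear()
        else:
            raise ValueError("command not supported by this toy model: %s" % name)
    # One SET per surviving key is enough to rebuild the dataset.
    return [("SET", key, value) for key, value in state.items()]


def drop_trailing_flushall(commands):
    """Undo an accidental FLUSHALL, provided it is the last command in the log."""
    if commands and commands[-1][0].upper() == "FLUSHALL":
        return commands[:-1]
    return commands


if __name__ == "__main__":
    log = [("SET", "a", "1"), ("INCR", "a"), ("INCR", "a"),
           ("SET", "b", "x"), ("DEL", "b")]
    print(rewrite(log))                                      # [('SET', 'a', '3')]
    print(drop_trailing_flushall(log + [("FLUSHALL",)]) == log)  # True
```

Five commands shrink to one after the rewrite. The example also shows why the flushall trick only works before a rewrite: once the log has been rewritten after a flushall, the earlier commands are gone.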

Disadvantages

  • Using an fsync policy reduces performance (this is a trade-off)
  • The file is also generally larger than the RDB file because the data it holds is more complete
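The fsync trade-off can be sketched with a small append-only log. The policy names always, everysec and no match the values of Redis's appendfsync setting, but the AppendLog class and the demo helper are hypothetical. This simplified version also syncs inline, whereas Redis performs the everysec sync in a background thread.

```python
import os
import tempfile
import time


class AppendLog:
    """Toy append-only file with three fsync policies."""

    def __init__(self, path, policy="everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError("unknown fsync policy: %s" % policy)
        self.policy = policy
        self.file = open(path, "ab")
        self.last_sync = time.monotonic()
        self.sync_count = 0

    def append(self, record):
        self.file.write(record)
        self.file.flush()  # hand the bytes to the operating system
        now = time.monotonic()
        due = self.policy == "always" or (
            self.policy == "everysec" and now - self.last_sync >= 1.0
        )
        if due:
            os.fsync(self.file.fileno())  # force the bytes onto the disk
            self.last_sync = now
            self.sync_count += 1

    def close(self):
        self.file.close()


def demo(policy):
    """Append two records and report how many fsync calls were made."""
    path = os.path.join(tempfile.mkdtemp(), "appendonly.aof")
    log = AppendLog(path, policy)
    log.append(b"first\n")
    log.append(b"second\n")
    log.close()
    with open(path, "rb") as f:
        return log.sync_count, f.read()


if __name__ == "__main__":
    print(demo("always"))  # (2, b'first\nsecond\n')
    print(demo("no"))      # (0, b'first\nsecond\n')
```

With always, every write pays for a disk sync, which is the safest and slowest choice. With no, the operating system decides when data reaches the disk, so more recent writes can be lost in a crash. everysec sits in between.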

How do we choose?

If the data stored in Redis is important, you should use both persistence features. If you care a lot about your data but can still tolerate losing a few minutes of it, you can use RDB persistence alone.

Many users use only AOF persistence, but we don’t recommend it: RDB snapshots are very convenient for database backups, RDB restores datasets faster than AOF does, and using RDB also avoids some AOF bugs.