Continuous Archival Backups

pg has three basic backup methods.

  • sql dump using pg_dump, which is a logical backup that cannot be restored to a specified state.
  • File system based backup, requires file system to provide snapshot function to ensure consistency, otherwise must be shut down first and then backed up.
  • Continuous archiving, the preferred high-reliability backup technology.

Continuous archiving of WAL logs is the key to implementing archived backups, combining a file system level backup with an archived WAL file, so that when recovery is required, the file system backup is restored first, and then the archived WAL file is replayed to restore the system to its current (or specified point in time) state.

Advantages of continuous archiving.

  • Rather than requiring a fully consistent file system backup, a consistent state is reached by replaying the WAL from a redo point, so a simple tool can be used to make a base backup without stopping the machine.
  • Simple continuous archiving of WAL files enables continuous backups in situations where full backups are not possible frequently.
  • Use base backups and WAL archive reversion to specified points in time to restore the database to any point in time after the base backup.
  • Continuous transfer of WAL archive files to another machine to which the same base backup has been applied enables a master-backup replication system.

Disadvantages.

  • Only the entire database set cluster can be restored by backup, no finer granularity is supported.
  • Base backups and continuous archive files can take up a lot of space.

Enabling WAL archiving

pg continuously generates 16M WAL segment files in the pg_wal directory by default. pg will clean up and recycle WAL segment files when archiving mode is not enabled.

The archive process transfers the WAL segment files before the checkpoint from the pg_wal directory to a specified location (e.g., copied to a user-defined directory), and the original WAL segment files are purged or recycled after a successful transfer.

The pg archiving process is highly scalable, as the transfer process executes a shell command provided by the user, and the exact method and target location of the transfer is completely user-defined, with only the return code of the command determining whether the archiving was successful.

Example configuration to enable WAL archiving.

1
2
3
wal_level = replica # or higher level
archive_mode = on
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

There are the following notes about the archive command.

  • The archive command should never directly overwrite an archive file with the same name, since different set clusters may produce WAL segment files with the same name, and the archive command should return with a non-zero status code when an archive file already exists at the archive location.
  • The archive command should preserve the original filename of the WAL segment file.
  • The archive command should be designed to resolve or handle potential exceptions that could cause the pg_wal directory to grow and cause pg to shut down due to lack of space.
  • It is necessary to enable the logging_collector parameter so that the standard error output of the archive command will be collected in the database logs for easy debugging and monitoring of the archiving process.
  • For databases with a small workload, a WAL segment file may not be switched for a long time. Setting a small archive_timeout can shorten the time interval between when a transaction is executed and when it is reliably archived; ` Manually switch the WAL segment file, which also triggers the archiving of the WAL segment file used before the switch, this command can only be executed on the master.
  • If the command is terminated by a signal other than SIGTERM or a shell error (e.g., command not found), archiving will be aborted and the server will shut down.

Basic backup

A basic backup is a staged operation consisting of several sequential commands.

  • Execute pg_start_backup to start the backup, the database performs a checkpoint, explicitly creates a re-point at the start moment of the backup, writes as many transactions as possible to disk before this point in time, and records the LSN location of this checkpoint. In order not to interfere with online operation, this command performs the checkpoint by default by spreading out the IO, so it may take a long time to return. pg_start_backup also forces full-page write mode to ensure that writes to the database are fully replayable during the base backup.

  • After pg_start_backup is successfully executed, the user needs to archive the data directory using the file system tool itself to obtain a file system backup, noting that different files may have been copied at different points in time and therefore the state of the database in the file system backup may not be consistent. pg expects this to happen and brings it to a consistent state by replaying the WAL segment files written throughout the base backup to bring them to a consistent state.

  • When pg_stop_backup is executed, the database forcibly switches the WAL segment files in use and records the last LSN before the switch, with the ending LSN being the consistent state actually backed up by this base backup. The WAL segment files written during the base backup are archived, and the end operation sends a backup history file with the names of these WAL segment files to the archive location, the file name of which records the first WAL segment file needed to restore using this base backup, and the contents of which record the start and end timestamps of this backup and the required starting and ending WAL segment files. Applying these archived WAL segment files on top of the file system backup will restore to the consistent state represented by the ending LSN.

  • After ending the backup you need to create manually a backup_label file in the root directory of the file system backup with the output returned by the pg_stop_backup operation and preferably save the WAL segment files recorded in the backup history file separately, the file system backup plus the WAL segment files written during the backup is a valid base The file system backup plus the WAL segment files written during the backup is a valid base backup that can successfully restore the database to the consistent state it was in when the backup ended.

The details of the pg_start_backup and pg_stop_backup commands can be found below.

After securely archiving file system backups and WAL segment files used during backups (specified in the backup history file), WAL segment files with a numeric sequence in the file name smaller than the backup history file name can be deleted.

Recovery requires a base backup plus ongoing WAL archives created after that base backup, and since replaying a large number of WAL archives is time-consuming, the recommended practice is to do a base backup periodically, while cleaning up older WAL archives before the base backup (after cleaning it is impossible to restore to that point in time).

Make a base backup using the underlying API

  1. Make sure the archiving function is enabled and working properly.

  2. Connect to the database using the superuser and execute the following command (this connection needs to be maintained during the backup).

    1
    
    SELECT pg_start_backup('label', false, false);
    
    • The first parameter is a custom descriptive label.
    • The second parameter represents whether to turn on fast checkpointing, which when turned on will execute fast checkpointing immediately initiating a large number of IOs that may affect database performance during backups. Turning it off will disperse the IOs to reduce the impact on the database, but will take longer to execute.
    • The third parameter represents whether to enable exclusive backups, which is no longer recommended in newer versions.
  3. Use any file system backup tool to archive the data directory. This error can be ignored if the backup tool returns with a non-zero status code because the files were changed during the copy. Partial temporary subdirectories, files or files in subdirectories under the data directory (e.g. pg_wal) can be ignored during archiving.

  4. Continue with the following command in the same connection.

    1
    
    SELECT * FROM pg_stop_backup(false, true);
    

    This command will terminate the backup mode and automatically switch the WAL segment files so that the WAL segment files written during the backup can be archived, by default these WAL end files are archived successfully before the command returns, the second field output returned needs to be written to the backup_label file in the root directory of the file system backup, this file needs to be created manually in non-exclusive backup mode.

  5. When the end command returns successfully, it means that the WAL segment files written during the backup have been archived and the backup is finished. You can save the file system backup together with these WAL segment files, which form a complete base backup.

Making basic backups with pg_basebackup

pg_basebackup encapsulates the underlying pg_start_backup and pg_stop_backup commands and provides some very convenient features.

  • Supports archiving and compressing file system backups of data directories, automatically ignoring unwanted files, and automatically writing to backup_label files.
  • Automatically record checksums of backup files to prevent backups from being changed.
  • Support to automatically fetch all WAL segment archive files generated during backup and save them to pg_wal or other directories.

Usage examples.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
$ pg_basebackup -D /backup/demo -Ft -z -Xs -c fast -P -v
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/75ED8820 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_15233"
244438/244438 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/859200C8
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed

$ ls /backup/demo
backup_manifest  base.tar.gz  pg_wal.tar.gz

Extract base.tar.gz and pg_wal.tar.gz respectively to get the file system backup and the WAL segment archives written during the backup.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
$ cat /backup/demo/backup_label
START WAL LOCATION: 0/75ED8820 (file 000000010000000000000075)
CHECKPOINT LOCATION: 0/763C57B0
BACKUP METHOD: streamed
BACKUP FROM: primary
START TIME: 2022-07-09 15:13:35 UTC
LABEL: pg_basebackup base backup
START TIMELINE: 1

$ ls -l /backup/demo/pg_wal
total 278532
-rw------- 1 postgres postgres 16777216 Jul  9 15:13 000000010000000000000075
-rw------- 1 postgres postgres 16777216 Jul  9 15:13 000000010000000000000076
-rw------- 1 postgres postgres 16777216 Jul  9 15:13 000000010000000000000077
-rw------- 1 postgres postgres 16777216 Jul  9 15:13 000000010000000000000078
-rw------- 1 postgres postgres 16777216 Jul  9 15:13 000000010000000000000079
-rw------- 1 postgres postgres 16777216 Jul  9 15:13 00000001000000000000007A
-rw------- 1 postgres postgres 16777216 Jul  9 15:13 00000001000000000000007B
-rw------- 1 postgres postgres 16777216 Jul  9 15:13 00000001000000000000007C
-rw------- 1 postgres postgres 16777216 Jul  9 15:13 00000001000000000000007D
-rw------- 1 postgres postgres 16777216 Jul  9 15:13 00000001000000000000007E
-rw------- 1 postgres postgres 16777216 Jul  9 15:13 00000001000000000000007F
-rw------- 1 postgres postgres 16777216 Jul  9 15:13 000000010000000000000080
-rw------- 1 postgres postgres 16777216 Jul  9 15:13 000000010000000000000081
-rw------- 1 postgres postgres 16777216 Jul  9 15:13 000000010000000000000082
-rw------- 1 postgres postgres 16777216 Jul  9 15:13 000000010000000000000083
-rw------- 1 postgres postgres 16777216 Jul  9 15:13 000000010000000000000084
-rw------- 1 postgres postgres 16777216 Jul  9 15:13 000000010000000000000085
drwx------ 2 postgres postgres     4096 Jul  9 15:20 archive_status

The backup_label file in the data directory does not record the LSN location of the end of the base backup, but it can be viewed in the corresponding backup history file in the WAL archive directory.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
$ cat 000000010000000000000075.00ED8820.backup
START WAL LOCATION: 0/75ED8820 (file 000000010000000000000075)
STOP WAL LOCATION: 0/859200C8 (file 000000010000000000000085)
CHECKPOINT LOCATION: 0/763C57B0
BACKUP METHOD: streamed
BACKUP FROM: primary
START TIME: 2022-07-09 15:13:35 UTC
LABEL: pg_basebackup base backup
START TIMELINE: 1
STOP TIME: 2022-07-09 15:13:42 UTC
STOP TIMELINE: 1

When recovering with this base backup, the recovery process replayed the 0000000100000000000000000085 WAL segment file before the database entered a consistent state, after which the database was able to accept connections to perform query operations, as shown in the recovery log below.

1
2
3
4
5
6
2022-07-10 08:05:28.780 UTC [21181] LOG:  restored log file "000000010000000000000084" from archive
2022-07-10 08:05:29.024 UTC [21181] LOG:  restored log file "000000010000000000000085" from archive
2022-07-10 08:05:29.147 UTC [21181] LOG:  consistent recovery state reached at 0/859200C8
2022-07-10 08:05:29.148 UTC [21179] LOG:  database system is ready to accept read-only connections
2022-07-10 08:05:29.232 UTC [21181] LOG:  restored log file "000000040000000000000086" from archive
2022-07-10 08:05:29.480 UTC [21181] LOG:  restored log file "000000040000000000000087" from archive

Recovering from an archived backup

To restore, proceed as follows.

  • Stop the pg server process.
  • Back up the current data directory, or at least the WAL segment files in the pg_wal directory that have not been archived, if there is not enough space.
  • Remove all files and subdirectories from the data directory.
  • Restore file system backup to the data directory, making sure that the restored files and directories have the correct permissions.
  • If the pg_wal directory was not ignored when doing the file system backup, you now need to empty the restored pg_wal directory and copy the unarchived WAL segment files to it.
  • Create a recovery.signal file in the data directory that instructs pg to enter recovery mode at boot and will be automatically deleted after a successful recovery.
  • Set recovery-related parameters, such as restore_command, in the configuration file postgresql.conf, which is described in detail below.
  • Start the pg server. The server will enter recovery mode and start fetching and processing the WAL segment files needed for recovery. If the recovery process is terminated due to an external error (e.g. host power failure), you can simply restart the server to let it continue the recovery. When recovery is complete, the server deletes the recovery.signal file to prevent re-entry into recovery mode, and then begins normal operation.

The recovery process is not complicated, and the key points are set via the configuration file to.

  • The way to get the archived WAL segment file when recovering.
  • The target state to be reached for recovery.

Restore command

The restore_command (henceforth called the restore command) must be set to tell pg how to get the archived WAL segment file. Similar to the archive command, this command defines how to transfer the specific WAL segment file (which also contains other types of files) that pg needs from the archive location to a temporary location in the data directory in a user-defined way, and then pg will then reads the temporary file and performs a redo. More complex behavior can be customized in this command by executing a script.

If the archived WAL segment files are stored in the /backup/demo/pg_wal directory, the sample recovery command is as follows.

1
restore_command = 'cp /backup/demo/pg_wal/%f %p'

The recovery command is designed with the following caveats.

  • The recovery command must return with a non-zero status code if the transfer fails or the transferred file does not exist. pg will try to fetch some files that are not in the archive location with the recovery command, so the recovery will not be aborted directly because the recovery command failed.

    1
    2
    3
    4
    5
    6
    
    2022-07-10 08:05:24.762 UTC [21181] LOG:  restored log file "00000003.history" from archive
    2022-07-10 08:05:24.766 UTC [21181] LOG:  restored log file "00000004.history" from archive
    cp: cannot stat '/var/lib/postgresql/archive/14/demo/00000005.history': No such file or directory
    2022-07-10 08:05:24.771 UTC [21181] LOG:  starting point-in-time recovery to 2022-07-10 08:03:23.157122+00
    2022-07-10 08:05:24.776 UTC [21181] LOG:  restored log file "00000004.history" from archive
    2022-07-10 08:05:24.862 UTC [21181] LOG:  restored log file "000000010000000000000076" from archive
    
  • If pg cannot fetch a file with the recovery command, it will later try to find it from the pg_wal directory in the data directory.

  • If the command is terminated by a signal other than SIGTERM or a shell error (e.g., command not found), recovery will be aborted and the server will shut down.

Recovery modes

Recovery actually works in two modes.

  • Create a file named standby.signal in the data directory, which will enter standby mode when the server starts. The server enters the recovery state and does not stop recovering when it reaches the end of the archived WAL, but continues to try to recover by connecting to the primary server specified by the primary_conninfo setting or by using the restore_command to get a new WAL segment.
  • Create a file called recovery.signal in the data directory that will enter the target recovery mode when the server starts. Target recovery mode ends when the archived WAL segment file is fully replayed or when recovery_target is reached.

Typically, standby mode is used to provide high availability and extend read-only libraries, while target recovery is used to recover lost data or clone a new server. If both standby.signal and recovery.signal files are created, the standby mode takes precedence.

Recovery Target

By default, recovery mode processes all available WAL segment files, i.e., it increments the sequence number in the WAL segment naming to get the next WAL segment file until it fails, and finally restores the database to a “relatively up-to-date” state. So it is usually normal to see a “file not found” error during the recovery process, especially at the end of the recovery.

If you only need to restore to a point in time between the end of the base backup and the current moment, you need to specify the stopping point, the recovery target. The recovery target can be a timestamp, a named recovery point and a transaction ID, in practice point-in-time based recovery is used mostly.

We can specify a recovery target by choosing one of the following parameters and only one of the following.

  • recovery_target: the only available value is immediate, which stops recovery when it reaches a consistent state at the end of the base backup.
  • recovery_target_lsn: Set the recovery target to the specified LSN.
  • recovery_target_name: set the recovery target to the specified named recovery point (created by pg_create_restore_point()).
  • recovery_target_time: set the recovery target to the specified timestamp.
  • recovery_target_xid: sets the recovery target to the specified transaction ID.

Further set whether to include the recovery target and the behavior upon arrival with the following options.

  • recovery_target_inclusive: Specifies whether the recovery target is included on recovery, i.e. whether to replay WAL log entries that contain a target timestamp, LSN, or transaction ID, otherwise it will stop before that recovery target. The default value is true.
  • recovery_target_timeline: Specifies the timeline reached during recovery, usually using the default value latest, which is the last generated timeline in the archive log.
  • recovery_target_action: specifies the action that the server will perform when the recovery target is reached, which only takes effect when the recovery target is set. The following three actions can be executed.
    • pause: Suspends the recovery process. When paused, the user can perform the next action based on the current status queried from the database. You can choose to resume the paused recovery process, which will cause the recovery mode to end, or you can choose to restart the database server after changing the recovery target configuration and revert to another target. This option is equivalent to shutdown if the server is unable to accept queries due to the hot_standby=off configuration.
    • promot: Ends the recovery process and creates a new timeline to start accepting database connections.
    • shutdown: Shut down the server directly. This does not delete the recovery.signal file and will immediately shut down again if the server is restarted; if it is restarted after changing the recovery target configuration, it will enter recovery mode and continue redirecting to a new recovery target from the previous recovery target.

Recovery termination

In any case, if the recovery process ends due to running out of available WAL segment files before reaching the configured recovery target, the recovery process will exit with a FATAL error causing the server to shut down.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
2022-07-10 08:05:50.673 UTC [21181] LOG:  restored log file "0000000400000000000000D1" from archive
2022-07-10 08:05:50.865 UTC [21181] LOG:  restored log file "0000000400000000000000D2" from archive
cp: cannot stat '/var/lib/postgresql/archive/14/demo/0000000400000000000000D3': No such file or directory
2022-07-10 08:05:51.003 UTC [21181] LOG:  redo done at 0/D2CA8530 system usage: CPU: user: 5.91 s, system: 9.59 s, elapsed: 25.94 s
2022-07-10 08:05:51.003 UTC [21181] LOG:  last completed transaction was at log time 2022-07-10 08:02:30.614229+00
2022-07-10 08:05:51.003 UTC [21181] FATAL:  recovery ended before configured recovery target was reached
2022-07-10 08:05:51.015 UTC [21179] LOG:  startup process (PID 21181) exited with exit code 1
2022-07-10 08:05:51.015 UTC [21179] LOG:  terminating any other active server processes
2022-07-10 08:05:51.026 UTC [21179] LOG:  shutting down due to startup process failure
2022-07-10 08:05:51.047 UTC [21179] LOG:  database system is shut down

If the recovery process finds corrupt WAL data, recovery will stop immediately and the server will not start normally. In this case a new recovery target can be specified before the point of corruption and then the recovery process can be rerun.

If the recovery fails for external reasons, such as a system crash or inaccessible WAL archives, the recovery can be continued by simply restarting the server and the recovery process will start over from where it failed. Recovery restarts work much like normal checkpoints: the server periodically flushes all its state to disk, then updates the pg_control file with the completed WAL data at the record, which no longer needs to be reprocessed at restart.

Timeline

The concept of timeline is introduced in pg. A new timeline is created after each successful recovery based on the recovered timeline, which is mainly reflected in the naming of WAL segment files.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
postgres=# SELECT timeline_id FROM pg_control_checkpoint();
 timeline_id
-------------
           2
(1 row)
postgres=# SELECT pg_walfile_name(pg_current_wal_lsn());
     pg_walfile_name
--------------------------
 000000020000000000000086
(1 row)

Each time a new timeline is created pg also creates a timeline history file <timeline-ID>.history, which records the parent timeline of that timeline. When restoring from an archive containing multiple timelines, pg needs the timeline history file to keep going back and selecting the correct sequence of WAL segment files. Therefore the timeline history file is also archived to the WAL archive location.

The advantage of introducing a timeline is that users can restore based on the same backup and then back up again, simulating a tree backup history through the timeline, and then easily locate and switch between forks without worrying about confusion between them due to naming conflicts, etc. The timeline is very useful in some scenarios, and usually we use the most recent timeline in the WAL archive directly.