1. Introduction

rsync is a commonly used Linux application for file synchronization.

It can synchronize files between the local computer and a remote computer, or between two local directories (but does not support synchronization between two remote computers). It can also be used as a file copying tool, replacing the cp and mv commands.

The r in its name stands for remote: rsync means “remote synchronization”. Unlike other file transfer tools (such as FTP or scp), rsync’s main feature is that it checks the existing files on both the sender and the receiver and transfers only the parts that have changed (by default, a file counts as changed if its size or modification time differs).

2. Installation

If rsync is not installed on the local or remote computer, you can install it with one of the following commands.

# Debian
$ sudo apt-get install rsync

# Red Hat
$ sudo yum install rsync

# Arch Linux
$ sudo pacman -S rsync

Note that rsync must be installed on both sides of the transfer.

3. Basic usage

3.1 -r parameter

When used locally, rsync can serve as an alternative to the cp and mv commands to synchronize a source directory to a target directory.

$ rsync -r source destination

In the above command, -r means recursive, i.e. subdirectories are included. Note that -r is required when source is a directory; without it, rsync skips the directory and copies nothing. source is the source directory, and destination is the destination directory.

If there are multiple files or directories to be synchronized, you can write it like the following.

$ rsync -r source1 source2 destination

In the above command, source1 and source2 will be synchronized to the destination directory.

3.2 -a parameter

The -a parameter can be used instead of -r to synchronize meta-information (such as modification time, permissions, etc.) in addition to recursive synchronization. Since rsync uses file size and modification time by default to determine if a file needs to be updated, -a is more useful than -r. The following usage is the common way to write it.

$ rsync -a source destination

If the destination directory destination does not exist, rsync will create it automatically. After executing the above command, the source directory source is copied completely to the destination directory destination, i.e. the directory structure destination/source is formed.

If you want to synchronize only the contents of the source directory source to the destination directory destination, you need to add a slash after the source directory.

$ rsync -a source/ destination

After the above command is executed, the contents of the source directory will be copied to the destination directory, and no source subdirectory will be created under destination.

3.3 -n parameter

If you are not sure what the result of an rsync command will be, you can first simulate it with the -n or --dry-run parameter.

$ rsync -anv source/ destination

In the above command, the -n parameter simulates the result of the command execution and does not actually execute the command. The -v parameter, on the other hand, outputs the results to the terminal so that you can see what will be synchronized.

3.4 --delete parameter

By default, rsync only ensures that all the contents of the source directory (except for explicitly excluded files) are copied to the target directory. It does not make the two directories identical and does not delete files. If you want to make the target directory a mirror copy of the source directory, you must use the --delete parameter, which will delete files that exist only in the target directory and not in the source directory.

$ rsync -av --delete source/ destination

The --delete parameter in the above command makes destination a mirror of source.

4. Excluding files

4.1 --exclude parameter

Sometimes we want to exclude certain files or directories from synchronization. In that case, we can specify an exclusion pattern with the --exclude parameter.

$ rsync -av --exclude='*.txt' source/ destination
# or
$ rsync -av --exclude '*.txt' source/ destination

The above command excludes all TXT files.

Note that rsync also synchronizes hidden files whose names start with a dot. To exclude hidden files, you can write --exclude=".*".

If you want to exclude all files inside a directory, but do not want to exclude the directory itself, you can write it like this.

$ rsync -av --exclude 'dir1/*' source/ destination

Multiple exclusion patterns can be given with multiple --exclude parameters.

$ rsync -av --exclude 'file1.txt' --exclude 'dir1/*' source/ destination

Multiple exclusion patterns can also be written with a single --exclude parameter by taking advantage of Bash’s brace expansion.

$ rsync -av --exclude={'file1.txt','dir1/*'} source/ destination

If there are many exclusion patterns, you can write them to a file, one line per pattern, and then specify this file with the --exclude-from parameter.

$ rsync -av --exclude-from='exclude-file.txt' source/ destination

4.2 --include parameter

The --include parameter is used to specify the file pattern that must be synchronized, often in combination with --exclude.

$ rsync -av --include="*.txt" --exclude='*' source/ destination

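The above command excludes all files except TXT files. The order matters: rsync applies the first pattern that matches, so --include must come before --exclude='*'. Because '*' also excludes directories, only the TXT files at the top level of source are copied.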

5. Remote synchronization

5.1 SSH Protocol

rsync supports synchronization between two local directories as well as remote synchronization. It can synchronize local content to a remote server.

$ rsync -av source/ username@remote_host:destination

It is also possible to synchronize remote content to local.

$ rsync -av username@remote_host:source/ destination

rsync uses SSH for remote login and data transfer by default.

Early versions of rsync did not use SSH by default, so the protocol had to be specified with the -e parameter. This was changed later, so the -e ssh in the following command can be omitted.

$ rsync -av -e ssh source/ user@remote_host:/destination

However, if the ssh command has additional parameters, you must specify the SSH command to be executed with the -e parameter.

$ rsync -av -e 'ssh -p 2234' source/ user@remote_host:/destination

In the above command, the -e parameter specifies that SSH uses port 2234.

5.2 rsync protocol

In addition to SSH, if the remote server has the rsync daemon installed and running, you can also transfer using the rsync:// protocol (default port 873). In this form, a double colon (::) separates the server from the target directory.

$ rsync -av source/ 192.168.122.32::module/destination

Note that module in the above address is not an actual path name, but a resource name defined for the rsync daemon by the administrator.

If you want to know the list of all modules assigned by the rsync daemon, you can execute the following command.

$ rsync rsync://192.168.122.32

In addition to the double-colon form, you can also specify the address directly as an rsync:// URL.

$ rsync -av source/ rsync://192.168.122.32/module/destination

6. Incremental backup

The most important feature of rsync is that it can perform incremental backups, i.e. only files that have changed are copied by default.

In addition to direct comparison between the source and target directories, rsync also supports the use of a base directory, i.e. syncing the changes between the source and base directories to the target directory.

The first sync is a full backup, and all files are synchronized to the base directory. Each subsequent sync is an incremental backup, where only the part of the source directory that has changed from the base directory is synced to a new target directory. This new target directory also appears to contain all files, but only the changed files are actually stored in it; the unchanged files are hard links to the files in the base directory.

The --link-dest parameter is used to specify the base directory for synchronization.

$ rsync -a --delete --link-dest /compare/path /source/path /target/path

In the above command, the --link-dest parameter specifies the base directory /compare/path, then the source directory /source/path is compared with the base directory, the changed files are found and copied to the target directory /target/path, and the unchanged files are generated as hard links. The first backup of this command is a full backup, and all subsequent backups are incremental.

Here is an example script that backs up the user’s home directory.

#!/bin/bash

# A script to perform incremental backups using rsync

set -o errexit
set -o nounset
set -o pipefail

readonly SOURCE_DIR="${HOME}"
readonly BACKUP_DIR="/mnt/data/backups"
readonly DATETIME="$(date '+%Y-%m-%d_%H:%M:%S')"
readonly BACKUP_PATH="${BACKUP_DIR}/${DATETIME}"
readonly LATEST_LINK="${BACKUP_DIR}/latest"

mkdir -p "${BACKUP_DIR}"

rsync -av --delete \
  "${SOURCE_DIR}/" \
  --link-dest "${LATEST_LINK}" \
  --exclude=".cache" \
  "${BACKUP_PATH}"

rm -rf "${LATEST_LINK}"
ln -s "${BACKUP_PATH}" "${LATEST_LINK}"

In the above script, each sync generates a new directory ${BACKUP_DIR}/${DATETIME} and points the symbolic link ${BACKUP_DIR}/latest to it. The next backup uses ${BACKUP_DIR}/latest as the base directory to generate a new backup directory, and then points ${BACKUP_DIR}/latest to that new directory.

7. Configuration items

The -a, --archive parameter indicates archive mode, which recurses into directories, preserves metadata such as modification time, permissions and owner, and copies symbolic links as symbolic links.

The --append parameter specifies that the file continues the transfer where it was last interrupted.

The --append-verify parameter is similar to the --append parameter, but it performs a checksum on the completed file after transfer. If the checksum fails, the entire file will be resent.

The -b, --backup parameter specifies that when a file that already exists in the target directory is deleted or updated, the old file is renamed and kept as a backup; the default behavior is to delete or overwrite it. The backup name is the file name plus the suffix specified by the --suffix parameter, which defaults to ~.

The --backup-dir parameter specifies the directory where the backup files are stored, e.g. --backup-dir=/path/to/backups.

The --bwlimit parameter specifies the bandwidth limit. The default unit is KB/s, for example --bwlimit=100.

The -c, --checksum parameter changes how rsync decides whether a file has changed. By default, rsync only checks whether the file size and last modification time differ, and retransmits the file if they do; with this parameter, it decides whether to retransmit by comparing checksums of the file contents.

The --delete parameter deletes files that exist only in the target directory and not in the source directory, i.e. it ensures that the target directory is a mirror of the source directory.

The -e parameter specifies the remote shell used to transfer data, e.g. -e 'ssh -p 2234' to use SSH on a non-default port.

The --exclude parameter specifies the exclusion of files that are not synchronized, e.g. --exclude="*.iso".

The --exclude-from parameter specifies a local file containing the file patterns to be excluded, one line per pattern.

The --existing, --ignore-non-existing parameters indicate that files and directories that do not already exist in the target directory are not synchronized.

The -h parameter indicates output in a human-readable format.

The --help argument (or -h when it is the only argument) returns help information.

The -i parameter indicates the output of details of file differences between the source and target directories.

The --ignore-existing parameter indicates that as long as the file already exists in the target directory, skip it and do not synchronize these files again.

The --include parameter specifies the files to be included when synchronizing, and is usually used in combination with --exclude.

The --link-dest parameter specifies the base directory for incremental backups.

The -m parameter specifies that empty directories are not synchronized.

The --max-size parameter sets the maximum size of files to be transferred, e.g. no more than 200KB (--max-size='200k').

The --min-size parameter sets the minimum size of files to be transferred, e.g. not less than 10KB (--min-size=10k).

The -n parameter or the --dry-run parameter simulates the operation without actually performing it. Used with the -v parameter, it shows what would be synchronized.

The -P parameter is a combination of the --progress and --partial parameters.

The --partial parameter allows resuming an interrupted transfer. When this parameter is not used, rsync deletes files whose transfer was interrupted; when it is used, the partially transferred files are kept in the target directory, and the interrupted transfer is resumed on the next sync. It is usually used together with --append or --append-verify.

The --partial-dir parameter specifies that partially transferred files are saved to a temporary directory, e.g. --partial-dir=.rsync-partial. It is usually used together with --append or --append-verify.

The --progress parameter indicates that progress is displayed.

The -r argument indicates recursion, i.e., the inclusion of subdirectories.

The --remove-source-files parameter indicates that the sender’s files are removed after a successful transfer.

The --size-only parameter indicates that only files with changes in size are synchronized, regardless of the difference in file modification time.

The --suffix parameter specifies the suffix to be added to the file name when it is backed up. The default is ~.

The -u, --update parameter indicates that files whose modification time in the target directory is newer than in the source are skipped when synchronizing, i.e. newer files on the receiving side are not overwritten.

The -v parameter indicates output details. -vv indicates output of more detailed information, and -vvv indicates output of the most detailed information.

The --version argument returns the version of rsync.

The -z parameter specifies to compress the data when synchronizing.


Reference http://www.ruanyifeng.com/blog/2020/08/rsync.html