Shell scripting is a must-have skill for programmers. Because shell scripts are simple and easy to use, we rely on them daily to automate application testing and deployment, environment cleanup, and so on. Yet writing and running shell scripts hides many pitfalls: a moment of carelessness can keep a script from executing properly. Fortunately, there are also many techniques for writing robust, reliable shell scripts, so let's explore them today.
Setting the shell's default execution environment parameters
Executing a shell script usually creates a new shell. For example, when we run bash script.sh, we ask bash to create a new shell, with the default parameters of this execution environment, to execute script.sh. The set command can be used to modify the runtime parameters of that shell environment. Run without any arguments, set displays all environment variables and shell functions. For the full list of customizable runtime parameters, see the official manual; here we focus on the four most commonly used ones.
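A minimal sketch of the behavior described (the path /tmp/script.sh is illustrative):

```shell
# Write a trivial script and run it with bash, which spawns a new shell
# process using the default execution environment parameters
cat > /tmp/script.sh <<'EOF'
#!/bin/bash
echo "hello from a new shell"
EOF
bash /tmp/script.sh

# Options such as -u, -x, -e change the runtime parameters of a shell
bash -c 'set -u; echo "nounset enabled"'
```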
Tracking command execution
By default, a shell script only displays the results of execution, without showing which command produced them. If multiple commands run consecutively, their results are output one after another, making it hard to tell which command produced which line of output.

set -x makes the shell print each command before its output, prefixed with + to distinguish commands from their output. Each command's arguments are expanded, so we can see exactly what every command ran with, which is very helpful when debugging shell scripts.

There is another way to write it: set -o xtrace.
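A quick sketch of the effect (the trace lines go to stderr):

```shell
bash <<'EOF'
set -x            # same effect as: set -o xtrace
v=world
echo "hello $v"   # the trace line shows the expanded argument
EOF
```

The trace prints each command prefixed with +, with variables already expanded, before the command's own output appears.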
Non-existent variables should report an error
Unlike higher-level languages such as Python and Ruby, shell scripts provide no safety mechanisms by default. For example, a Ruby script reports an error when it tries to read the contents of an uninitialized variable, while a shell script gives no prompt at all and simply ignores it.
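A sketch of the default behavior ($v is deliberately never assigned):

```shell
bash <<'EOF'
echo $v           # $v was never defined: bash prints an empty line
echo "hello"      # and execution simply continues
EOF
```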
As you can see, echo $v outputs a blank line: bash completely ignores the non-existent $v and continues with the subsequent command echo "hello". This is rarely the behavior the developer wants; the script should report an error and stop execution for non-existent variables, to prevent mistakes from piling up. The good news is that set -u changes this default behavior: add it at the head of the script, and any reference to a non-existent variable reports an error and stops execution.
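With set -u added at the top, the same kind of script stops at the undefined variable:

```shell
bash <<'EOF' || echo "the script stopped with exit status $?"
set -u
echo "hello"
echo $v           # v is unbound: bash reports an error and stops here
echo "world"      # never reached
EOF
```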
Another way to write set -u is set -o nounset.
Command execution should stop when it fails
In the default shell execution environment, bash continues executing the subsequent commands after a command fails (returns a non-zero value).

As you can see, bash just prints an error and carries on with the script, which is bad for script safety and troubleshooting. In practice, when a command fails we often need to stop the script to prevent errors from accumulating. The following form is generally used.
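A sketch of the usual form (false stands in for any failing command):

```shell
bash <<'EOF' || echo "the script stopped with exit status $?"
false || exit 1   # stop the script as soon as the command fails
echo "never reached"
EOF
```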
This means that the shell script stops whenever command returns a non-zero value. If more than one operation needs to be completed before execution stops, the following three more elaborate forms are used.
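The original samples are not shown here; these are sketches of the forms usually meant, again with false standing in for the failing command:

```shell
cat > /tmp/stop_forms.sh <<'EOF'
#!/bin/bash
false || { echo "cleaning up" >&2; exit 1; }   # form 1: command grouping

if ! false; then                               # form 2: explicit if
    echo "cleaning up" >&2
    exit 1
fi

false                                          # form 3: inspect $? afterwards
if [ "$?" -ne 0 ]; then
    echo "cleaning up" >&2
    exit 1
fi
EOF
bash /tmp/stop_forms.sh || echo "the script stopped with exit status $?"
```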
In addition, it is easy to associate another very similar use, if two commands have an inheritance relationship, and only if the first command succeeds can the second command continue to be executed, then it can be written in the following way.
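For example:

```shell
bash <<'EOF'
mkdir -p /tmp/demo_dir && echo "directory ready"   # runs only if mkdir succeeded
false && echo "skipped"                            # first command fails: second is skipped
echo "the script itself continues"
EOF
```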
But these techniques are somewhat cumbersome and easy to overlook. set -e solves the problem at the root, by terminating the script as soon as any error occurs.
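A sketch:

```shell
bash <<'EOF' || echo "the script stopped with exit status $?"
set -e                  # same effect as: set -o errexit
echo "before the failure"
false                   # non-zero return value: the script terminates here
echo "after the failure"
EOF
```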
As you can see, the script terminates as soon as the failing command runs.
set -e decides whether a command failed based on its return value. However, a non-zero return value does not always indicate failure, or the developer may want the script to continue even when a command fails.
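For example (assuming no file named foobar exists):

```shell
bash <<'EOF' || echo "the script stopped with exit status $?"
set -e
ls foobar        # ls itself exists, but its argument foobar does not
echo "hello"     # never reached: the non-zero return value ends the script
EOF
```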
As you can see, with set -e turned on, even though ls is an existing command, it returns a non-zero value because its argument foobar does not actually exist, and the script stops, which is sometimes not what we want.
One option is to temporarily turn set -e off and turn it back on after the command has finished executing.
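A sketch:

```shell
bash <<'EOF'
set -e
set +e           # temporarily turn off -e
ls foobar        # fails, but the script continues
set -e           # turn -e back on
echo "still running"
EOF
```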
In the above code, set +e turns off the -e option, and set -e turns it back on.
There is another way to write it that serves a similar purpose.
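A sketch of that form:

```shell
bash <<'EOF'
set -e
ls foobar || true    # || true masks the non-zero return value
echo "still running"
EOF
```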
With this form, command will not terminate the script even if it fails to execute.
Another way to write set -e is set -o errexit.
Controlling the execution of pipeline commands
One exception to the set -e behavior described in the previous section is that it does not apply to pipeline commands. For a pipeline, bash takes the return value of the last subcommand as the return value of the entire command. That is, as long as the last subcommand does not fail, the pipeline as a whole always succeeds, its subsequent commands still run, and set -e has no effect. As an example.
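A sketch (foo is a deliberately non-existent command):

```shell
bash <<'EOF'
set -e
foo | echo bar   # foo does not exist, yet the pipe returns echo's status: 0
echo hello       # still executed
EOF
```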
As you can see, even though foo is a non-existent command, the pipeline command foo | echo bar still executes successfully, so the subsequent echo hello continues to run.
set -o pipefail resolves this situation: as soon as any subcommand fails, the entire pipeline command fails and the script terminates.
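The same example with pipefail turned on:

```shell
bash <<'EOF' || echo "the script stopped with exit status $?"
set -e
set -o pipefail
foo | echo bar   # pipefail: foo's failure now fails the whole pipeline
echo hello       # never reached
EOF
```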
As you can see, when the foo | echo bar pipeline fails, the whole shell script exits, and the subsequent echo hello command is not executed.
Combining the shell's default execution environment parameters
The four set parameters described above are generally used together.

Either of these two forms can be placed at the head of every shell script.
Of course, these parameters can also be passed on the bash command line when the shell script is executed.
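A sketch of both forms (the path /tmp/robust.sh is illustrative):

```shell
# At the head of the script, either spelling:
cat > /tmp/robust.sh <<'EOF'
#!/bin/bash
set -euxo pipefail        # or equivalently: set -eux; set -o pipefail
echo "strict mode on"
EOF
bash /tmp/robust.sh

# The same parameters can be passed on the bash command line instead:
bash -euxo pipefail /tmp/robust.sh
```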
Defensive programming in shell scripts
Shell scripts should be written with unanticipated input in mind, such as files that do not exist or directories that fail to be created. Many shell commands have options for exactly these cases. For example, mkdir returns an error by default if the parent directory does not exist, but with the -p option it creates any missing parent directories first; rm fails to delete a non-existent file, but with the -f option it succeeds even if the file does not exist.
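A sketch (paths are illustrative):

```shell
# mkdir fails if the parent directory is missing; -p creates parents as needed
mkdir -p /tmp/defensive_demo/a/b

# rm fails on a file that does not exist; -f makes it succeed anyway
rm -f /tmp/defensive_demo/no_such_file
echo "both commands succeeded"
```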
Beware of spaces in strings
We must always watch out for spaces in strings, such as spaces in filenames or in command arguments. The safest practice is to enclose such strings in quotes.

Similarly, when using $@ or other variables holding multiple space-separated strings, we should enclose them in quotes as well; quoting has no side effects and only makes our shell scripts more robust.
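A sketch (the filename is illustrative):

```shell
#!/bin/bash
file="/tmp/my file.txt"   # a filename containing a space

touch "$file"             # quoted: one argument, creates "my file.txt"
# touch $file             # unquoted: would run touch with two arguments

# Quote "$@" too, so each original argument stays intact
show_args() {
    for arg in "$@"; do
        echo "arg: [$arg]"
    done
}
show_args "first arg" "second arg"
```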
Use the trap command to catch signals
Another common scenario is a script that fails partway through and leaves the filesystem in an inconsistent state: a stale file lock, a leftover temporary file, or a file that was only partially updated. To achieve "transaction integrity" we need to resolve these inconsistencies by removing the file locks and temporary files, or by restoring the state to what it was before the update. Shell scripts do provide the ability to execute a command or function when a specific Unix signal is caught.
Shell scripts can catch many types of signals (the full list is available via the kill -l command), but we usually only care about three of them, used to recover after something goes wrong:
| Signal | Meaning |
| ------ | ------- |
| INT | Interrupt: sent when someone kills the script by pressing Ctrl-C |
| TERM | Terminate: sent when someone sends the TERM signal using the kill command |
| EXIT | Exit: a pseudo-signal, triggered when the script exits, whether by reaching the end of the script, by an exit command, or by a command failing under set -e |
In general, we create a file lock before operating on the shared resource.
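A sketch (the lock path is illustrative):

```shell
#!/bin/bash
lockfile="/tmp/myscript.lock"   # hypothetical lock file

touch "$lockfile"               # take the lock before touching the shared resource
# ... operate on the shared resource here ...
rm -f "$lockfile"               # release the lock on the normal path
```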
However, if someone manually kills the shell process while the script is operating on the shared resource, the leftover file lock will prevent the script from ever operating on that resource again. Using the trap command, we can catch the corresponding signal and recover accordingly.
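A sketch of the pattern (lock path and cleanup function name are illustrative):

```shell
#!/bin/bash
lockfile="/tmp/myscript.lock"   # hypothetical lock file

cleanup() {
    rm -f "$lockfile"
    exit 1                      # exit right away instead of continuing
}
# Clean up if the script is interrupted (Ctrl-C) or killed with TERM
trap cleanup INT TERM

touch "$lockfile"
# ... operate on the shared resource here ...
rm -f "$lockfile"
echo "finished cleanly"
```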
With the trap command above, the file lock is cleaned up even if someone manually kills the shell process while the script is operating on the shared resource. Note that after catching the signal we delete the file lock and exit immediately instead of continuing execution.
The correct approach is to make update operations as atomic as possible to achieve “transaction consistency”:
- copy the old directory.
- perform the update operation in the copied directory.
- replace the original directory.
The last two mv operations are very fast on Unix-like filesystems, because only the inodes of the two directories are swapped and no actual copying is performed. In other words, the error-prone part is the bulk update itself, and since we perform all updates in the copied directory, an update that goes wrong does not affect the original directory. The trade-off is that the operation uses double the disk space, and any operation that needs to keep files open for a long time happens in the backup directory. Keeping a series of operations atomic is very important for error-prone shell scripts, and backing up files before operating on them is also a good programming habit.
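The steps above can be sketched as follows (the paths are illustrative):

```shell
bash <<'EOF'
set -e
dir="/tmp/site"                    # hypothetical directory being updated
rm -rf "$dir" "$dir.tmp" "$dir.old"

# an existing "old" version
mkdir -p "$dir"
echo "v1" > "$dir/content"

cp -a "$dir" "$dir.tmp"            # 1. copy the old directory
echo "v2" > "$dir.tmp/content"     # 2. perform the update inside the copy
mv "$dir" "$dir.old"               # 3. replace: two cheap mv calls (inode swap)
mv "$dir.tmp" "$dir"
rm -rf "$dir.old"

cat "$dir/content"                 # the live directory now holds the new version
EOF
```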