This article is a detailed explanation of Docker custom images, how to build your own Docker images, and the Dockerfile instructions.
I. Using Dockerfile to customize images
1.1. Customizing images with a Dockerfile
Customizing an image really means customizing the configuration and files added at each layer. If we could write a script containing each layer's modification, installation, build, and operation commands, and use that script to build the image, then the problems of non-repeatable builds, opaque image construction, and image bloat would all be solved. That script is the Dockerfile.
A Dockerfile is a text file containing a set of instructions. Each instruction builds one layer, so the content of each instruction describes how that layer should be built.
Let’s take the
nginx image as an example, this time we use
Dockerfile to customize it.
In a blank directory, create a text file named Dockerfile, with the following contents:
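Based on the description that follows (a two-line file on the nginx base image, whose RUN line is quoted later in this section), the file would look like this:

```dockerfile
FROM nginx
RUN echo '<h1>Hello, Docker!</h1>' > /usr/share/nginx/html/index.html
```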
This Dockerfile is very simple: just two lines in total, involving two instructions, FROM and RUN.
1.2. FROM: specify the base image
A custom image must be built on top of some existing image and customized from there, just as we previously ran a container from the nginx image and then modified it. FROM specifies this base image; it is therefore a required instruction in a Dockerfile, and must be the first one.
There are many high-quality official images on the
Docker Store, including service images that can be used directly, such as
tomcat, etc. There are also images for developing, building, and running applications in various languages, such as
golang and so on. It is usually possible to find an image among them that best matches our ultimate goal and use it as the base image for customization.
If you do not find an image that corresponds to your service, the official images also include more basic operating system images, such as
alpine, etc. The software libraries of these operating systems provide us with a broader scope for expansion.
In addition to choosing existing images as the base image,
Docker also has a special image called
scratch. This image is a virtual concept and does not actually exist; it represents a blank image.
If you use scratch as the base image, it means you are not basing your image on anything; the instructions that follow will form the very first layer of the image.
It is not uncommon to copy executables directly into an image without any system base, e.g. coreos/etcd. For statically compiled programs on Linux there is no need for runtime support from the operating system, since all required libraries are already inside the executable, so using FROM scratch directly makes the image much smaller. Many applications written in Go are packaged this way, which is one reason some people consider Go a particularly suitable language for container microservice architectures.
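As a sketch of this pattern (the binary name and build assumptions here are illustrative, not from the original text), a statically linked Go program can be packaged like this:

```dockerfile
# Assumes ./app was built with CGO_ENABLED=0 go build, so it is fully static
FROM scratch
COPY app /app
ENTRYPOINT ["/app"]
```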
1.3. RUN: execute commands
The RUN instruction is used to execute command-line commands. Given the power of the command line, RUN is one of the most commonly used instructions when customizing images. It comes in two formats.
RUN <command>: the shell format, like a command typed directly on the command line. The Dockerfile we just wrote uses this format:

```dockerfile
RUN echo '<h1>Hello, Docker!</h1>' > /usr/share/nginx/html/index.html
```

RUN ["executable", "argument1", "argument2"]: the exec format, which looks more like a function call.
Since RUN can execute commands just like a shell script, can we write one RUN per command, as we would in a shell script? For example, like this:
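The seven-RUN example discussed below (compiling Redis) is not reproduced in the text; it would look something like this (the base image, Redis version, and download URL are illustrative):

```dockerfile
FROM debian:stretch

RUN apt-get update
RUN apt-get install -y gcc libc6-dev make wget
RUN wget -O redis.tar.gz "http://download.redis.io/releases/redis-5.0.3.tar.gz"
RUN mkdir -p /usr/src/redis
RUN tar -xzf redis.tar.gz -C /usr/src/redis --strip-components=1
RUN make -C /usr/src/redis
RUN make -C /usr/src/redis install
```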
As I said before, every command in
Dockerfile creates a layer, and
RUN is no exception. The behavior of each
RUN is the same as the process we just used to create the image manually: create a new layer, execute the commands on it, and after that,
commit the changes on that layer to form a new image.
Written this way, it creates 7 layers. This is completely pointless: many things not needed at runtime get baked into the image, such as the compile environment and updated packages. The result is a very bloated, many-layered image that not only increases build and deployment time but is also error-prone. This is a common mistake made by many people who are new to Docker.
Union FS also imposes a maximum number of layers. AUFS, for example, used to allow at most 42 layers and now allows at most 127.
The correct way to write the above
Dockerfile would be as follows:
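Based on the explanation that follows (a single RUN chaining the commands with &&, \ line continuations, and cleanup at the end), the corrected file would look roughly like this (version and URL again illustrative):

```dockerfile
FROM debian:stretch

RUN set -x; buildDeps='gcc libc6-dev make wget' \
    && apt-get update \
    && apt-get install -y $buildDeps \
    && wget -O redis.tar.gz "http://download.redis.io/releases/redis-5.0.3.tar.gz" \
    && mkdir -p /usr/src/redis \
    && tar -xzf redis.tar.gz -C /usr/src/redis --strip-components=1 \
    && make -C /usr/src/redis \
    && make -C /usr/src/redis install \
    && rm -rf /var/lib/apt/lists/* \
    && rm redis.tar.gz \
    && rm -r /usr/src/redis \
    && apt-get purge -y --auto-remove $buildDeps
```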
First, all of the previous commands serve a single purpose: compiling and installing the Redis executable. So there is no need to create many layers; this is logically just one layer. Instead of many RUN instructions, one per command, a single RUN is used, with && concatenating all the required commands, simplifying the previous 7 layers to 1. When writing a Dockerfile, always remind yourself that you are not writing a shell script, but defining how each layer should be built.
Also, note the line breaks used for formatting. Dockerfile supports shell-style line continuation with \ at the end of a line, and comments starting with # at the beginning of a line. Good formatting habits, such as line breaks, indentation, and comments, make maintenance and troubleshooting easier.
Also notice the cleanup commands added at the end of this chain: they remove the software that was needed only to compile the build, clean up all downloaded and extracted files, and clear the apt cache. This is a very important step. As we said before, images use layered storage: things in one layer are not deleted by a later layer, they travel with the image. So when building an image, make sure each layer adds only what is really needed, and that anything extraneous is cleaned up.
One of the reasons why many people who are new to
Docker create bloated images is that they forget to clean up extraneous files at the end of each build.
1.4. Building the image
Let’s go back to the
Dockerfile of the custom
Nginx image we made earlier. Now that we understand the contents of the
Dockerfile, let’s build the image.
Execute the following command in the directory where the Dockerfile is located:
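The build command and its output look roughly like this (the container and image IDs match those discussed below; on your machine the IDs will differ):

```shell
$ docker build -t nginx:v3 .
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM nginx
 ---> e43d811ce2f4
Step 2 : RUN echo '<h1>Hello, Docker!</h1>' > /usr/share/nginx/html/index.html
 ---> Running in 9cdc27646c7b
 ---> 44aa4490ce2c
Removing intermediate container 9cdc27646c7b
Successfully built 44aa4490ce2c
```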
From the command output we can clearly see the build process. In Step 2, as described before, the RUN instruction starts a container (9cdc27646c7b), executes the requested command, commits the resulting layer (44aa4490ce2c), and then deletes the container it used (9cdc27646c7b).
Here we used the docker build command to build the image. Its format is:
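The general form of the command (as given in the docker CLI reference) is:

```shell
docker build [OPTIONS] PATH | URL | -
```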
Here we specified the name of the final image with -t nginx:v3. After a successful build, we can run this image just as we ran nginx:v2 before, and the result will be the same.
1.5. The image build context (Context)
If you paid attention, you will have noticed that the docker build command ends with a dot (.). The dot means the current directory, and since the Dockerfile is in the current directory, many beginners assume this path specifies where the Dockerfile is located. That is actually inaccurate: looking at the command format above, you will find that it specifies the context path. So what is a context?
First we need to understand how
docker build works.
Docker is divided at runtime into the
Docker engine (also known as the server daemon) and the client tools. The
Docker engine provides a set of REST APIs, called the
Docker Remote API, and client tools like the
docker command interact with the
Docker engine through this set of
APIs to perform various functions. So, although it seems that we are executing various
docker functions locally, in reality, everything is done on the server side (the
Docker engine) using remote calls. This
C/S design also makes it easy to manipulate the
Docker engine on the remote server.
When we build an image, not all customizations are done with the
RUN command, but often some local files are copied into the image, for example, with the
COPY command, the
ADD command, and so on. The
docker build command builds the image, not locally, but on the server side, i.e. in the
Docker engine. So in this client/server architecture, how can the server get the local files?
This introduces the concept of context. When building, the user specifies the path to the build image context, and the
docker build command learns this path, packages everything under it, and uploads it to the
Docker engine. Once the
Docker engine receives the context package, it expands it and gets all the files it needs to build the image.
If you write this in the Dockerfile:
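For instance (the paths follow the package.json discussion below; the target directory is illustrative):

```dockerfile
COPY ./package.json /app/
```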
This is not a copy of
package.json in the directory where the
docker build command was executed, nor is it a copy of
package.json in the directory where
Dockerfile is located, but a copy of
package.json in the context directory.
Therefore, source paths in instructions like COPY are relative paths. This is also why beginners often ask why COPY ../package.json /app or COPY /opt/xxxx /app does not work: those paths are outside the context, so the Docker engine cannot get the files at those locations. If you really need those files, you should copy them into the context directory first.
Now you can understand the . in the command docker build -t nginx:v3 .: it actually specifies the context directory. The docker build command packages the contents of that directory and sends them to the Docker engine to help build the image.
If we look at the
docker build output, we have actually seen this process of sending a context.
Understanding the build context is important for image building to avoid making mistakes you shouldn’t make. For example, some beginners find that
COPY /opt/xxxx /app doesn’t work, so they simply put
Dockerfile in the root of their hard drive to build it, only to find that
docker build executes and sends a few dozen
GB of stuff, which is extremely slow and prone to build failure. That’s because this approach is asking
docker build to pack the entire hard drive, which is clearly a misuse.
In general, you should put
Dockerfile in an empty directory, or in the root of the project. If there are no required files in that directory, then you should make a copy of the required files. If there are things in the directory that you really don’t want to pass to the Docker engine at build time, then you can write a
.dockerignore with the same syntax as
.gitignore, which is used to weed out files that don’t need to be passed to the Docker engine as context.
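A minimal .dockerignore might look like this (the entries are illustrative):

```
node_modules
*.log
.git
```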
So why would anyone mistakenly think that
. is to specify the directory where the
Dockerfile is located? This is because by default, if you don’t specify
Dockerfile additionally, a file named
Dockerfile in the context directory will be used as the Dockerfile.
This is only the default behavior. In fact, the file does not have to be named Dockerfile, and it does not have to be located in the context directory; for example, you can use the -f ../Dockerfile.php parameter to specify a particular file as the Dockerfile.
Of course, it is customary to use the default filename
Dockerfile and to place it in the image build context directory.
1.6. Other uses of docker build
1.6.1. Building directly from the Git repo
docker build also supports building from a URL; for example, you can build directly from a Git repo:
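Such a command might look like this (the repository URL is an illustrative example; the part after # selects the branch, empty here for the default, and the build directory):

```shell
docker build https://github.com/twang2218/gitlab-ce-zh.git#:8.14
```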
This command gives the Git repo required for the build, using the default master branch and /8.14/ as the build directory. Docker will then git clone the project itself, switch to the specified branch, enter the specified directory, and start the build.
1.6.2. Build with the given tarball
If the URL given is not a
Git repo but a
tar archive, then the
Docker engine will download the archive, unpack it automatically, and use it as a context to start the build.
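For example (the URL is a hypothetical placeholder):

```shell
docker build http://server/context.tar.gz
```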
1.6.3. Reading a Dockerfile from standard input for a build
If the standard input is passed in as a text file, it is treated as a
Dockerfile and the build begins. This form has no context since it reads the contents of the
Dockerfile directly from the standard input, so it is not possible to do things like
COPY the local file into the image like other methods can.
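This form pipes the Dockerfile itself on standard input, for example:

```shell
docker build - < Dockerfile
# or equivalently
cat Dockerfile | docker build -
```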
1.6.4. Reading a context archive from standard input for a build
If the file on standard input is a compressed archive (such as one in xz format), Docker will treat it as the context archive, expand it, use it as the context, and start the build.
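For example:

```shell
docker build - < context.tar.gz
```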
II. Dockerfile directives
We have already introduced RUN and mentioned ADD. In fact, Dockerfile is very powerful, providing more than ten instructions. Let's go through the others.
2.1. COPY: copy files
The format:
COPY <source path>... <target path>
COPY ["<source path1>",... "<target path>"]
Like the RUN instruction, COPY has two formats, one resembling a command line and one resembling a function call.
The COPY instruction copies files/directories from <source path> in the build context to the <target path> location inside the image's new layer. For example:
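For instance (the paths are illustrative):

```dockerfile
COPY package.json /usr/src/app/
```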
<source path> can be multiple paths, or even wildcards, whose matching follows Go's filepath.Match rules, e.g.:
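For example (patterns in the style of the Docker docs; paths illustrative):

```dockerfile
COPY hom* /mydir/
COPY hom?.txt /mydir/
```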
<target path> can be either an absolute path within the container or a relative path to the working directory (the working directory can be specified with the
WORKDIR command). The target path does not need to be created beforehand, if the directory does not exist, the missing directory will be created before copying the file.
It is also important to note that the COPY instruction preserves all metadata of the source files, such as read/write/execute permissions and file modification times. This is useful for image customization, especially when the build-related files are all managed with a version-control system such as Git.
2.2. ADD: a more advanced COPY
The format and nature of the ADD instruction are basically the same as COPY's, but it adds some features on top of COPY.
For example, <source path> can be a URL, in which case the Docker engine will try to download the linked file to <destination path>. The downloaded file's permissions are automatically set to 600; if these are not the desired permissions, an additional RUN layer is needed to adjust them. So it makes more sense to simply use the RUN instruction with the curl tool to download, handle permissions, unpack, and clean up unneeded files. This feature is therefore not very practical, and its use is not recommended.
If <source path> is a tar archive in a compressed format such as xz, the ADD instruction will automatically decompress it into <target path>. This is useful in some cases, such as in the official ubuntu image:
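A sketch of that pattern (the tarball filename is illustrative of a root filesystem archive):

```dockerfile
FROM scratch
# the rootfs tarball is automatically extracted into /
ADD ubuntu-xenial-core-cloudimg-amd64-root.tar.gz /
CMD ["/bin/bash"]
```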
However, if we really do want to copy a compressed file in without decompressing it, we cannot use the ADD instruction.
The official Dockerfile best practices document recommends using COPY whenever possible, because COPY's semantics are clear: it just copies files, while ADD bundles more complex functionality and its behavior is not always obvious. The most suitable occasion for ADD is the one just mentioned: when automatic decompression is needed.
Also note that the ADD instruction invalidates the image build cache, which may slow down image builds.
Therefore, when choosing between COPY and ADD, follow this principle: use COPY for all file copying, and use ADD only when automatic decompression is required.
2.3. CMD: container start command
Like RUN, the CMD instruction has two formats:
- shell format: CMD <command>
- exec format: CMD ["executable", "parameter1", "parameter2"...]
- Parameter-list format: CMD ["parameter1", "parameter2"...]. After the ENTRYPOINT instruction has been specified, CMD supplies its concrete parameters.
As we said before when introducing containers, Docker is not a virtual machine; containers are processes. Since a container is a process, the program and its arguments must be specified when the container starts. The CMD instruction specifies the default start command for the container's main process.
For example, the default
CMD for the
ubuntu image is
/bin/bash. If we run
docker run -it ubuntu, we will go directly to
bash. We can also specify another command to run at runtime, such as
docker run -it ubuntu cat /etc/os-release. This replaces the default
/bin/bash command with the
cat /etc/os-release command, which outputs the system version information.
In terms of command format, the
exec format is recommended. This format will be parsed as a
JSON array, so be sure to use double quotes
" instead of single quotes.
If you use the shell format, the actual command is wrapped as an argument to sh -c. For example, CMD echo $HOME is in practice executed as CMD [ "sh", "-c", "echo $HOME" ]. This is why environment variables work here: they are parsed by the shell.
Speaking of CMD, we have to mention the issue of foreground versus background execution of applications in containers. This is a common source of confusion for beginners.
Docker is not a virtual machine; applications in a container should run in the foreground. Containers do not use upstart/systemd to start background services the way virtual or physical machines do: there is no concept of a background service in a container.
Some beginners write CMD service nginx start and then find that the container exits immediately after starting. Some even try the systemctl command inside the container, only to find that it does not work at all. This is because they have not understood foreground versus background execution, or the difference between containers and virtual machines, and are still thinking about containers from a traditional virtual-machine perspective.
For a container, the startup program is the container's application process. The container exists for its main process: when the main process exits, the container loses its reason to exist and exits too. Other auxiliary processes are not its concern.
With the service nginx start command, you are asking upstart to start the nginx service as a background daemon. And as noted earlier, CMD service nginx start is interpreted as CMD ["sh", "-c", "service nginx start"], so the main process is actually sh. When the service nginx start command finishes, sh finishes as well, and since sh is the main process, its exit naturally causes the container to exit.
The correct approach is to execute the nginx binary directly and require it to run in the foreground. For example:
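For nginx this is the well-known daemon off form:

```dockerfile
CMD ["nginx", "-g", "daemon off;"]
```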
2.4. ENTRYPOINT: entry point
The format of ENTRYPOINT is the same as RUN's: it comes in an exec format and a shell format.
The purpose of ENTRYPOINT is the same as CMD: specifying the container's start program and its arguments. ENTRYPOINT can also be replaced at runtime, though slightly more awkwardly than CMD: it must be specified through the --entrypoint option of docker run. When ENTRYPOINT is specified, the meaning of CMD changes: instead of being run directly, the contents of CMD are passed as arguments to the ENTRYPOINT command. In other words, when actually executed, it becomes <ENTRYPOINT> "<CMD>".
So why do we need
ENTRYPOINT after we have
CMD? Is there any benefit to this
<ENTRYPOINT> "<CMD>"? Let’s look at a few scenarios.
2.4.1. Scenario 1: using the image like a command
Suppose we need an image that reports our current public IP; we can first implement it with CMD:
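Based on the curl -s http://ip.cn command quoted later in this section, such a Dockerfile might look like this (the base image choice is an assumption):

```dockerfile
FROM ubuntu:16.04
RUN apt-get update \
    && apt-get install -y curl \
    && rm -rf /var/lib/apt/lists/*
CMD [ "curl", "-s", "http://ip.cn" ]
```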
If we build the image with docker build -t myip ., then whenever we need to query the current public IP we just run docker run myip.
So it looks as if we can use the image as a command. But commands usually take parameters; what if we want to add one? From the CMD above, we can see that the underlying command is curl, so to display HTTP headers we would add the -i flag. Can we simply pass -i to docker run myip?
Instead we get an error: executable file not found. As we said before, whatever follows the image name is the command, and at runtime it replaces the default value of CMD. So here -i replaces the original CMD instead of being appended to the original curl -s http://ip.cn. And -i is not a command at all, so naturally it cannot be found.
So if we want the -i parameter, we have to retype the whole command: docker run myip curl -s http://ip.cn -i.
This is obviously not an elegant solution, and ENTRYPOINT solves the problem. Let's re-implement the image using ENTRYPOINT.
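The only change needed is using ENTRYPOINT instead of CMD (base image choice again an assumption):

```dockerfile
FROM ubuntu:16.04
RUN apt-get update \
    && apt-get install -y curl \
    && rm -rf /var/lib/apt/lists/*
ENTRYPOINT [ "curl", "-s", "http://ip.cn" ]
```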
This time let’s try it again directly with
docker run myip -i.
As you can see, it worked this time. This is because when
ENTRYPOINT exists, the contents of the
CMD will be passed as an argument to
ENTRYPOINT, and here
-i is the new
CMD, so it will be passed as an argument to
curl, thus achieving the desired effect.
2.4.2. Scenario 2: preparatory work before the application runs
Starting the container is to start the main process, but there are times when some preparatory work is needed before starting the main process.
For example, a database like MySQL may require some configuration and initialization work that must be done before the final mysql server process can run.
In addition, you may want to avoid starting the service as the root user for better security, while still performing the necessary preparation as root before switching to a service user to launch the service. Commands other than the service itself can still run as root, which is convenient for debugging and so on.
These preparations are unrelated to the container's CMD: whatever CMD is, some preprocessing is needed first. In this case, you can write a script and install it as the ENTRYPOINT; it receives the arguments (i.e. <CMD>) and executes them as a command at the end of the script. This is how the official redis image does it, for example:
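An abridged sketch of the relevant part of such a Dockerfile (details differ across redis image versions):

```dockerfile
FROM alpine:3.4
# ... install redis ...
RUN addgroup -S redis && adduser -S -G redis redis
# ...
ENTRYPOINT ["docker-entrypoint.sh"]
EXPOSE 6379
CMD [ "redis-server" ]
```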
You can see that a redis user is created for the redis service, and at the end the ENTRYPOINT is set to the docker-entrypoint.sh script. The script then acts according to the contents of CMD: if it is redis-server, it switches to the redis user identity to start the server; otherwise it keeps running as root. For example:
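A simplified sketch of such an entrypoint script (the real redis script differs in details; su-exec is the alpine tool for dropping privileges):

```shell
#!/bin/sh
# If the command is redis-server and we are running as root,
# fix ownership and re-exec this script as the redis user.
if [ "$1" = 'redis-server' -a "$(id -u)" = '0' ]; then
    chown -R redis .
    exec su-exec redis "$0" "$@"
fi

# Otherwise run whatever command was given (the CMD or a user override).
exec "$@"
```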
2.5. ENV: set environment variables
There are two formats:
ENV <key> <value>
ENV <key1>=<value1> <key2>=<value2>...
This instruction is simple: it just sets environment variables, which can then be used directly, whether by other instructions such as RUN, or by applications at runtime.
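A multi-variable example in the second format (the values are illustrative):

```dockerfile
ENV VERSION=1.0 DEBUG=on \
    NAME="Happy Feet"
```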
An example like this shows how to break lines and how to enclose values containing spaces in double quotes, which is consistent with shell command-line behavior.
Once defined, an environment variable can be used in subsequent instructions. For example, the official node image's Dockerfile contains code like this:
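An abridged sketch of that pattern (the version number and exact commands are illustrative):

```dockerfile
ENV NODE_VERSION 7.2.0

RUN curl -SLO "https://nodejs.org/dist/v$NODE_VERSION/node-v$NODE_VERSION-linux-x64.tar.xz" \
  && tar -xJf "node-v$NODE_VERSION-linux-x64.tar.xz" -C /usr/local --strip-components=1 \
  && rm "node-v$NODE_VERSION-linux-x64.tar.xz"
```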
The environment variable NODE_VERSION is defined first, and $NODE_VERSION is then used several times in the subsequent RUN layer to customize the operations. As you can see, when upgrading the image build in the future, only the value of NODE_VERSION needs updating, which makes Dockerfile build maintenance much easier.
The following instructions support environment variable expansion: ADD, COPY, ENV, EXPOSE, LABEL, USER, WORKDIR, VOLUME, STOPSIGNAL, ONBUILD.
From this list you can see that environment variables can be used in many powerful places. With environment variables, one Dockerfile can produce a whole family of images simply by varying the environment variables.
2.6. VOLUME: define anonymous volumes
The format is VOLUME ["<path1>", "<path2>"...] or VOLUME <path>.
As mentioned before, containers should try to keep the container storage layer free of write operations at runtime. Database applications that need to save dynamic data should store their database files in a volume (volumes are covered further in later sections). To keep users from forgetting to mount the dynamic-data directory as a volume at runtime, the Dockerfile can declare certain directories as anonymous volumes in advance, so that even if the user mounts nothing at runtime, the application still runs normally without writing large amounts of data into the container storage layer.
For example, with VOLUME /data, the /data directory is automatically mounted as an anonymous volume at runtime, and anything written to /data is not recorded in the container storage layer, keeping it stateless. Of course, this mount setting can be overridden at runtime. For example:
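For example (the image name xxxx is a placeholder):

```shell
docker run -d -v mydata:/data xxxx
```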
In this command line, the named volume mydata is mounted at /data, replacing the anonymous volume mount defined in the Dockerfile.
2.7. EXPOSE: declare the listening port
The format is EXPOSE <port 1> [<port 2>...].
The EXPOSE instruction declares the port on which the container provides its service at runtime. This is only a declaration; the application will not open the port just because of it. Writing such a declaration in the Dockerfile has two benefits: it helps image users understand which port the image's service daemon uses, making port mapping easier to configure; and when random port mapping is used at runtime, i.e. docker run -P, the ports declared by EXPOSE are automatically mapped to random host ports.
In addition, there was a special use in earlier versions of Docker. Previously, all containers ran on the default bridge network, so every container could reach every other directly, which posed some security issues. Hence the Docker engine parameter --icc=false: when specified, containers could not access each other by default unless connected with the --link parameter, and only the ports declared with EXPOSE in the image would be accessible. This --icc=false usage has been largely superseded by docker network; interconnection and isolation between containers is now easily achieved with custom networks.
It is important to distinguish EXPOSE from the runtime option -p <host port>:<container port>. -p actually maps a host port to a container port, exposing the container's service to the outside world, while EXPOSE merely declares which port the container intends to use and does not perform any host port mapping.
2.8. WORKDIR: specify the working directory
The format is WORKDIR <working directory path>.
The WORKDIR instruction specifies the working directory (or current directory); in each subsequent layer, the current directory is set to this directory. If the directory does not exist, WORKDIR creates it for you.
As mentioned before, beginners often make the mistake of writing a Dockerfile as if it were a shell script, and that misunderstanding can also lead to errors like the following:
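Based on the /app/world.txt discussion below, the mistaken Dockerfile fragment would be:

```dockerfile
RUN cd /app
RUN echo "hello" > world.txt
```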
If you build an image from this Dockerfile and run it, you will find there is no /app/world.txt file, or that its content is not hello. The reason is simple: in a shell, two consecutive lines run in the same process execution environment, so memory state changed by the first command directly affects the second; in a Dockerfile, the two RUN commands have fundamentally different execution environments, as they run in two entirely different containers. This mistake comes from not understanding that Dockerfile builds use layered storage.
As said before, each RUN starts a container, executes its command, and then commits the file changes of that storage layer. The RUN cd /app in the first layer merely changes the working directory of the current process, a change in memory that produces no file change at all. By the second layer, a brand-new container is started, which has nothing to do with the first layer's container, so it cannot inherit the in-memory changes from the previous layer of the build.
So if you need to change the working directory for subsequent layers, you should use the WORKDIR instruction.
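The corrected version uses WORKDIR so the directory change persists across layers:

```dockerfile
WORKDIR /app
RUN echo "hello" > world.txt
```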