This article is a detailed explanation of Docker custom images: how to build your own images and how the Dockerfile instructions work.

I. Using a Dockerfile to customize images

1.1. Customizing an image with a Dockerfile

Customizing an image really means customizing the configuration and files added at each layer. If we write the modify, install, build, and operate commands for each layer into a script, and use that script to build and customize the image, then the earlier problems of non-repeatable builds, opaque image construction, and image bloat are all solved. This script is the Dockerfile.

A Dockerfile is a text file that contains a set of instructions, each of which builds a layer, so the content of each instruction describes how that layer should be built.

Let’s take the nginx image as an example; this time we will customize it with a Dockerfile.

In a blank directory, create a text file and name it Dockerfile.

$ mkdir mynginx
$ cd mynginx
$ touch Dockerfile

The contents are as follows:

FROM nginx
RUN echo '<h1>Hello, Docker!</h1>' > /usr/share/nginx/html/index.html

The Dockerfile is very simple: just two lines in total, involving two directives, FROM and RUN.

1.2. FROM: specify the base image

A so-called custom image must be based on some existing image and customized on top of it, just as we previously ran a container from the nginx image and then modified it. FROM is what specifies that base image, so FROM is a required directive in a Dockerfile and must be its first directive.

There are many high-quality official images on the Docker Store, including service images that can be used directly, such as nginx, redis, mongo, mysql, httpd, php, tomcat, etc., as well as images for developing, building, and running applications in various languages, such as node, openjdk, python, ruby, golang, and so on. It is usually possible to find an image among them that best matches our final goal and use it as the base image for customization.

If you cannot find an image that matches the service, the official images also include more basic operating system images, such as ubuntu, debian, centos, fedora, alpine, etc. The software libraries of these operating systems give us broader room for extension.

In addition to choosing existing images as the base image, Docker also has a special image called scratch. This image is a virtual concept and does not actually exist; it represents a blank image.

FROM scratch
...

If you use scratch as the base image, it means you are not basing the image on anything; the instructions that follow will form the very first layer of the image.

It is not uncommon to copy an executable directly into an image without any system base, e.g. swarm or coreos/etcd. For statically compiled Linux programs, no runtime support from the operating system is needed, since all the required libraries are already inside the executable, so FROM scratch makes the image much smaller. Many applications written in Go create their images this way, which is one reason some people consider Go particularly well suited to container microservice architectures.
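For illustration, a minimal Dockerfile for such an image might look like the sketch below, assuming a statically linked binary named app (the name is hypothetical) has already been built into the context directory:

FROM scratch
# Copy the prebuilt static binary into the otherwise empty image
COPY app /
# Run it as the container's main process
CMD ["/app"]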

1.3. RUN: execute commands

The RUN directive is used to execute command-line commands. Given the power of the command line, RUN is one of the most frequently used directives when customizing images. It comes in two formats:

  • shell format: RUN <command>, just like a command typed directly on the command line. The RUN line in the Dockerfile we just wrote uses this format.

    RUN echo '<h1>Hello, Docker!</h1>' > /usr/share/nginx/html/index.html
    
  • exec format: RUN ["executable", "argument1", "argument2"], which looks more like a function call.
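Note that the echo line above relies on shell redirection (>), which only a shell can perform, so a literal exec-format equivalent would have to invoke the shell explicitly; a minimal sketch:

RUN ["sh", "-c", "echo '<h1>Hello, Docker!</h1>' > /usr/share/nginx/html/index.html"]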

Since RUN executes commands just like a shell script does, can we write one RUN per command, as in a shell script? For example, like this:

FROM debian:jessie

RUN apt-get update
RUN apt-get install -y gcc libc6-dev make
RUN wget -O redis.tar.gz "http://download.redis.io/releases/redis-3.2.5.tar.gz"
RUN mkdir -p /usr/src/redis
RUN tar -xzf redis.tar.gz -C /usr/src/redis --strip-components=1
RUN make -C /usr/src/redis
RUN make -C /usr/src/redis install

As said before, every instruction in a Dockerfile creates a layer, and RUN is no exception. Each RUN behaves just like the manual image-creation process we went through earlier: start a new layer, execute the commands on it, and then commit the changes of that layer to form a new image.

Written this way, it creates 7 layers of image. That is completely unnecessary, and many things not needed at runtime get baked into the image, such as the compilation environment and updated package lists. The result is a very bloated, many-layered image that not only takes longer to build and deploy but is also error-prone. This is a common mistake among Docker newcomers.

Union FS implementations have a maximum number of layers; AUFS, for example, used to allow at most 42 layers and now allows at most 127.

The correct way to write the above Dockerfile would be as follows:

FROM debian:jessie

RUN buildDeps='gcc libc6-dev make' \
    && apt-get update \
    && apt-get install -y $buildDeps \
    && wget -O redis.tar.gz "http://download.redis.io/releases/redis-3.2.5.tar.gz" \
    && mkdir -p /usr/src/redis \
    && tar -xzf redis.tar.gz -C /usr/src/redis --strip-components=1 \
    && make -C /usr/src/redis \
    && make -C /usr/src/redis install \
    && rm -rf /var/lib/apt/lists/* \
    && rm redis.tar.gz \
    && rm -r /usr/src/redis \
    && apt-get purge -y --auto-remove $buildDeps

First, all the previous commands have a single purpose: compiling and installing the Redis executable. There is no need to create many layers; this is one layer's worth of work. So instead of many RUN lines, one per command, there is a single RUN that uses && to chain all the required commands together, simplifying the previous 7 layers into 1. When writing a Dockerfile, keep reminding yourself that you are not writing a shell script but defining how each layer is built.

Also, the line breaks here are for readability. Dockerfile supports the shell-style line continuation of \ at the end of a line, as well as comments beginning with #. Good formatting, such as line breaks, indentation, and comments, makes maintenance and troubleshooting easier and is a habit worth keeping.

Also notice the cleanup commands added at the end of this group: they remove the packages that were only needed to compile the build, clean up all downloaded and extracted files, and also clear the apt cache. This is a very important step. As said before, images are layered storage; things in one layer are not deleted by a later layer but stay with the image forever. So when building an image, make sure each layer adds only what really needs to be added, and anything extraneous is cleaned up.

One of the reasons many Docker newcomers create bloated images is that they forget to clean up extraneous files at the end of building each layer.

1.4. Building the image

Let’s go back to the Dockerfile of the custom nginx image we wrote earlier. Now that we understand its contents, let’s build the image.

Execute the following command in the directory containing the Dockerfile:

$ docker build -t nginx:v3 .
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM nginx
 ---> e43d811ce2f4
Step 2 : RUN echo '<h1>Hello, Docker!</h1>' > /usr/share/nginx/html/index.html
 ---> Running in 9cdc27646c7b
 ---> 44aa4490ce2c
Removing intermediate container 9cdc27646c7b
Successfully built 44aa4490ce2c

From the output we can clearly see how the image is built. In Step 2, as described earlier, the RUN directive starts a container (9cdc27646c7b), executes the requested command, commits the layer (44aa4490ce2c), and then deletes the intermediate container (9cdc27646c7b).

Here we used the docker build command to build the image. Its format is:

docker build [options] <context path/URL/->

Here we specified the name of the final image with -t nginx:v3. After a successful build, we can run this image just as we ran nginx:v2 before, and the result will be the same.

1.5. The image build context

If you pay attention, you will notice that the docker build command ends with a .. Since . means the current directory, and the Dockerfile is in the current directory, many beginners assume this path specifies where the Dockerfile is located. That is actually inaccurate: looking at the command format above, you can see that it specifies the context path. So what is the context?

First we need to understand how docker build works. At runtime, Docker is divided into the Docker engine (the server-side daemon) and the client tools. The Docker engine exposes a set of REST APIs, called the Docker Remote API, and client tools such as the docker command interact with the engine through these APIs to perform their various functions. So although it looks like we are executing docker functionality locally, everything is actually done on the server side (in the Docker engine) via remote calls. This client/server design also makes it easy to operate a Docker engine on a remote server.

When we build an image, not all customization is done through RUN; local files often need to be copied into the image, for example with the COPY or ADD directives. But docker build builds the image not locally, rather on the server side, i.e. in the Docker engine. So in this client/server architecture, how can the server get hold of local files?

This introduces the concept of context. When building, the user specifies the path to the build image context, and the docker build command learns this path, packages everything under it, and uploads it to the Docker engine. Once the Docker engine receives the context package, it expands it and gets all the files it needs to build the image.

If you write this in the Dockerfile:

COPY ./package.json /app/

This does not copy the package.json in the directory where docker build was executed, nor the package.json in the directory containing the Dockerfile; it copies the package.json in the context directory.

Therefore, the source paths in directives like COPY are relative paths. This is also why beginners often ask why COPY ../package.json /app or COPY /opt/xxxx /app does not work: those paths are outside the context, so the Docker engine cannot get at the files in those locations. If you really need those files, you should copy them into the context directory first.

Now you can understand the . in the command docker build -t nginx:v3 .: it actually specifies the context directory, whose contents docker build packages up and sends to the Docker engine to assist the build.

Looking at the docker build output, we have in fact already seen this context-sending step:

$ docker build -t nginx:v3 .
Sending build context to Docker daemon 2.048 kB
...

Understanding the build context matters for image building, to avoid mistakes that need not happen. For example, some beginners find that COPY /opt/xxxx /app does not work, so they simply put the Dockerfile at the root of the hard drive and build from there, only to find that docker build sends tens of GB of data, extremely slowly, and the build is prone to failure. That approach asks docker build to pack up the entire hard drive, which is clearly a misuse.

In general, you should put the Dockerfile in an empty directory or in the root of the project. If the required files are not in that directory, make a copy of them there. If the directory contains things you really do not want passed to the Docker engine at build time, you can write a .dockerignore, with the same syntax as .gitignore, to weed out files that should not be sent to the engine as context.
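As a hedged illustration, a .dockerignore for a typical project might contain entries like these (the entries are examples, not from this article):

# Exclude version-control metadata from the build context
.git
# Exclude dependencies and logs that the image does not need
node_modules
*.log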

So why would anyone mistakenly think that . specifies the directory containing the Dockerfile? Because by default, if no Dockerfile is specified explicitly, the file named Dockerfile in the context directory is used as the Dockerfile.

This is only the default behavior. The Dockerfile is not actually required to be named Dockerfile, nor to be located in the context directory; for example, you can use the -f ../Dockerfile.php parameter to designate some file as the Dockerfile.
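For example, assuming a file named Dockerfile.php one directory above the context (the filename and tag are illustrative), the build could be invoked like this:

$ docker build -f ../Dockerfile.php -t myapp .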

Of course, it is customary to use the default filename Dockerfile and to place it in the image build context directory.

1.6. Other uses of docker build

1.6.1. Building directly from the Git repo

docker build also supports building from a URL; for example, you can build directly from a Git repo:

$ docker build https://github.com/twang2218/gitlab-ce-zh.git#:8.14
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM gitlab/gitlab-ce:8.14.0-ce.0
8.14.0-ce.0: Pulling from gitlab/gitlab-ce
aed15891ba52: Already exists
773ae8583d14: Already exists
...

This command specifies the Git repo to build from, along with the default master branch and the build directory /8.14/. Docker will then git clone the project itself, switch to the specified branch, enter the specified directory, and start the build.

1.6.2. Building from a given tarball

$ docker build http://server/context.tar.gz

If the URL given is not a Git repo but a tar archive, the Docker engine will download the archive, unpack it automatically, and use it as the context to start the build.

1.6.3. Reading a Dockerfile from standard input for a build

docker build - < Dockerfile

or

cat Dockerfile | docker build -

If standard input is a text file, it is treated as the Dockerfile and the build begins. This form has no context: since the Dockerfile contents are read directly from standard input, it cannot COPY local files into the image the way the other methods can.

1.6.4. Reading a context archive from standard input for a build

$ docker build - < context.tar.gz

If the file on standard input is an archive in gzip, bzip2, or xz format, Docker will take it as a context archive, expand it, treat it as the context, and start the build.

II. Dockerfile directives

We have already introduced FROM and RUN, and mentioned COPY and ADD. Dockerfile is in fact very powerful, providing more than a dozen directives. Let’s continue with the others.

2.1. COPY

Format:

  • COPY <source path>... <target path>

  • COPY ["<source path1>",... "<target path>"]

Like the RUN directive, it has two formats, one resembling a command line and one resembling a function call.

The COPY directive copies files/directories from <source path> in the build context into the <target path> location in the image's new layer. For example:

COPY package.json /usr/src/app/

<source path> can be multiple paths, and may even contain wildcards; the wildcard rules must satisfy Go’s filepath.Match rules, e.g.:

COPY hom* /mydir/
COPY hom?.txt /mydir/

<target path> can be an absolute path inside the container, or a path relative to the working directory (which can be set with the WORKDIR directive). The target path does not need to be created beforehand; if it does not exist, the missing directories are created before the files are copied.
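As a small sketch combining the two (the paths follow the earlier package.json example), a relative <target path> resolves against the working directory:

WORKDIR /usr/src/app
# Copies package.json from the context into /usr/src/app/package.json
COPY package.json .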

It is also worth noting that COPY preserves all metadata of the source files, such as read, write, and execute permissions and file modification times. This is useful for image customization, especially when the build-related files are managed with Git.

2.2. ADD

The format and nature of the ADD directive are basically the same as COPY’s, but ADD adds some features on top of COPY.

For example, <source path> can be a URL, in which case the Docker engine will try to download the linked file to <target path>. The downloaded file’s permissions are automatically set to 600; if those are not the desired permissions, an additional RUN layer is needed to adjust them. So it often makes more sense to just use the RUN directive with wget or curl to download, handle permissions, unpack, and then clean up the useless files. This feature is therefore not really practical, and its use is not recommended.

If <source path> is a tar archive compressed with gzip, bzip2, or xz, the ADD directive will automatically decompress it into <target path>.

This is useful in some cases, such as in the official ubuntu image:

FROM scratch
ADD ubuntu-xenial-core-cloudimg-amd64-root.tar.gz /
...

However, in cases where we really want to copy an archive without unpacking it, we cannot use the ADD directive.
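In that case a plain COPY keeps the archive intact, as in this minimal sketch (the filename is illustrative):

# COPY never decompresses, so the .tar.gz arrives in the image as-is
COPY archive.tar.gz /app/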

The official Dockerfile best-practices document recommends using COPY whenever possible, because COPY’s semantics are clear: it just copies files, whereas ADD bundles more complex behavior that is not always obvious. The most suitable occasion for ADD is the one just mentioned, where automatic decompression is needed.

Also note that the ADD command will invalidate the image build cache, which may make image builds slower.

Therefore, when choosing between the COPY and ADD directives, you can follow the principle of using the COPY directive for all file copying, and using ADD only when automatic decompression is required.

2.3. CMD

The format of the CMD directive is similar to RUN’s:

  • shell format: CMD <command>
  • exec format: CMD ["executable", "parameter1", "parameter2"...]
  • Parameter list format: CMD ["parameter1", "parameter2"...]. After specifying the ENTRYPOINT directive, specify the specific parameters with CMD.

As we said when introducing containers, Docker is not a virtual machine; containers are processes. Since they are processes, the program and its arguments must be specified when the container starts. The CMD directive specifies the default startup command for the container’s main process.

For example, the default CMD of the ubuntu image is /bin/bash. If we run docker run -it ubuntu, we land directly in bash. We can also specify something else to run at startup, such as docker run -it ubuntu cat /etc/os-release, which replaces the default /bin/bash with cat /etc/os-release and prints the system version information.
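For reference, the two invocations side by side:

$ docker run -it ubuntu                       # runs the default CMD, /bin/bash
$ docker run -it ubuntu cat /etc/os-release   # the trailing words replace CMD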

In terms of format, the exec format is recommended. It is parsed as a JSON array, so be sure to use double quotes ", not single quotes.

If you use the shell format, the actual command is wrapped as an argument to sh -c. For example:

CMD echo $HOME

In actual execution, this is transformed into:

CMD [ "sh", "-c", "echo $HOME" ]

This is why we can use environment variables here: they get parsed by the shell.
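Conversely, the exec format is not run through a shell, so variables in it are not expanded; a minimal sketch of the pitfall:

# Prints the literal string $HOME, because no shell is involved
CMD [ "echo", "$HOME" ]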

Speaking of CMD, we have to mention the issue of running applications in containers in the foreground versus the background, a common source of confusion for beginners.

Docker is not a virtual machine. Applications in a container should run in the foreground, not, as on a virtual or physical machine, be started as background services with upstart/systemd; there is no concept of a background service inside a container.

Some beginners write the CMD as:

CMD service nginx start

They then find that the container exits immediately after starting. Even trying the systemctl command inside the container turns out not to work. This is because they do not understand foreground versus background, have not distinguished containers from virtual machines, and are still viewing containers from a traditional VM perspective.

For a container, the startup program is the container’s application process. The container exists for its main process; once the main process exits, the container loses its reason to exist and exits too. Other auxiliary processes are not its concern.

The command service nginx start asks upstart to launch the nginx service as a background daemon. But as noted above, CMD service nginx start is interpreted as CMD ["sh", "-c", "service nginx start"], so the main process is actually sh. When the service nginx start command finishes, sh finishes too, and since sh is the main process, its exit naturally causes the container to exit.

The correct approach is to execute the nginx binary directly and require it to run in the foreground. For example:

CMD ["nginx", "-g", "daemon off;"]

2.4. ENTRYPOINT

The format of ENTRYPOINT is the same as RUN’s, divided into an exec format and a shell format.

The purpose of ENTRYPOINT is the same as CMD: to specify the container’s startup program and its arguments. ENTRYPOINT can also be replaced at runtime, though slightly more awkwardly than CMD; it must be specified via the --entrypoint parameter of docker run.

When an ENTRYPOINT is specified, the meaning of CMD changes: instead of its command being run directly, the contents of CMD are passed as arguments to the ENTRYPOINT command. In other words, what actually executes becomes:

<ENTRYPOINT> "<CMD>"

So why do we need ENTRYPOINT when we already have CMD? What is the benefit of this <ENTRYPOINT> "<CMD>" arrangement? Let’s look at a few scenarios.

2.4.1. Scenario 1: making the image behave like a command

Suppose we need an image for finding our current public IP. We can first implement it with CMD:

FROM ubuntu:16.04
RUN apt-get update \
    && apt-get install -y curl \
    && rm -rf /var/lib/apt/lists/*
CMD [ "curl", "-s", "http://ip.cn" ]

If we build the image with docker build -t myip ., then whenever we need to check the current public IP we just run:

$ docker run myip
....

So the image can indeed be used like a command. But commands usually take parameters; what if we want to add one? From the CMD above we can see that the actual command is curl, so to display HTTP headers we need to add the -i flag. Can we simply append -i to docker run myip?

$ docker run myip -i
docker: Error response from daemon: invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"exec: \\\"-i\\\": executable file not found in $PATH\"\n".

We can see the error: executable file not found. As explained before, whatever follows the image name is the command, and at run time it replaces the default value of CMD. So here -i replaced the original CMD rather than being appended to curl -s http://ip.cn. And -i is not a command at all, so naturally it cannot be found.

So to pass the -i flag, we would have to retype the whole command:

$ docker run myip curl -s http://ip.cn -i

This is obviously not a good solution, whereas ENTRYPOINT solves the problem. Let’s re-implement the image using ENTRYPOINT:

FROM ubuntu:16.04
RUN apt-get update \
    && apt-get install -y curl \
    && rm -rf /var/lib/apt/lists/*
ENTRYPOINT [ "curl", "-s", "http://ip.cn" ]

This time let’s try docker run myip -i directly:

$ docker run myip
...

$ docker run myip -i
HTTP/1.1 200 OK
Server: nginx/1.8.0
Date: Tue, 22 Nov 2016 05:12:40 GMT
Content-Type: text/html; charset=UTF-8
Vary: Accept-Encoding
X-Powered-By: PHP/5.6.24-1~dotdeb+7.1
X-Cache: MISS from cache-2
X-Cache-Lookup: MISS from cache-2:80
X-Cache: MISS from proxy-2_6
Transfer-Encoding: chunked
Via: 1.1 cache-2:80, 1.1 proxy-2_6:8006
Connection: keep-alive

...

As you can see, it works this time. When an ENTRYPOINT exists, the contents of CMD are passed to it as arguments, and here -i becomes the new CMD, so it is passed as an argument to curl, achieving the desired effect.

2.4.2. Scenario 2: preparation before running the application

Starting a container means starting its main process, but sometimes some preparatory work is needed before the main process starts.

For example, a database such as MySQL may need configuration and initialization work to be completed before the actual mysql server can run.

In addition, you may want to avoid starting the service as root for better security, yet still need to perform some necessary preparation as root before launching it, and then switch to a service user to start the service, while commands other than the service can still run as root for debugging convenience.

These preparations are independent of the container’s CMD: whatever the CMD is, a preprocessing step is needed first. In such cases, you can write a script and set it as the ENTRYPOINT; it will receive the parameters passed in (i.e. the <CMD>) and execute them as a command at the end of the script. This is how the official redis image does it, for example:

FROM alpine:3.4
...
RUN addgroup -S redis && adduser -S -G redis redis
...
ENTRYPOINT ["docker-entrypoint.sh"]

EXPOSE 6379
CMD [ "redis-server" ]

You can see that a redis user is created for the Redis service, and at the end the ENTRYPOINT is set to the docker-entrypoint.sh script.

#!/bin/sh
...
# allow the container to be started with `--user`
if [ "$1" = 'redis-server' -a "$(id -u)" = '0' ]; then
 chown -R redis .
 exec su-exec redis "$0" "$@"
fi

exec "$@"

The script decides based on the contents of CMD: if it is redis-server, it switches to the redis user identity before starting the server; otherwise it keeps running as root. For example:

$ docker run -it redis id
uid=0(root) gid=0(root) groups=0(root)

2.5. ENV

There are two formats:

  • ENV <key> <value>
  • ENV <key1>=<value1> <key2>=<value2>...

This directive is simple: it sets environment variables, which later directives such as RUN, as well as the application at runtime, can use directly.

ENV VERSION=1.0 DEBUG=on \
    NAME="Happy Feet"

This example demonstrates how to break lines and how to enclose values containing spaces in double quotes, consistent with shell behavior.

Once defined, an environment variable can be used in subsequent directives. For example, the official node image’s Dockerfile contains code like this:

ENV NODE_VERSION 7.2.0

RUN curl -SLO "https://nodejs.org/dist/v$NODE_VERSION/node-v$NODE_VERSION-linux-x64.tar.xz" \
  && curl -SLO "https://nodejs.org/dist/v$NODE_VERSION/SHASUMS256.txt.asc" \
  && gpg --batch --decrypt --output SHASUMS256.txt SHASUMS256.txt.asc \
  && grep " node-v$NODE_VERSION-linux-x64.tar.xz\$" SHASUMS256.txt | sha256sum -c - \
  && tar -xJf "node-v$NODE_VERSION-linux-x64.tar.xz" -C /usr/local --strip-components=1 \
  && rm "node-v$NODE_VERSION-linux-x64.tar.xz" SHASUMS256.txt.asc SHASUMS256.txt \
  && ln -s /usr/local/bin/node /usr/local/bin/nodejs

The environment variable NODE_VERSION is defined first and then used several times as $NODE_VERSION in the subsequent RUN layer to customize the operations. As you can see, when upgrading the image build later, you only need to update 7.2.0 in one place, which makes Dockerfile maintenance much easier.

The following directives support environment-variable expansion: ADD, COPY, ENV, EXPOSE, LABEL, USER, WORKDIR, VOLUME, STOPSIGNAL, ONBUILD.

You can sense from this list that environment variables can be used in many powerful places. Through environment variables, one Dockerfile can produce multiple different images, simply by using different environment variables.

2.6. VOLUME

The format is:

  • VOLUME ["<path1>", "<path2>"...]

  • VOLUME <path>

As we said before, a container should keep its storage layer free of write operations as far as possible at runtime. For database applications that need to save dynamic data, the database files should be stored in a volume (a concept we will introduce further in later sections). To keep users from forgetting to mount the directory holding dynamic files as a volume at runtime, the Dockerfile can declare certain directories as anonymous volumes in advance, so that even if the user does not specify a mount, the application can still run normally without writing large amounts of data into the container storage layer.

VOLUME /data

Here the /data directory is automatically mounted as an anonymous volume at runtime, and anything written to /data is not recorded in the container storage layer, keeping the storage layer stateless. This mount setting can of course be overridden at runtime. For example:

docker run -d -v mydata:/data xxxx

In this line, the named volume mydata is mounted at /data, replacing the anonymous-volume mount configuration declared in the Dockerfile.

2.7. EXPOSE

The format is EXPOSE <port 1> [<port 2>...].

The EXPOSE directive declares which port the container will provide a service on at runtime. It is only a declaration; the application will not open the port just because of it. Writing such a declaration in the Dockerfile has two benefits: it helps users of the image understand which port the image’s service daemon listens on, making port mapping easier to configure; and when random port mapping is used at runtime, i.e. docker run -P, the EXPOSEd ports are automatically mapped to random host ports.

In addition, there was a special use in earlier versions of Docker. Previously, all containers ran on the default bridge network, so every container could reach every other directly, which posed some security issues. Hence the Docker engine parameter --icc=false: when specified, containers cannot reach each other by default unless they use the --links parameter, and only the ports declared by EXPOSE in the image are reachable. The use of --icc=false has largely been superseded by the introduction of docker network, with which interconnection and isolation between containers are easily achieved using custom networks.

It is important to distinguish EXPOSE from the runtime option -p <host port>:<container port>. -p maps a host port to a container port, i.e. actually exposes the container’s service port to the outside world, whereas EXPOSE merely declares which port the container intends to use and does not perform any mapping on the host.
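A brief illustration of the difference (the port numbers are arbitrary):

$ docker run -d -P nginx           # -P maps each EXPOSEd port to a random host port
$ docker run -d -p 8080:80 nginx   # -p explicitly maps host port 8080 to container port 80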

2.8. WORKDIR

The format is WORKDIR <working directory path>.

You can use the WORKDIR directive to specify the working directory (or current directory); from that point on, the current directory of each subsequent layer is changed to the specified directory. If the directory does not exist, WORKDIR creates it for you.

As mentioned before, some beginners make the mistake of writing the Dockerfile as if it were a shell script, a misunderstanding that can also lead to errors like the following:

RUN cd /app
RUN echo "hello" > world.txt

If you build an image from this Dockerfile and run it, you will find no /app/world.txt file, or its content will not be hello. The reason is simple: in a shell, two consecutive lines share the same process execution environment, so memory state modified by the first command directly affects the second; in a Dockerfile, the two RUN commands have fundamentally different execution environments, being two entirely different containers. This mistake comes from not understanding that Dockerfile builds use layered storage.

As said before, each RUN starts a container, executes its command, and then commits the file changes of that storage layer. The RUN cd /app in the first layer merely changes the working directory of the current process, an in-memory change that produces no file change at all. By the second layer, a brand-new container is started that has nothing to do with the first layer’s container, so it cannot inherit the in-memory changes from the earlier stage of the build.

So if you need to change the working directory for subsequent layers, you should use the WORKDIR directive.
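Rewriting the broken two-line example above with WORKDIR, as a minimal sketch:

WORKDIR /app
# world.txt is now created in /app, the working directory of this layer
RUN echo "hello" > world.txt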