A Dockerfile is the starting point for creating a Docker image. It provides a well-defined set of instructions for copying files or folders, running commands, setting environment variables, and performing the other tasks needed to build a container image. It is important to write Dockerfiles carefully so that the generated images are secure, small, fast to build, and fast to update.

In this article we’ll see how to write good Dockerfiles to speed up the development process, ensure build reusability, and generate images that are safe to deploy to production.

Development Process

As developers, we want to match our development environment as closely as possible to our production environment to ensure that the content we build will work when deployed.

We also want to be able to develop quickly, which means we want builds to be fast and we also want to be able to use development tools like debuggers. Containers are a great way to organize our development environment, but we need to define our Dockerfile properly to be able to interact with our containers quickly.

Incremental builds

A Dockerfile is a declarative manifest for building container images. The Docker builder caches the result of each step as an image layer, but the cache can be invalidated; when that happens, the step that invalidated the cache and all subsequent steps must be rerun and the corresponding layers regenerated.

The cache is invalidated whenever a file referenced by a COPY or ADD instruction changes in the build context. The order of the build steps can therefore have a significant impact on build performance.
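
Relatedly, keeping the build context small helps avoid needless cache invalidation and speeds up context transfer. A minimal .dockerignore sketch for a Node.js project might look like this (the exact entries depend on your project layout):

# .dockerignore (illustrative)
node_modules
npm-debug.log
.git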

Let’s look at an example of building a Node.js project with a Dockerfile. In this project, the dependencies are declared in the package.json file and fetched when the npm ci command runs.

The simplest Dockerfile file looks like this.

FROM node:lts

ENV CI=true
ENV PORT=3000

WORKDIR /code
COPY . /code
RUN npm ci

CMD [ "npm", "start" ]

With the Dockerfile above, any change to a file in the build context invalidates the cache at the COPY line. This means that changing any file, even one other than package.json, causes all the dependencies to be re-fetched and placed in the node_modules directory, which takes a long time.

To avoid this, and to re-fetch the dependencies only when they actually change (i.e., when package.json or package-lock.json changes), we should separate the dependency installation from the build and run of the application.

The optimized Dockerfile looks like this.

FROM node:lts

ENV CI=true
ENV PORT=3000

WORKDIR /code
COPY package.json package-lock.json /code/
RUN npm ci
COPY src /code/src

CMD [ "npm", "start" ]

With this separation, if package.json and package-lock.json have not changed, the cache is reused for the layer generated by the RUN npm ci command. This means that when we edit the application source code and rebuild, we don’t have to re-download the dependencies, saving a lot of time 🎉.
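
To see this in practice, rebuild after editing a file under src and watch the builder reuse the cached dependency layer; you can also inspect how each Dockerfile step maps to a layer (the image name is illustrative):

# Rebuild after changing only source files; the npm ci step should be served from cache
$ docker build -t repository/image_name .

# List the image's layers and the Dockerfile steps that created them
$ docker history repository/image_name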

Keep live reloading between host and container

This tip is not directly related to the Dockerfile, but we often hear the question: how do we keep the code hot-reloading when the application runs in a container and the source code is modified from an IDE on the host?

In our example, we need to mount the project’s source directory into the container and pass an environment variable that tells Chokidar, the library that wraps Node.js file change events, to use polling. Run the command shown below.

$ docker run -e CHOKIDAR_USEPOLLING=true  -v ${PWD}/src/:/code/src/ -p 3000:3000 repository/image_name

Here we mount the host’s code directory into the container via -v, so any change to the code on the host is picked up and reloaded in the container in real time.
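
If you would rather not retype this command, the same setup can be expressed in a Compose file. Here is a minimal sketch, assuming the image is built from the Dockerfile in the current directory:

# docker-compose.yml (illustrative)
version: "3.7"
services:
  app:
    build: .
    environment:
      - CHOKIDAR_USEPOLLING=true
    volumes:
      - ./src:/code/src
    ports:
      - "3000:3000"

Running docker-compose up then starts the container with the same bind mount and environment variable.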

Build consistency

One of the most important properties of a Dockerfile is that it builds identical images from the same build context (sources, dependencies, …).

Here we will continue to improve the Dockerfile defined in the previous section.

Consistent builds from source

As described in the previous section, we can build applications by adding source files and dependencies to the Dockerfile description and running commands on them.

But with the previous example, we can’t actually guarantee that the generated image is the same every time we run docker build. Why? Because the lts tag points to the latest LTS version of the Node.js image, which changes over time with each release and can introduce significant differences. We can easily solve this problem by using a deterministic tag for the base image, as follows.

FROM node:13.12.0

ENV CI=true
ENV PORT=3000

WORKDIR /code
COPY package.json package-lock.json /code/
RUN npm ci
COPY src /code/src

CMD [ "npm", "start" ]

Below we will also see that there are other advantages to using tag-pinned base images.

Multi-stage builds and matching the right environment

We now have consistent development builds, but how do we achieve the same for the production environment?

Starting with Docker 17.05, we can use multi-stage builds to define the steps to generate the final image. Using this mechanism in Dockerfile, we can separate the images used for the development process from those used for the production environment, as follows.

FROM node:13.12.0 AS development

ENV CI=true
ENV PORT=3000

WORKDIR /code
COPY package.json package-lock.json /code/
RUN npm ci
COPY src /code/src

CMD [ "npm", "start" ]

FROM development AS builder

RUN npm run build

FROM nginx:1.17.9 AS production

COPY --from=builder /code/build /usr/share/nginx/html

Whenever we see a directive like FROM … AS …, we know the Dockerfile uses multi-stage builds. We now have three stages: development, builder, and production. By using the --target flag to build the image of a specific stage, we can continue to use the development stage for our development process.

$ docker build --target development -t repository/image_name:development .

The resulting image can be run in the same way as before.

$ docker run -e CHOKIDAR_USEPOLLING=true -v ${PWD}/src/:/code/src/ repository/image_name:development

A docker build without the --target flag builds the final stage, which in our case is the production image. Our production image is simply an nginx image with the files built in the previous stages placed in the appropriate location.
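
For example, the production image can be built and run like this (the tag is illustrative; NGINX listens on port 80 inside the container):

$ docker build -t repository/image_name:production .
$ docker run -p 8080:80 repository/image_name:production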

Preparing for production

It is important to keep the production image as lean and secure as possible. Here are a few things to check before running containers in production.

No floating image versions

As we said earlier, using a specific tag in the build steps helps make image generation reproducible. There are also at least two very good reasons to use specific tags for your images.

  • It is easy to find all containers running a given image version in a container orchestration system (Swarm, Kubernetes…), as shown below.
# Search in Docker engine containers using our repository/image_name:development image

$ docker inspect $(docker ps -q) | jq -c '.[] | select(.Config.Image == "repository/image_name:development") | "\(.Id) \(.State) \(.Config)"'

"89bf376620b0da039715988fba42e78d42c239446d8cfd79e4fbc9fbcc4fd897 {\"Status\":\"running\",\"Running\":true,\"Paused\":false,\"Restarting\":false,\"OOMKilled\":false,\"Dead\":false,\"Pid\":25463,\"ExitCode\":0,\"Error\":\"\",\"StartedAt\":\"2020-04-20T09:38:31.600777983Z\",\"FinishedAt\":\"0001-01-01T00:00:00Z\"}
{\"Hostname\":\"89bf376620b0\",\"Domainname\":\"\",\"User\":\"\",\"AttachStdin\":false,\"AttachStdout\":true,\"AttachStderr\":true,\"ExposedPorts\":{\"3000/tcp\":{}},\"Tty\":false,\"OpenStdin\":false,\"StdinOnce\":false,\"Env\":[\"CHOKIDAR_USEPOLLING=true\",\"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\",\"NODE_VERSION=12.16.2\",\"YARN_VERSION=1.22.4\",\"CI=true\",\"PORT=3000\"],\"Cmd\":[\"npm\",\"start\"],\"Image\":\"repository/image_name:development\",\"Volumes\":null,\"WorkingDir\":\"/code\",\"Entrypoint\":[\"docker-entrypoint.sh\"],\"OnBuild\":null,\"Labels\":{}}"

# Search k8s pods running a container with our repository/image_name:development image (using the jq CLI)
$ kubectl get pods --all-namespaces -o json | jq -c '.items[] | select(.spec.containers[].image == "repository/image_name:development") | .metadata'

{"creationTimestamp":"2020-04-10T09:41:55Z","generateName":"image_name-78f95d4f8c-","labels":{"com.docker.default-service-type":"","com.docker.deploy-namespace":"docker","com.docker.fry":"image_name","com.docker.image-tag":"development","pod-template-hash":"78f95d4f8c"},"name":"image_name-78f95d4f8c-gmlrz","namespace":"docker","ownerReferences":[{"apiVersion":"apps/v1″,"blockOwnerDeletion":true,"controller":true,"kind":"ReplicaSet","name":"image_name-78f95d4f8c","uid":"5ad21a59-e691-4873-a6f0-8dc51563de8d"}],"resourceVersion":"532″,"selfLink":"/api/v1/namespaces/docker/pods/image_name-78f95d4f8c-gmlrz","uid":"5c70f340-05f1-418f-9a05-84d0abe7009d"}
  • For CVEs (Common Vulnerabilities and Exposures), we can quickly determine whether our containers and images need to be patched. In our example, we can additionally have our development and production images use the Alpine variants, as in the Dockerfile below.
FROM node:13.12.0-alpine AS development

ENV CI=true
ENV PORT=3000

WORKDIR /code
COPY package.json package-lock.json /code/
RUN npm ci
COPY src /code/src

CMD [ "npm", "start" ]

FROM development AS builder

RUN npm run build

FROM nginx:1.17.9-alpine

COPY --from=builder /code/build /usr/share/nginx/html
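
With pinned tags in place, it is also easy to scan a given image version for known CVEs, for example with a scanner such as Trivy (assuming it is installed):

# Report known vulnerabilities in the image; Trivy is one of several such scanners
$ trivy image repository/image_name:development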

Using Official Images

You can use Docker Hub to search for base images to use in your Dockerfile, some of which are officially supported. We strongly recommend using these official images, because:

  • Their content has been verified
  • They are updated quickly when CVEs are fixed

You can add the image_filter query parameter to the search request to restrict results to official images.

https://hub.docker.com/search?q=nginx&type=image&image_filter=official

The examples above use the official Node.js and NGINX images.
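
The same filter is available from the command line, since docker search supports an is-official filter:

# List only official nginx images on Docker Hub
$ docker search --filter is-official=true nginx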

Enough privileges!

All applications, whether running in containers or not, should adhere to the principle of least privilege, which means that applications should only access the resources they need.

Processes running with too many privileges may have unintended consequences for the entire system at runtime if malicious behavior or errors occur.

Configuring the image itself with an unprivileged user identity is also very simple:

FROM maven:3.6.3-jdk-11 AS builder
WORKDIR /workdir/server
# Fetch dependencies first so they are cached independently of source changes
COPY pom.xml /workdir/server/pom.xml
RUN mvn dependency:go-offline

# Copy the sources and build the application
COPY src /workdir/server/src
RUN mvn package

FROM openjdk:11-jre-slim
# Create an unprivileged user (Debian syntax; Alpine images use addgroup -S / adduser -S)
RUN addgroup --system java && adduser --system --ingroup java javauser
USER javauser

EXPOSE 8080
COPY --from=builder /workdir/server/target/project-0.0.1-SNAPSHOT.jar /project-0.0.1-SNAPSHOT.jar

CMD ["java", "-Djava.security.egd=file:/dev/./urandom", "-jar", "/project-0.0.1-SNAPSHOT.jar"]

We simply create a new group, add a user to it, and then use the USER instruction, and the container runs as a non-root user.
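
To verify which user the container actually runs as, you can override the command (the image tag is illustrative):

# Should print "javauser" rather than "root"
$ docker run --rm repository/java_image whoami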

Conclusion

In this article we have shown just some of the many ways to optimize and secure Docker images through the Dockerfile. If you want to learn more, there are plenty of resources on Dockerfile best practices worth exploring.