With the rise of Docker, more and more projects use it to build production environments, because containers are light enough to start and migrate business services quickly. In the process, however, it is easy to overlook security. Containers do provide isolation, but their architecture differs considerably from that of virtual machines.

A virtual machine relies on a hypervisor layer that virtualizes the network card, memory, CPU, and other hardware; virtual machines are built on top of that layer, and each one has its own kernel. A Docker container instead relies on the host kernel to isolate the file system, processes, devices, network, and other resources, and to control permissions, CPU usage, and so on, so that containers do not affect each other. However, containers still share the kernel, file system, hardware, and other resources with the host.

Build configuration

Check the image file

When we customize the build environment, we need to choose a base image (docker pull image:tag). Prefer Docker's official images to reduce the risk of pulling a compromised image. When choosing a variant, we generally give priority to one based on Alpine Linux, a streamlined distribution: it is lightweight, and because it ships far fewer packages, the image is smaller and faster to pull, and its attack surface is reduced.

Do I need to use the latest or a fixed tag version?

For example.

python:3.9.6-alpine3.14

python:3.9.6-alpine

python:3.9-alpine

python:alpine

Choosing a specific version protects you from subsequent image changes. On the other hand, a floating tag such as latest picks up vulnerability patches sooner. This is a trade-off, but pinning to a stable version is usually recommended. Patch releases within a minor version are generally fixes that preserve backwards compatibility without major changes; with this in mind, I would choose python:3.9-alpine.

This choice is usually also applicable to our selection of software versions for production environments.
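
As a rough illustration of the distinction above, the following sketch flags image references that float (no tag, or latest). The helper name tag_pinning is hypothetical and not part of Docker; it only inspects the string and never contacts a registry.

```shell
#!/bin/sh
# Hypothetical helper (not a Docker command): report whether an image
# reference uses a floating tag or an explicit one.
tag_pinning() {
    ref=$1
    # Drop the registry/repository path so that a registry port such as
    # localhost:5000/python is not mistaken for a tag.
    name=${ref##*/}
    case $name in
        *:latest) echo "floating" ;;   # explicitly tracks the newest image
        *:*)      echo "tagged" ;;     # some explicit tag, e.g. 3.9-alpine
        *)        echo "floating" ;;   # no tag defaults to latest
    esac
}
```

For instance, tag_pinning python:3.9-alpine prints tagged, while tag_pinning python prints floating. Note that a tag such as python:alpine still moves across Python releases, so "tagged" here does not mean immutable.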

Always use unprivileged users

By default, processes inside the container run as root (id=0).

To enforce the principle of least privilege, we should set a default user. There are two options.

  1. Use the -u option to specify an arbitrary user ID that does not exist in the running container:

    docker run -u 4000 <image>
    

    Note: If we need to mount a filesystem later, the user ID we use should match a host user that is allowed to access the files.

  2. Create a default user in the Dockerfile

    This is the more common approach; the official nginx Dockerfile, for example, creates a dedicated user in a similar way:

    FROM <base image>
    
    RUN addgroup -S appgroup \
    && adduser -S appuser -G appgroup
    
    USER appuser
    
    ... <rest of Dockerfile> ...
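
To check whether an image actually ships with a non-root default user, one option is to read the User field from the image configuration. The sketch below assumes the docker CLI is on the PATH; the function name runs_as_root is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the image's configured default
# user is root, i.e. the User field is empty, "0", or "root".
runs_as_root() {
    user=$(docker image inspect --format '{{.Config.User}}' "$1") || return 2
    # The field may be "user:group"; keep only the user part.
    case ${user%%:*} in
        ''|0|root) return 0 ;;
        *)         return 1 ;;
    esac
}
```

It could be used as runs_as_root <image> && echo "warning: image defaults to root". This only reflects the image default; a -u option passed to docker run still overrides it.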
    

Using a separate user ID namespace

By default, the Docker daemon uses the host's user ID namespace, so root inside a container is also root on the host. Any successful privilege escalation within a container therefore also implies root access to the host and to other containers. To reduce this risk, we should configure the Docker daemon to remap container users into a separate user namespace, so that they correspond to different, unprivileged users and groups on the host.

dockerd --userns-remap=testuser:testuser

Don’t expose Docker daemon sockets

Unless you are very sure of what you are doing, never expose the UNIX socket that Docker is listening on: /var/run/docker.sock.

This is the main entry point for the Docker API. Granting someone access is the same as granting root privileges to your server.

In particular, avoid mounting the socket into a container with an option like the following.

-v /var/run/docker.sock:/var/run/docker.sock
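
To find containers that already have the socket mounted, the mount list reported by docker inspect can be searched. This is a minimal sketch, assuming the docker CLI is available; mounts_docker_socket is a hypothetical name, and the pattern also accepts /run/docker.sock because /var/run is a symlink to /run on many hosts.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given container has the Docker
# socket bind-mounted from the host.
mounts_docker_socket() {
    # Print one mount source per line, then look for an exact match.
    docker inspect --format '{{range .Mounts}}{{println .Source}}{{end}}' "$1" \
        | grep -Eqx '(/var)?/run/docker\.sock'
}
```

For example: mounts_docker_socket <container> && echo "socket is exposed".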

Privileged capabilities and shared resources

First of all, a container should never run with --privileged, otherwise it has the root privileges of the host. For additional safety, it is recommended to explicitly rule out privilege gains after the container is created with the option --security-opt=no-new-privileges, which prevents application processes inside the container from acquiring new privileges during execution.
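
Both settings can be checked on an existing container through docker inspect. The sketch below is only an illustration: check_privileges is a hypothetical name, and it assumes the SecurityOpt list is printed with the option name in it (for example [no-new-privileges]), which is matched as a substring.

```shell
#!/bin/sh
# Hypothetical helper: report how a container was started with respect to
# --privileged and --security-opt=no-new-privileges.
check_privileges() {
    state=$(docker inspect \
        --format '{{.HostConfig.Privileged}} {{.HostConfig.SecurityOpt}}' "$1") || return 2
    case $state in
        true*)
            echo "privileged"; return 1 ;;
        *no-new-privileges*)
            echo "ok"; return 0 ;;
        *)
            echo "no-new-privileges not set"; return 1 ;;
    esac
}
```

A call such as check_privileges <container> prints one of the three states and returns a non-zero status unless the container is unprivileged and has the option set.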

Do not share sensitive parts of the host filesystem with containers:

  • root (/)
  • devices (/dev)
  • processes (/proc)
  • virtual filesystem (/sys)

If you need to access host devices, use the [r|w|m] flags (read, write, and mknod) of the --device option to enable only the access that is actually required.

Using control groups to limit access to resources

Control groups (cgroups) are the kernel mechanism used to limit each container's access to CPU, memory, and disk I/O.

Without such limits, a single container can exhaust the host's resources and put the server at risk of a denial-of-service (DoS) attack. It is recommended to cap memory and CPU usage with the following options.

--memory="400m"
--memory-swap="1g"

--cpus=0.5
--restart=on-failure:5
--ulimit nofile=5
--ulimit nproc=5
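
To avoid retyping these options, they can be collected in a small wrapper. This is a sketch using the example values above, not recommended defaults; run_limited is a hypothetical name, and the --ulimit options are left out because suitable values depend heavily on the application.

```shell
#!/bin/sh
# Hypothetical wrapper: start a container with the memory, CPU, and
# restart limits listed above. Remaining arguments go to docker run.
run_limited() {
    docker run \
        --memory="400m" \
        --memory-swap="1g" \
        --cpus=0.5 \
        --restart=on-failure:5 \
        "$@"
}
```

Usage would look like run_limited -d <image>. Docker rejects --rm together with --restart, so the wrapper should not be combined with --rm.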

File system

Use a temporary file system for non-persistent data

If you need only temporary storage, use the appropriate option.

docker run --read-only --tmpfs /tmp:rw,noexec,nosuid <image>

Using the file system to store persistent data

If data needs to be shared with the host filesystem or other containers, there are two options.

  • Create a bind mount with limited usable disk space (--mount type=bind,o=size)
  • Create a volume on a dedicated partition (--mount type=volume)

In either case, if the container does not need to modify shared data, use the read-only option.

docker run -v <volume-name>:/path/in/container:ro <image>

or

docker run --mount source=<volume-name>,destination=/path/in/container,readonly <image>

Networking

Do not use Docker’s default bridge docker0

docker0 is a bridge created when the Docker daemon starts, to separate the host network from the container network. When creating containers, Docker attaches them to docker0 by default, so all containers are connected to the same bridge and can communicate with each other. We should disable this default connection for all containers by starting the daemon with the option --bridge=none, and then create a dedicated network for each group of containers that needs to communicate, using the following command.

docker network create <network_name>

Then attach containers to that network when starting them.

docker run --network=<network_name> <image>
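
The two steps can be combined so that the network is created only when it does not exist yet. The following is a minimal sketch, assuming the docker CLI is available; ensure_network and run_on_network are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: create the named network unless it already exists.
ensure_network() {
    docker network inspect "$1" >/dev/null 2>&1 || docker network create "$1"
}

# Hypothetical helper: start a container attached to the named network.
# Usage: run_on_network <network_name> [docker run options] <image>
run_on_network() {
    net=$1
    shift
    ensure_network "$net" && docker run --network="$net" "$@"
}
```

Containers started with the same network name can reach each other, while containers on different user-defined networks stay separated.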

Do not share the network namespace of the host

Likewise, keep the host's network interfaces isolated: the --network=host option, which shares the host's network namespace with the container, should not be used.

Open source container vulnerability scanning tools

After building an image, you can run static scanning tools against it to check for known vulnerabilities, for example.