Health checks for Docker containers

Docker has included a native health check mechanism since version 1.12. The simplest health check for a container is the process-level check, which verifies only that a process is alive: the Docker daemon monitors the PID 1 process in the container and can restart an exited container according to the restart policy, if one is specified in the docker run command. In many practical scenarios, a process-level check is not enough. For example, if the container process is still running but can no longer respond to user requests because of an application deadlock, process monitoring cannot detect the problem.

After the container starts, its initial health status is starting. Docker Engine waits for the interval time before running the health check command for the first time, and then runs it periodically. If a single check returns a non-zero value or runs longer than the specified timeout, that check is considered failed. If the number of consecutive failures reaches the retries value, the status changes to unhealthy.

Once a health check succeeds, Docker puts the container back into the healthy state.
Whenever the health status of the container changes, Docker Engine emits a health_status event.
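
The status transitions described above can be modelled as a small state machine. The following is a toy simulation in Python, not Docker's implementation: the class name HealthTracker and its record method are invented for illustration, and the start period is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class HealthTracker:
    """Toy model of the health status transitions (not Docker's code)."""
    retries: int = 3          # consecutive failures needed for "unhealthy"
    timeout: float = 30.0     # seconds a single check may run
    status: str = "starting"
    failing_streak: int = 0
    events: list = field(default_factory=list)

    def record(self, exit_code: int, duration: float) -> str:
        """Feed in one check result and return the resulting status."""
        failed = exit_code != 0 or duration > self.timeout
        if failed:
            self.failing_streak += 1
            if self.failing_streak >= self.retries:
                new_status = "unhealthy"
            else:
                new_status = self.status  # not enough failures yet
        else:
            self.failing_streak = 0
            new_status = "healthy"
        if new_status != self.status:
            # Stand-in for the health_status event emitted on a change.
            self.events.append(f"health_status: {new_status}")
            self.status = new_status
        return self.status


tracker = HealthTracker(retries=3, timeout=3.0)
# One success, then three failures in a row (the last one is a timeout).
for exit_code, duration in [(0, 0.1), (1, 0.1), (1, 0.2), (0, 5.0)]:
    print(tracker.record(exit_code, duration))
print(tracker.events)
```

The loop prints healthy three times and then unhealthy: the status only flips once the third consecutive failure is recorded, and a single success would reset the streak to zero.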

There are three ways to configure a health check for a container.

1. Dockerfile method

You can declare the application's health check configuration in the Dockerfile. The HEALTHCHECK instruction declares the command used to determine whether the service provided by the container's main process is working normally, which gives a more realistic picture of the container's actual state.

The HEALTHCHECK instruction has two forms:

HEALTHCHECK [options] CMD <command>: sets the command that checks the health of the container
HEALTHCHECK NONE: disables any health check inherited from the base image

In a Dockerfile, only one HEALTHCHECK instruction takes effect; if you write more than one, only the last one is used.

Containers created from an image whose Dockerfile contains a HEALTHCHECK instruction have the health check configured, and it runs automatically after the container starts.

Details of the parameters can be found at: https://docs.docker.com/engine/reference/builder/#healthcheck

HEALTHCHECK supports the following options:

--interval=<duration>: the interval between two health checks; default 30 seconds.
--timeout=<duration>: the maximum time the health check command may run; if it runs longer, the check is considered failed; default 30 seconds.
--retries=<count>: the number of consecutive failures after which the container is considered unhealthy; default 3.
--start-period=<duration>: initialization time for the application to start up; health check failures during this period are not counted; default 0 seconds.

The options work together as follows:

The health check first runs interval seconds after the container starts, and then again interval seconds after each previous check completes.
If a single check takes longer than timeout seconds, that check is considered failed.
A container is not considered unhealthy until the check has failed retries times in a row.
The start period gives containers that need time to start some room to initialize. Probe failures during this period do not count towards the maximum number of retries.

However, if a health check succeeds during the start period, the container is considered started, and all subsequent consecutive failures count towards the maximum number of retries.
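
The start-period rule can be made concrete with a short Python sketch. It is an illustration under simplifying assumptions, not Docker's code: the function name failing_streak is made up, checks are assumed to run at exactly interval, 2 × interval, ... seconds after start, and a failure exactly on the start-period boundary is treated as still inside the period.

```python
def failing_streak(results, start_period, interval):
    """Return the failing streak after a sequence of check results.

    results: list of booleans (True = check passed), one per check.
    Checks are assumed to run at interval, 2*interval, ... seconds.
    """
    streak = 0
    started = False  # becomes True after the first successful check
    for index, passed in enumerate(results, start=1):
        elapsed = index * interval
        if passed:
            started = True
            streak = 0
        elif started or elapsed > start_period:
            streak += 1
        # Otherwise the failure falls inside the start period: ignored.
    return streak


# Four failures, 5 s apart, with a 12 s start period:
# the checks at 5 s and 10 s are ignored, those at 15 s and 20 s count.
print(failing_streak([False, False, False, False], start_period=12, interval=5))

# An early success ends the grace period even though 60 s have not passed.
print(failing_streak([True, False, False], start_period=60, interval=5))
```

Both calls print 2. In the second case the two failures count even though they happen well inside the 60-second start period, because a check has already succeeded.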

The command following HEALTHCHECK [options] CMD is written in the same way as ENTRYPOINT: in either shell form or exec form.

The exit status of the command determines whether the health check succeeded:

0: success;
1: failure;
2: reserved, do not use.
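
Any program that follows this exit-status convention can serve as the health check command. As a sketch, assuming the image ships a Python 3 interpreter (an assumption, not something stated above), an HTTP probe could look like the following; the function name probe and the 3-second default are illustrative choices.

```python
import http.client
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 3.0) -> int:
    """Return 0 if the URL answers with a 2xx status, otherwise 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except (urllib.error.URLError, http.client.HTTPException,
            OSError, ValueError):
        # Connection refused, timeout, HTTP 4xx/5xx, malformed URL, ...
        return 1


# Used as a health check script, the file would end with something like:
#     import sys
#     sys.exit(probe("http://localhost/"))
```

The function only ever returns 0 or 1, so the reserved value 2 cannot leak out by accident, which is the same purpose the || exit 1 suffix serves for shell commands.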

Suppose an image provides a simple web service and we want to add a health check that determines whether the web service is working properly. We can use curl for this, and the HEALTHCHECK in its Dockerfile can be written as follows.

FROM nginx:1.23
HEALTHCHECK --interval=5s --timeout=3s --retries=3 \
    CMD curl -fs http://localhost/ || exit 1

Here the check runs every 5 seconds (the interval is very short for testing purposes; in practice it should be considerably longer). A check that has not finished within 3 seconds counts as failed, and after 3 consecutive failures the container is marked unhealthy. The health check command is curl -fs http://localhost/ || exit 1.

Use docker build to build this image.

$ docker build -t myweb:v1 .

Start the container after it is built.

$ docker run -d --name web myweb:v1

Once the container is running, docker container ls shows its initial status as (health: starting).

$ docker container ls
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                            PORTS               NAMES
7068d793c6e4        myweb:v1            "/docker-entrypoint.…"   3 seconds ago       Up 2 seconds (health: starting)   80/tcp              web

Wait a few seconds and run docker container ls again; the health status will have changed to (healthy).

$ docker container ls
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                    PORTS               NAMES
7068d793c6e4        myweb:v1            "/docker-entrypoint.…"   18 seconds ago      Up 16 seconds (healthy)   80/tcp               web
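
Instead of re-running docker container ls by hand, the wait can be scripted. The sketch below polls docker inspect with the Go template {{.State.Health.Status}} until the status leaves starting. The function names, the 60-second deadline and the 1-second poll interval are illustrative assumptions; the status source and the sleep function are parameters so the loop can be exercised without Docker.

```python
import subprocess
import time


def docker_health(container: str) -> str:
    """Ask docker inspect for the current health status of a container."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_for_health(container, get_status=docker_health,
                    timeout=60.0, poll=1.0, sleep=time.sleep):
    """Poll while the status is "starting"; return the last status seen."""
    deadline = time.monotonic() + timeout
    status = get_status(container)
    while status == "starting" and time.monotonic() < deadline:
        sleep(poll)
        status = get_status(container)
    return status


# Example call against the container started above (requires Docker):
#     print(wait_for_health("web"))
```

The function returns healthy or unhealthy as soon as Docker reports one of them, and starting if the deadline passes first, so the caller can tell the three outcomes apart.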

If the number of consecutive failed checks reaches the retries value, the status changes to (unhealthy).

To help with troubleshooting, the output of the health check command (both stdout and stderr) is stored in the health status, which can be viewed with docker inspect.

$ docker inspect --format '{{json .State.Health}}' web | python -m json.tool
{
    "FailingStreak": 0,
    "Log": [
        {
            "End": "2022-08-20T14:02:38.19224648+08:00",
            "ExitCode": 0,
            "Output": "<!DOCTYPE html>\n<html>\n<head>\n<title>Welcome to nginx!</title>\n<style>\nhtml { color-scheme: light dark; }\nbody { width: 35em; margin: 0 auto;\nfont-family: Tahoma, Verdana, Arial, sans-serif; }\n</style>\n</head>\n<body>\n<h1>Welcome to nginx!</h1>\n<p>If you see this page, the nginx web server is successfully installed and\nworking. Further configuration is required.</p>\n\n<p>For online documentation and support please refer to\n<a href=\"http://nginx.org/\">nginx.org</a>.<br/>\nCommercial support is available at\n<a href=\"http://nginx.com/\">nginx.com</a>.</p>\n\n<p><em>Thank you for using nginx.</em></p>\n</body>\n</html>\n",
            "Start": "2022-08-20T14:02:38.116041192+08:00"
        },
        {
            "End": "2022-08-20T14:02:43.271105619+08:00",
            "ExitCode": 0,
            "Output": "<!DOCTYPE html>\n<html>\n<head>\n<title>Welcome to nginx!</title>\n<style>\nhtml { color-scheme: light dark; }\nbody { width: 35em; margin: 0 auto;\nfont-family: Tahoma, Verdana, Arial, sans-serif; }\n</style>\n</head>\n<body>\n<h1>Welcome to nginx!</h1>\n<p>If you see this page, the nginx web server is successfully installed and\nworking. Further configuration is required.</p>\n\n<p>For online documentation and support please refer to\n<a href=\"http://nginx.org/\">nginx.org</a>.<br/>\nCommercial support is available at\n<a href=\"http://nginx.com/\">nginx.com</a>.</p>\n\n<p><em>Thank you for using nginx.</em></p>\n</body>\n</html>\n",
            "Start": "2022-08-20T14:02:43.200932585+08:00"
        }
    ],
    "Status": "healthy"
}
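
When the log is long, a short summary is often easier to read than the full JSON. The following Python sketch condenses the .State.Health document into a few fields. The function name summarize_health and the choice of fields are illustrative, and the sample string is a shortened version of the output shown above.

```python
import json


def summarize_health(raw: str) -> dict:
    """Condense the JSON from '{{json .State.Health}}' into a few fields."""
    health = json.loads(raw)
    log = health.get("Log") or []      # Log may be empty or null
    last = log[-1] if log else {}
    output = last.get("Output", "")
    return {
        "status": health["Status"],
        "failing_streak": health["FailingStreak"],
        "checks_logged": len(log),
        "last_exit_code": last.get("ExitCode"),
        # First line of the most recent check's output, if any.
        "last_output_head": output.splitlines()[0] if output.strip() else "",
    }


sample = '''{"FailingStreak": 0, "Status": "healthy", "Log": [
  {"Start": "2022-08-20T14:02:43.200932585+08:00",
   "End": "2022-08-20T14:02:43.271105619+08:00",
   "ExitCode": 0, "Output": "<!DOCTYPE html>\\n<html>\\n"}]}'''
print(summarize_health(sample))
```

In practice the input would come from docker inspect --format '{{json .State.Health}}' web rather than from a literal string.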

2. docker run method

Another way is to specify the health check settings directly in the docker run command.

$ docker run -d \
    --name=myweb \
    --health-cmd="curl -fs http://localhost/ || exit 1" \
    --health-interval=5s \
    --health-retries=12 \
    --health-timeout=2s \
    nginx:1.23

Run docker run --help | grep health to list the relevant flags and their descriptions:

--health-cmd string: command to run to check health
--health-interval duration: time between running the check (ms|s|m|h) (default 0s)
--health-retries int: consecutive failures needed to report unhealthy
--health-start-period duration: start period for the container to initialize before starting the health-retries countdown (ms|s|m|h) (default 0s)
--health-timeout duration: maximum time to allow one check to run (ms|s|m|h) (default 0s)
--no-healthcheck: disable any container-specified HEALTHCHECK, including one defined in the image's Dockerfile
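
When containers are launched from a script, these flags can be assembled programmatically. The Python sketch below builds the argument list for docker run using only the flags listed above. The helper name health_run_args and its defaults are illustrative, and the list is only printed here, not executed.

```python
import shlex


def health_run_args(image, name, cmd, interval="30s", timeout="30s",
                    retries=3, start_period=None):
    """Build a docker run argument list with health check flags."""
    args = [
        "docker", "run", "-d",
        f"--name={name}",
        f"--health-cmd={cmd}",
        f"--health-interval={interval}",
        f"--health-timeout={timeout}",
        f"--health-retries={retries}",
    ]
    if start_period is not None:
        args.append(f"--health-start-period={start_period}")
    args.append(image)  # the image must come after all options
    return args


args = health_run_args(
    "nginx:1.23", "myweb", "curl -fs http://localhost/ || exit 1",
    interval="5s", timeout="2s", retries=12,
)
# shlex.join quotes the health command so the line is safe to paste in a shell.
print(shlex.join(args))
```

Passing the list directly to subprocess.run(args) avoids shell quoting issues with the || in the health command, since each list element reaches docker as a single argument.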

If you manage multiple services in one container with supervisor and want the health status of the container to reflect the status of its subservices, you can use supervisorctl status as the health check command, for example:

$ docker run --rm -d \
    --name=myweb \
    --health-cmd="supervisorctl status" \
    --health-interval=5s \
    --health-retries=3 \
    --health-timeout=2s \
    nginx:v1

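With these settings, if supervisorctl status reports that a subservice is not in the RUNNING state, the check fails (this relies on supervisorctl status exiting with a non-zero code in that case). After 3 consecutive failures at 5-second intervals, that is, after about 15 seconds, the health status of the container changes from (healthy) to (unhealthy).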
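
The figure of about 15 seconds follows directly from the settings. As a rough estimate in Python, assuming the fault begins right after a passing check and each new check starts interval seconds after the previous one completes, the delay lies between retries × interval (checks fail immediately) and retries × (interval + timeout) (every check runs into the timeout). The function name is made up for this sketch.

```python
def seconds_until_unhealthy(interval, timeout, retries):
    """Rough bounds on how long a fault takes to show up as unhealthy.

    Assumes the fault starts just after a passing check, so the first
    failing check only begins one full interval later.
    """
    checks_fail_instantly = retries * interval
    checks_hit_timeout = retries * (interval + timeout)
    return checks_fail_instantly, checks_hit_timeout


# Settings from the docker run example: 5 s interval, 2 s timeout, 3 retries.
print(seconds_until_unhealthy(interval=5, timeout=2, retries=3))  # (15, 21)
```

Lengthening the interval or raising retries makes detection slower in proportion, which is the trade-off to keep in mind when moving from test values to production values.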

3. docker-compose method

With docker-compose, you can configure a container health check as follows (for example, for a container whose child processes are managed by supervisor).

version: '3'
services:
  web:
    image: nginx:v1
    container_name: web
    healthcheck:
      test: ["CMD", "supervisorctl", "status"]
      interval: 5s
      timeout: 2s
      retries: 3

After the service has started successfully, wait a few seconds and then query the status of the container.

$ docker-compose ps
Name              Command                  State                 Ports          
--------------------------------------------------------------------------------
web    supervisord -c /etc/superv ...   Up (healthy)   443/tcp, 80/tcp

If you stop some of the subservices manually with supervisorctl stop, so that not all of them are in the RUNNING state, and then check the status of the container again, it is reported as unhealthy.

$ docker-compose ps
Name              Command                   State                  Ports          
----------------------------------------------------------------------------------
web    supervisord -c /etc/superv ...   Up (unhealthy)   443/tcp, 80/tcp

4. Reference

https://www.seafog.cn/archives/751016741
