Docker has provided a native health check mechanism since version 1.12. The simplest form of health check is a process-level check, which only verifies whether a process is alive: the Docker daemon automatically monitors the container's PID 1 process and, if a restart policy was specified in the docker run command, can restart the container after that process exits. In many practical scenarios, however, a process-level check is not enough. For example, if an application deadlocks, its process keeps running but can no longer respond to user requests, and process monitoring cannot detect the problem.
After the container starts, its initial health status is starting. Docker Engine waits for the interval time before running the health check command for the first time, and then runs it periodically. If a single check exits with a non-zero value or runs longer than the specified timeout, that check is considered to have failed. If the health check fails retries times in a row, the status changes to unhealthy.
- Whenever a single health check succeeds, Docker marks the container as healthy again.
- Whenever the container's health status changes, Docker Engine emits a health_status event.
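Because a health_status event is emitted on every transition, external tooling can react to it instead of polling. The sketch below assumes events are read one JSON object per line from docker events --format '{{json .}}', and that each object has Type, Action (for example "health_status: healthy") and Actor.Attributes.name fields; treat that field layout as an assumption to verify against your Docker version. The function name parse_health_event is hypothetical.

```python
import json
from typing import Optional, Tuple


def parse_health_event(line: str) -> Optional[Tuple[str, str]]:
    """Return (container_name, new_status) for a health_status event, else None.

    Assumes one JSON object per line, as printed by
    `docker events --format '{{json .}}'` (field names are an assumption).
    """
    event = json.loads(line)
    action = event.get("Action", "")
    if event.get("Type") != "container" or not action.startswith("health_status:"):
        return None  # not a health transition, ignore it
    # Action looks like "health_status: healthy"; keep the part after the colon.
    status = action.split(":", 1)[1].strip()
    actor = event.get("Actor", {})
    attrs = actor.get("Attributes", {})
    # Fall back to the container ID when no name attribute is present.
    name = attrs.get("name", actor.get("ID", ""))
    return name, status


if __name__ == "__main__":
    sample = ('{"Type": "container", "Action": "health_status: unhealthy", '
              '"Actor": {"ID": "abc123", "Attributes": {"name": "web"}}}')
    print(parse_health_event(sample))  # ('web', 'unhealthy')
```

Filtering on the Action prefix keeps the parser indifferent to the many other container events (start, die, and so on) that share the same stream.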
There are three ways to configure a health check for a container.
1. Dockerfile method
You can declare the application's own health check configuration in the Dockerfile. The HEALTHCHECK instruction declares a health check command that determines whether the service provided by the container's main process is working normally, which gives a more realistic picture of the container's actual state.
The HEALTHCHECK instruction has two forms:
HEALTHCHECK [option] CMD <command>: set the command used to check the container's health
HEALTHCHECK NONE: disable any health check inherited from the base image
HEALTHCHECK can appear only once in a Dockerfile; if you write more than one, only the last one takes effect.
An image built from a Dockerfile that contains a HEALTHCHECK instruction carries that health check into every container started from it, and the check runs automatically once the container has started.
Details of the parameters can be found at: https://docs.docker.com/engine/reference/builder/#healthcheck
HEALTHCHECK supports the following options:
--interval=<interval>: the interval between two health checks; default 30 seconds.
--timeout=<interval>: the maximum time the health check command may run; if it runs longer, the check is considered to have failed; default 30 seconds.
--retries=<count>: the number of consecutive failures after which the container is considered unhealthy; default 3.
--start-period=<interval>: the startup grace period for the application; health check failures during this period are not counted; default 0 seconds.
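The duration options take values such as 30s. As a minimal sketch of how such strings map to seconds, assuming Docker parses them with Go's time.ParseDuration syntax (a sequence of number-unit pairs such as 1m30s or 500ms), the hypothetical helper below converts them; unlike Go, it does not accept a leading sign or the µs spelling.

```python
import re

# Unit suffixes (a subset of Go's time.ParseDuration), expressed in seconds.
_UNITS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}
# One number-unit pair; "ms" is listed before "m" and "s" so it wins the match.
_PART = re.compile(r"(\d+(?:\.\d*)?|\.\d+)(ns|us|ms|s|m|h)")


def parse_duration(text: str) -> float:
    """Convert a duration such as '30s', '1m30s' or '500ms' into seconds."""
    if text == "0":
        return 0.0  # a bare zero is the only unit-less value Go accepts
    pos, total = 0, 0.0
    while pos < len(text):
        match = _PART.match(text, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNITS[match.group(2)]
        pos = match.end()
    if pos == 0:
        raise ValueError(f"invalid duration: {text!r}")  # empty string
    return total


if __name__ == "__main__":
    for value in ("30s", "1m30s", "500ms"):
        print(value, parse_duration(value))
```

Note that a bare number such as 30 is rejected: the unit is mandatory.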
The options work as follows:
- The health check first runs interval seconds after the container starts, and then again interval seconds after each previous check completes.
- If a single check takes longer than timeout seconds, that check is considered to have failed.
- The container is considered unhealthy only after the check fails retries times in a row.
- start-period provides initialization time for containers that need time to start up. Probe failures during this period are not counted towards the maximum number of retries.
However, if a health check succeeds during the start period, the container is considered to have started, and all subsequent consecutive failures are counted towards the maximum number of retries.
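To make the interplay of interval, retries and start-period concrete, here is a small simulation of the state transitions described above. It is a model, not Docker's own code: it assumes every probe takes a fixed check_duration, represents a probe that exceeded timeout simply as a failed result, and the function name simulate_health is hypothetical.

```python
from typing import List


def simulate_health(results: List[bool], interval: float = 30.0,
                    start_period: float = 0.0, retries: int = 3,
                    check_duration: float = 0.0) -> List[str]:
    """Return the container health status after each probe.

    results[i] is True if probe i exited 0 within the timeout; a probe that
    timed out or exited non-zero is passed in as False.
    """
    status, streak, t = "starting", 0, 0.0
    history = []
    for ok in results:
        t += interval  # a probe starts `interval` after the previous one ended
        if ok:
            status, streak = "healthy", 0  # any success resets the failing streak
        elif status == "starting" and t < start_period:
            pass  # failure inside the start period, before any success: not counted
        else:
            streak += 1
            if streak >= retries:
                status = "unhealthy"
        history.append(status)
        t += check_duration  # the probe itself takes some time to run
    return history


if __name__ == "__main__":
    print(simulate_health([False, False, False, True, False, False, False],
                          interval=5, start_period=12, retries=3))
```

For example, simulate_health([True, False, False, False], interval=5, start_period=100) yields healthy, healthy, healthy, unhealthy: the early success ends the grace period, so the later failures count even though they happen within start_period.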
The command following HEALTHCHECK [option] CMD uses the same formats as ENTRYPOINT: shell format or exec format.
The command's exit code determines whether the health check succeeded:
- 0: success
- 1: failure
- 2: reserved, do not use
Suppose an image is a simple web service and we want to add a health check that determines whether the web service is working properly. We can use curl for this, and the HEALTHCHECK instruction in its Dockerfile can be written as follows:
HEALTHCHECK --interval=5s --timeout=3s --retries=3 CMD curl -fs http://localhost/ || exit 1
Here the check runs every 5 seconds (the interval is very short for testing purposes; in practice it should be longer). A single check that takes longer than 3 seconds counts as a failure, and after 3 consecutive failures the container is marked unhealthy.
The health check command is curl -fs http://localhost/ || exit 1: -f makes curl fail on HTTP error responses, and || exit 1 maps every curl failure to exit code 1, so the reserved value 2 is never returned.
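Not every image ships curl. For an image that already contains a Python 3 interpreter, a small probe can play the same role as curl -fs URL || exit 1. This is a sketch under that assumption; check_http and main are hypothetical names, and the probe treats only a 2xx final response (after redirects) as healthy.

```python
import http.client
import sys
import urllib.request
from typing import List


def check_http(url: str, timeout: float = 3.0) -> int:
    """Return 0 if `url` answers with a 2xx status (after redirects), else 1.

    Mirrors `curl -fs URL || exit 1`: connection errors, timeouts and error
    statuses all map to 1, so the reserved exit code 2 is never produced.
    """
    try:
        # urlopen raises HTTPError (a URLError, hence an OSError) for error statuses.
        with urllib.request.urlopen(url, timeout=timeout):
            return 0
    except (OSError, http.client.HTTPException):  # refused, timed out, bad response
        return 1


def main(argv: List[str]) -> int:
    """Entry point: expects exactly one argument, the URL to probe."""
    if len(argv) != 2:
        print("usage: healthcheck.py URL", file=sys.stderr)
        return 1  # treat a misconfigured check as unhealthy, not healthy
    return check_http(argv[1])
```

Saved as, say, /healthcheck.py with raise SystemExit(main(sys.argv)) under an if __name__ == "__main__": guard, it could be referenced in exec form as HEALTHCHECK --interval=5s --timeout=3s --retries=3 CMD ["python3", "/healthcheck.py", "http://localhost/"]. The path is hypothetical.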
Use docker build to build this image, and start a container from it once it is built.
While the container is running, docker container ls shows its initial status as (health: starting).
After waiting a few seconds, run docker container ls again and you will see the health status change to (healthy).
If the health check fails retries times in a row, the status changes to (unhealthy).
To help with troubleshooting, the output of the health check command (including stderr) is stored in the health status and can be viewed with docker inspect.
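docker inspect --format '{{json .State.Health}}' <container> prints the health status as JSON: a Status, a FailingStreak counter, and a Log of recent probe results, each with Start, End, ExitCode and Output. The hypothetical helper below condenses that JSON into the fields most useful when debugging; it assumes the output is passed in as a string, and treats the null printed for a container without a health check as "no data".

```python
import json
from typing import Dict, List


def summarize_health(raw: str) -> Dict[str, object]:
    """Summarize the JSON printed by `docker inspect --format '{{json .State.Health}}' NAME`."""
    health = json.loads(raw) or {}  # "null" when the container has no health check
    log: List[dict] = health.get("Log") or []
    failures = [entry for entry in log if entry.get("ExitCode") != 0]
    return {
        "status": health.get("Status"),
        "failing_streak": health.get("FailingStreak", 0),
        # Output holds the probe's stdout and stderr; keep the most recent failure.
        "last_failure_output": failures[-1].get("Output", "").strip() if failures else None,
    }


if __name__ == "__main__":
    sample = ('{"Status": "unhealthy", "FailingStreak": 3, "Log": '
              '[{"ExitCode": 0, "Output": "ok"}, {"ExitCode": 1, "Output": "curl failed"}]}')
    print(summarize_health(sample))
```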
2. docker run method
Another way is to specify the health check policy directly in the docker run command.
You can list the relevant options and their descriptions by running docker run --help | grep health:
--health-cmd string: the command to run to check health
--health-interval duration: the time between running the check (ms|s|m|h) (default 0s)
--health-retries int: the number of consecutive failures needed to report unhealthy
--health-start-period duration: the start period (ms|s|m|h) for the container to initialize before the health-retries countdown starts (default 0s)
--health-timeout duration: the maximum time (ms|s|m|h) allowed for one check to run (default 0s)
--no-healthcheck: disable any container-specified HEALTHCHECK, including one built into the image from its Dockerfile.
For the duration options, the default of 0s means the value is inherited from the image's HEALTHCHECK or, failing that, from Docker's built-in default.
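When containers are started from scripts, it can help to assemble these flags programmatically. This sketch only builds the argument list (it does not run Docker); health_run_args is a hypothetical helper, nginx is a placeholder image name, and the default values mirror the Dockerfile defaults listed earlier rather than the 0s shown by docker run --help.

```python
import shlex
from typing import List, Optional


def health_run_args(image: str, cmd: Optional[str] = None, interval: str = "30s",
                    timeout: str = "30s", retries: int = 3,
                    start_period: str = "0s", disable: bool = False) -> List[str]:
    """Build a `docker run` argument list with explicit health check flags."""
    args = ["docker", "run", "-d"]
    if disable:
        # Ignore any HEALTHCHECK baked into the image; not combined with --health-* flags.
        args.append("--no-healthcheck")
    elif cmd is not None:
        args += [
            f"--health-cmd={cmd}",  # run through the shell, like the shell form
            f"--health-interval={interval}",
            f"--health-timeout={timeout}",
            f"--health-retries={retries}",
            f"--health-start-period={start_period}",
        ]
    args.append(image)
    return args


if __name__ == "__main__":
    argv = health_run_args("nginx", "curl -fs http://localhost/ || exit 1",
                           interval="5s", timeout="3s")
    print(shlex.join(argv))  # quoted so it can be pasted into a shell
```

Passing the list to subprocess.run directly avoids any shell quoting issues; shlex.join is only used to display it.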
If you use supervisor to manage multiple services in one container and want the container's health status to reflect the status of those sub-services, you can use supervisorctl status as the health check command.
With such a health check configured, if supervisorctl status reports that a sub-service is not in the RUNNING state, then after about 15 seconds the container's health status changes from healthy to unhealthy.
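The decision rule behind such a check is simple: the container is healthy only when every supervised program reports RUNNING. The sketch below expresses that rule over the text printed by supervisorctl status, assuming the usual layout of a program name followed by its state in the second column; not_running and health_exit_code are hypothetical names, and empty output is deliberately treated as unhealthy.

```python
from typing import List, Tuple


def not_running(status_output: str) -> List[Tuple[str, str]]:
    """Return (program, state) pairs from `supervisorctl status` output that are not RUNNING."""
    bad = []
    for line in status_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, state = fields[0], fields[1]
        if state != "RUNNING":
            bad.append((name, state))
    return bad


def health_exit_code(status_output: str) -> int:
    """0 when every supervised program is RUNNING, otherwise 1 (never the reserved 2)."""
    return 0 if status_output.strip() and not not_running(status_output) else 1


if __name__ == "__main__":
    sample = ("nginx    RUNNING   pid 12, uptime 0:03:10\n"
              "php-fpm  STOPPED   Apr 05 10:00 AM\n")
    print(not_running(sample), health_exit_code(sample))
```

If supervisord itself is down, supervisorctl prints an error line instead of a status table; its second word is not RUNNING, so the rule still reports the container as unhealthy.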
3. docker-compose method
In docker-compose, you can add a healthcheck section to a service definition to check the container's health (for example, for a container whose child processes are managed by supervisor).
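A compose service's healthcheck section uses the keys test, interval, timeout, retries and start_period; test is a list starting with CMD or CMD-SHELL (or NONE), or a plain string run by the shell. As a sketch, the mapping can be assembled in Python and then written out as YAML with the same key names. The image name and the shell command below are hypothetical.

```python
import json
from typing import Dict, List


def compose_healthcheck(test: List[str], interval: str = "30s", timeout: str = "30s",
                        retries: int = 3, start_period: str = "0s") -> Dict[str, object]:
    """Return a `healthcheck` mapping for one docker-compose service."""
    return {
        "test": test,  # e.g. ["CMD-SHELL", "..."] or ["CMD", "prog", "arg"]
        "interval": interval,
        "timeout": timeout,
        "retries": retries,
        "start_period": start_period,
    }


service = {
    "image": "my-supervisor-app",  # hypothetical image name
    "healthcheck": compose_healthcheck(
        # Hypothetical check: exit 1 if any supervisorctl status line is not RUNNING.
        ["CMD-SHELL", "supervisorctl status | grep -qv RUNNING && exit 1 || exit 0"],
        interval="5s", timeout="3s"),
}

print(json.dumps({"services": {"app": service}}, indent=2))
```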
After the services start successfully, wait a few seconds and then query the container's status.
If you then manually stop some of the sub-services with supervisorctl stop, not all sub-services are in the RUNNING state any more; check the container's status again.
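Rather than waiting an arbitrary few seconds before checking, a script can poll the status until it reaches the expected value. This sketch reads the status with docker inspect --format '{{.State.Health.Status}}' (which requires the Docker CLI), while the polling loop takes any status function so it can be tried without Docker; wait_for_status and docker_health_status are hypothetical names.

```python
import subprocess
import time
from typing import Callable


def docker_health_status(container: str) -> str:
    """Read the current health status via `docker inspect` (requires the Docker CLI)."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True, text=True, check=True)  # raises CalledProcessError on failure
    return result.stdout.strip()


def wait_for_status(get_status: Callable[[], str], target: str,
                    timeout: float = 60.0, poll: float = 1.0) -> bool:
    """Poll get_status() until it returns `target` or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if get_status() == target:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

For example, wait_for_status(lambda: docker_health_status("app"), "unhealthy", timeout=30) would wait up to 30 seconds for a container named app (a placeholder) to be marked unhealthy after its sub-services are stopped.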