Gracefully shutting down services with docker-compose

You have probably run into this: a service still has jobs in progress and needs to be updated, so you have to wait until all of those jobs are processed before you can stop it. When the service receives a shutdown signal, it should first stop accepting new jobs, then wait for the workers to finish the jobs already in progress, and only then exit so that the new version can go online. How to stop a service this way with docker-compose is the focus of this article. Using Go as the example language, we will cover how to receive signals from Docker, what to do after receiving one, and how to configure the docker-compose YAML file so that all jobs complete properly.

Environment information

The examples were run on a Mac M1 with the following Docker version:

$ docker version
Client:
 Cloud integration: 1.0.17
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.16.4
 Git commit:        f0df350
 Built:             Wed Jun  2 11:56:23 2021
 OS/Arch:           darwin/arm64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       b0f5bc3
  Built:            Wed Jun  2 11:55:36 2021
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.4.6
  GitCommit:        d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc:
  Version:          1.0.0-rc95
  GitCommit:        b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Next is the docker-compose version information.

$ docker-compose version
docker-compose version 1.29.2, build 5becea4c
docker-py version: 5.0.0
CPython version: 3.9.0
OpenSSL version: OpenSSL 1.1.1h  22 Sep 2020

Code examples

Let's first prepare an example application: a service dedicated to jobs that take a long time to run. When the service needs to be updated, it must stop accepting new jobs and wait for the running ones to finish before it stops. Here is the example in Go:

package main

import (
  "context"
  "log"
  "os"
  "os/signal"
  "sync"
  "syscall"
  "time"
)

// withContextFunc returns a context that is cancelled after f has run.
// f is called when the process receives SIGINT or SIGTERM.
func withContextFunc(ctx context.Context, f func()) context.Context {
  ctx, cancel := context.WithCancel(ctx)
  go func() {
    // signal.Notify does not block when delivering, so the channel is buffered.
    c := make(chan os.Signal, 1)
    // register for interrupt (Ctrl+C) and SIGTERM (docker stop);
    // SIGKILL cannot be caught, so it is not registered.
    signal.Notify(c, syscall.SIGINT, syscall.SIGTERM)
    defer signal.Stop(c)

    select {
    case <-ctx.Done():
    case <-c:
      f()
      cancel()
    }
  }()

  return ctx
}

func main() {
  jobChan := make(chan int, 100)
  stopped := make(chan struct{})    // closed to stop sending new jobs
  senderDone := make(chan struct{}) // closed when the sender has returned
  finished := make(chan struct{})   // closed when every sent job is done
  wg := &sync.WaitGroup{}
  ctx := withContextFunc(
    context.Background(),
    func() {
      log.Println("stop the server")
      close(stopped)
      // wait for the sender to exit so no wg.Add runs concurrently with wg.Wait
      <-senderDone
      wg.Wait()
      close(finished)
    },
  )

  // create 4 workers to process jobs
  for i := 0; i < 4; i++ {
    go func(i int) {
      log.Printf("start worker: %02d", i)
      for {
        select {
        case <-finished:
          log.Printf("stop worker: %02d", i)
          return
        default:
          select {
          case job := <-jobChan:
            // simulate a long-running job
            time.Sleep(time.Duration(job*100) * time.Millisecond)
            log.Printf("worker: %02d, process job: %02d", i, job)
            wg.Done()
          default:
            log.Printf("worker: %02d, no job", i)
            time.Sleep(1 * time.Second)
          }
        }
      }
    }(i + 1)
  }

  // send jobs
  go func() {
    defer close(senderDone)
    for i := 0; i < 50; i++ {
      wg.Add(1)
      select {
      case jobChan <- i:
        time.Sleep(100 * time.Millisecond)
        log.Printf("send the job: %02d", i)
      case <-stopped:
        wg.Done()
        log.Println("stopped sending jobs")
        return
      }
    }
  }()

  // block until the shutdown function has finished
  <-ctx.Done()
  time.Sleep(1 * time.Second) // give the workers time to log their exit
  log.Println("server down")
}

In the example above, four workers receive and execute jobs, and the last goroutine generates them. Two channels coordinate the shutdown: stopped stops job generation, and finished stops the workers. While the program is running, pressing Ctrl+C (or receiving SIGTERM) closes stopped, so no more jobs are sent into jobChan; once the four workers have finished the remaining jobs, finished is closed and the four workers exit.

Using the docker-compose command

To restart the service, you can first shut it down with docker-compose stop. If the service does not handle signals, it is killed and any running job is cut off, which is obviously not what you want. So the program must handle signals: when you run docker-compose stop, Docker sends SIGTERM to the container's main process (PID 1), and the service can do its cleanup after receiving it. However, after 10 seconds Docker sends SIGKILL to force the service to stop. To avoid this, estimate how long each job takes and how long the four workers need to finish all remaining jobs, then use -t to set how many seconds Docker waits before sending SIGKILL.

$ docker-compose stop -h
Stop running containers without removing them.

They can be started again with `docker-compose start`.

Usage: stop [options] [--] [SERVICE...]

Options:
  -t, --timeout TIMEOUT      Specify a shutdown timeout in seconds.
                             (default: 10)

For example:

docker-compose stop -t 600 app

docker-compose settings

docker-compose stop sends SIGTERM by default. To use a different signal, set stop_signal in docker-compose.yml. You can also set stop_grace_period to control how long Docker waits before sending SIGKILL; the default is 10 seconds, and this replaces passing -t as shown above.

version: "3.9"

services:
  app:
    image: app:0.0.1
    build:
      context: .
      dockerfile: Dockerfile
    restart: always
    stop_signal: SIGINT
    stop_grace_period: 30s
    logging:
      options:
        max-size: "100k"
        max-file: "3"

For more details, see the stop_signal and stop_grace_period entries in the Compose file reference. With these two settings in place, docker-compose up -d also restarts the container service properly.

docker signal handling

As shown above, every service has to handle the signals Docker sends. What should you watch out for when writing the Dockerfile? Here is a Dockerfile for the Go example:

FROM golang:1.16-alpine

COPY main.go /app/
COPY go.mod /app/

WORKDIR "/app"

CMD ["go", "run", "main.go"]

Then build and run it:

docker-compose build
docker-compose up app

After it starts, run the stop command and you will find that the service never receives the signal. To see why, open a shell in the container and check the processes with ps:

/app # ps
PID   USER     TIME  COMMAND
    1 root      0:00 go run main.go
   68 root      0:03 /tmp/go-build4218998070/b001/exe/main
   78 root      0:00 /bin/sh
   84 root      0:00 ps

Docker sends SIGTERM to PID 1, which is the go run command, while the compiled program actually runs as PID 68. That is why the program never receives the signal. The fix is simple: do not start the service with go run; build a binary first and run it directly.

FROM golang:1.16-alpine

WORKDIR "/app"

COPY main.go /app/
COPY go.mod /app/
RUN go build -o /app/main .

CMD ["/app/main"]
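To confirm inside a running container that the service really is PID 1, it can log its own PID at startup. A minimal sketch; pidReport is a hypothetical helper and the message wording is illustrative:

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// pidReport describes whether a process with the given PID is the one
// that receives the stop signal Docker sends to the container's PID 1.
func pidReport(pid int) string {
	if pid == 1 {
		return "running as PID 1: docker stop signals are delivered to this process"
	}
	return fmt.Sprintf("running as PID %d: docker stop signals are sent to PID 1, not to this process", pid)
}

func main() {
	// Log once at startup, e.g. as the first line of the service's main().
	log.Println(pidReport(os.Getpid()))
}
```

Started with go run as in the first Dockerfile, this would report a PID other than 1 (68 in the ps output above); started from the built binary, it should report PID 1.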

Also, use the exec form ["program", "arg1", "arg2"] in CMD or ENTRYPOINT rather than the shell form program arg1 arg2. With the shell form, Docker wraps the command in /bin/sh -c, so the shell becomes PID 1, and the shell typically does not forward signals to your program. This again prevents the service from shutting down properly.

If your service has to run through a shell, you can handle the signals in the shell itself with trap; see the article “Trapping signals in Docker containers”. The docker-compose example below comes from that site:

simple:
  image: busybox:1.31.0-uclibc
  command:
    - sh
    - '-c'
    - |
        trap 'exit 0' SIGINT
        trap 'exit 1' SIGTERM
        while true; do :; done                
  stop_signal: SIGINT

Summary

Besides handling signals from Docker, you can use docker-compose up --scale to scale the service. If the service handles many long-running jobs, update it in the way described here; otherwise jobs are interrupted midway, and resuming interrupted work is a separate problem you then have to solve.
