I am currently responsible for packaging and releasing the company’s B2B PaaS product. One of my daily tasks is to run the automated packaging pipeline and then deliver the resulting installation packages to the QA test environment. Because the packaging and test environments are located in two different server rooms, the installation packages have to be synchronized from the packaging machine to the QA environment over the public network, so the size of the packages largely determines how long the synchronization takes. Reducing the size of the installation packages is therefore one way to make the pipeline more efficient. One recent effort reduced the size of the product patch packages by 30% to 60%, which significantly cut the time spent uploading, downloading, and installing patches. This post summarizes what I learned from that work.

Optimize again

Because all components of the product are deployed as containers, the bulk of a patch package consists of image files plus some deployment scripts. A Docker image is made up of layers plus image metadata. The metadata consists of the image config and the manifests, both of which are JSON text, and their size is usually negligible compared to the size of the image layers.

In fact, a first optimization was done last year: the way images are packaged into the patch was changed from the original docker save to skopeo copy into a directory. Although that first optimization already brought a clear improvement, we still thought there was room for more.

After the first optimization, the images in the product patch package are stored in the following form.

root@debian:/root/kube # tree images -h
images
├── [4.0K]  kube-apiserver:v1.20.5
│   ├── [707K]  742efefc8a44179dcc376b969cb5e3f8afff66f87ab618a15164638ad07bf722
│   ├── [ 28M]  98d681774b176bb2fd6b3499377d63ff4b1b040886dd9d3641bb93840815a1e7
│   ├── [2.6K]  d7e24aeb3b10210bf6a2dc39f77c1ea835b22af06dfd2933c06e0421ed6d35ac
│   ├── [642K]  fefd475334af8255ba693de12951b5176a2853c2f0d5d2b053e188a1f3b611d9
│   ├── [ 949]  manifest.json
│   └── [  33]  version
├── [4.0K]  kube-controller-manager:v1.20.5
│   ├── [ 27M]  454a7944c47b608efb657a1bef7f4093f63ceb2db14fd78c5ecd2a08333da7cf
│   ├── [2.6K]  6f0c3da8c99e99bbe82920a35653f286bd8130f0662884e77fa9fcdca079c07f
│   ├── [707K]  742efefc8a44179dcc376b969cb5e3f8afff66f87ab618a15164638ad07bf722
│   ├── [642K]  fefd475334af8255ba693de12951b5176a2853c2f0d5d2b053e188a1f3b611d9
│   ├── [ 949]  manifest.json
│   └── [  33]  version
└── [4.0K]  kube-scheduler:v1.20.5
    ├── [ 12M]  565677e452d17c4e2841250bbf0cc010d906fbf7877569bb2d69bfb4e68db1b5
    ├── [707K]  742efefc8a44179dcc376b969cb5e3f8afff66f87ab618a15164638ad07bf722
    ├── [2.6K]  8d13f1db8bfb498afb0caff6bf3f8c599ecc2ace74275f69886067f6af8ffdf6
    ├── [642K]  fefd475334af8255ba693de12951b5176a2853c2f0d5d2b053e188a1f3b611d9
    ├── [ 949]  manifest.json
    └── [  33]  version

For example, kube-apiserver, kube-controller-manager, and kube-scheduler all use the k8s.gcr.io/build-image/go-runner base image. A registry only needs to store one copy of the go-runner base image, but when skopeo copy stores the images in directories, each image directory holds its own copy of the base image.

From the file names and sizes we can roughly deduce that the 707K file 742efefc8a is the root filesystem of the go-runner image; the 642K file fefd47533 is the go-runner binary; the 2.6K files are the image config files; and the remaining large files are the kube-apiserver, kube-controller-manager, and kube-scheduler binaries. The manifest.json file is the image's manifest as it would be stored in a registry.

  • Using find to count these files, the total number of layer and config files across the images drops from 12 to 8 after de-duplication. A simple addition confirms it: 3 image config files + 3 binary layers + 1 base image layer + 1 go-runner binary layer is exactly 8 😂
root@debian:/root/kube # find images -type f | grep -Eo "\b[a-f0-9]{64}\b" | wc -l
12
root@debian:/root/kube # find images -type f | grep -Eo "\b[a-f0-9]{64}\b" | sort -u | wc -l
8
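
The same count can be broken down per blob to see which files are actually shared. A minimal sketch, assuming the dir layout shown above (one sub-directory per image, blobs named by their 64-character hex digest); list_shared_blobs is a hypothetical helper name, not part of skopeo:

```shell
# list_shared_blobs DIR
# Print "<copies> <digest>" for every digest-named file that appears more than
# once under DIR, most duplicated first.
list_shared_blobs() {
    find "$1" -type f \
        | sed 's|.*/||' \
        | grep -Ex '[a-f0-9]{64}' \
        | sort \
        | uniq -c \
        | awk '$1 > 1 {print $1, $2}' \
        | sort -rn
}
```

Running list_shared_blobs images on the three kube images should list the two go-runner layers, since those are the only files stored in more than one image directory.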

Since the images in the patch package share some identical layers, couldn’t we shrink the patch package by removing the duplicate layer files? To find out, I took a historical patch package and tested the idea.

root@debian:/root $ du -sh images
3.3G	images
root@debian:/root $ find images -type f ! -name 'version' ! -name 'manifest.json' | wc -l
279
root@debian:/root $ mkdir -p images2
root@debian:/root $ find images -type f -exec mv {} images2 \;
root@debian:/root $ du -sh images2
1.3G	images2
root@debian:/root $ find images2 -type f ! -name 'version' ! -name 'manifest.json' | wc -l
187

After testing, I found that the total number of image files in the patch package dropped from 279 to 187, and the total size dropped from 3.3G to 1.3G, a 60% reduction! I was as excited as if I had found a treasure. The saving is this large because the base images used by our product components are mostly the same, so many identical base image layer files can be eliminated.
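
The mv experiment flattens the directory and overwrites the per-image manifest.json files, so it is only suitable for a throwaway copy. A non-destructive estimate could look roughly like the sketch below. It assumes that files with the same 64-character hex name are identical blobs; dedup_savings is a hypothetical helper name:

```shell
# dedup_savings DIR
# Print "<total_bytes> <unique_bytes>" for the digest-named files under DIR.
# Nothing is moved or deleted; sizes are read with wc -c.
dedup_savings() {
    find "$1" -type f | while IFS= read -r path; do
        printf '%s %s\n' "${path##*/}" "$(wc -c < "$path")"
    done | grep -E '^[a-f0-9]{64} ' | awk '
        { total += $2 }
        !($1 in seen) { seen[$1] = 1; unique += $2 }
        END { printf "%.0f %.0f\n", total, unique }'
}
```

Calling dedup_savings images prints the size of all blobs followed by the size that would remain if each distinct blob were stored once, which is the same comparison as du -sh images versus du -sh images2.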

Now that we have found a way to reduce the size of the images in the patch package, we just need a way to de-duplicate the image layers. The first idea is to use the registry storage format: by design, images in a registry share identical layers. So the general plan is to convert the images in the patch package to the registry storage format, and then, during installation, convert the registry storage format back to the dir format supported by skopeo copy.

Build skopeo dir image storage

  • For the demonstration we need a suitable image list. The ks-installer project publishes one that fits well, so let's use it 😃
root@debian:/root # curl -L -O https://github.com/kubesphere/ks-installer/releases/download/v3.0.0/images-list.txt
  • First synchronize the image to the local directory using skopeo sync and count the size of the image and the number of files
root@debian:/root # for img in $(grep -v "#" images-list.txt); do skopeo sync --insecure-policy --src docker --dest dir ${img} images; done

root@debian:/root # tree images -d -L 1
images
├── alpine:3.10.4
├── busybox:1.31.1
├── calico
├── coredns
├── csiplugin
├── docker:19.03
├── elastic
├── fluent
├── haproxy:2.0.4
├── istio
├── jaegertracing
├── java:openjdk-8-jre-alpine
├── jenkins
├── jimmidyson
├── joosthofman
├── kubesphere
├── minio
├── mirrorgooglecontainers
├── mysql:8.0.11
├── nginx:1.14-alpine
├── nginxdemos
├── openpitrix
├── osixia
├── perl:latest
├── prom
├── redis:5.0.5-alpine
└── wordpress:4.8-apache
  • After using skopeo sync to sync the images to the local images directory, the size of all the images is 11G and the total number of files is 1264.
root@debian:/root # du -sh images
11G	images
root@debian:/root # find images -type f ! -name "version" | wc -l
1264

Convert to registry storage directory

According to the registry storage structure, the image layers, the image config, and the manifests have to be stored under the blobs/sha256 directory, keyed by their sha256 values, and the corresponding link files have to be created under the repositories directory. Once that is done, the image has been converted to the registry storage format.
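
The path rules can be written down as small helpers. This is only a sketch of the layout used in the rest of this post; the function names are hypothetical, and ROOT is the directory that contains registry/v2 (docker in the examples that follow):

```shell
# blob_data_path ROOT DIGEST
# Blobs are sharded by the first two hex characters of their digest.
blob_data_path() {
    digest=${2#sha256:}
    prefix=$(printf '%s' "$digest" | cut -c1-2)
    printf '%s/registry/v2/blobs/sha256/%s/%s/data\n' "$1" "$prefix" "$digest"
}

# layer_link_path ROOT REPO DIGEST
# Each repository references the layers it uses through a small link file.
layer_link_path() {
    printf '%s/registry/v2/repositories/%s/_layers/sha256/%s/link\n' \
        "$1" "$2" "${3#sha256:}"
}

# tag_link_path ROOT REPO TAG
# The current link of a tag holds the digest of the manifest it points to.
tag_link_path() {
    printf '%s/registry/v2/repositories/%s/_manifests/tags/%s/current/link\n' \
        "$1" "$2" "$3"
}
```

Because blob paths depend only on the digest, two repositories that use the same layer resolve to the same data file, which is where the de-duplication comes from.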

For demonstration purposes, let’s take a single image and convert the image images/alpine:3.10.4 into a docker registry storage directory

root@debian:/root # tree -h images/alpine:3.10.4
images/alpine:3.10.4
└── [4.0K]  alpine:3.10.4
    ├── [2.7M]  4167d3e149762ea326c26fc2fd4e36fdeb7d4e639408ad30f37b8f25ac285a98
    ├── [1.5K]  af341ccd2df8b0e2d67cf8dd32e087bfda4e5756ebd1c76bbf3efa0dc246590e
    ├── [ 528]  manifest.json
    └── [  33]  version

From the file sizes we can tell that the 2.7M file 4167d3e1497...... is the image's layer; since alpine is a base image, this layer is the alpine root filesystem. The 1.5K file af341ccd2...... is evidently the image config, and manifest.json is the image's manifest as stored in a registry.
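
Instead of guessing from file sizes, the roles can also be read from manifest.json itself. The sketch below assumes a Docker v2 schema 2 manifest whose config object contains no nested objects; config_digest and layer_digests are hypothetical helper names, and a real JSON parser such as jq would be more robust than this text matching:

```shell
# config_digest MANIFEST
# Print the hex digest referenced by the "config" object of the manifest.
config_digest() {
    tr -d ' \t\r\n' < "$1" \
        | grep -o '"config":{[^}]*}' \
        | grep -Eo '[a-f0-9]{64}'
}

# layer_digests MANIFEST
# Print every other hex digest in the manifest, i.e. the layers.
layer_digests() {
    cfg=$(config_digest "$1")
    grep -Eo '[a-f0-9]{64}' "$1" | grep -vx "$cfg" | sort -u
}
```

For the alpine image, config_digest images/alpine:3.10.4/manifest.json would be expected to print the name of the 1.5K file and layer_digests the name of the 2.7M file.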

  • First create the directory structure of the image in registry storage
root@debian:/root # mkdir -p docker/registry/v2/{blobs/sha256,repositories/alpine}
root@debian:/root # tree docker
docker
└── registry
    └── v2
        ├── blobs
        │   └── sha256
        └── repositories
            └── alpine
  • Build the link files for the image layers
grep -Eo "\b[a-f0-9]{64}\b" images/alpine:3.10.4/manifest.json | sort -u | xargs -L1 -I {} mkdir -p docker/registry/v2/repositories/alpine/_layers/sha256/{}

grep -Eo "\b[a-f0-9]{64}\b" images/alpine:3.10.4/manifest.json | sort -u | xargs -L1 -I {} sh -c "echo -n 'sha256:{}' > docker/registry/v2/repositories/alpine/_layers/sha256/{}/link"
  • Build the link files for the image tag
manifests_sha256=$(sha256sum images/alpine:3.10.4/manifest.json | awk '{print $1}')
mkdir -p docker/registry/v2/repositories/alpine/_manifests/revisions/sha256/${manifests_sha256}
echo -n "sha256:${manifests_sha256}" > docker/registry/v2/repositories/alpine/_manifests/revisions/sha256/${manifests_sha256}/link

mkdir -p docker/registry/v2/repositories/alpine/_manifests/tags/3.10.4/index/sha256/${manifests_sha256}
echo -n "sha256:${manifests_sha256}" > docker/registry/v2/repositories/alpine/_manifests/tags/3.10.4/index/sha256/${manifests_sha256}/link

mkdir -p docker/registry/v2/repositories/alpine/_manifests/tags/3.10.4/current
echo -n "sha256:${manifests_sha256}" > docker/registry/v2/repositories/alpine/_manifests/tags/3.10.4/current/link
  • Build the blobs directory of the image
mkdir -p docker/registry/v2/blobs/sha256/${manifests_sha256:0:2}/${manifests_sha256}
ln -f images/alpine:3.10.4/manifest.json docker/registry/v2/blobs/sha256/${manifests_sha256:0:2}/${manifests_sha256}/data

for layer in $(grep -Eo "\b[a-f0-9]{64}\b" images/alpine:3.10.4/manifest.json); do
    mkdir -p docker/registry/v2/blobs/sha256/${layer:0:2}/${layer}
    ln -f  images/alpine:3.10.4/${layer} docker/registry/v2/blobs/sha256/${layer:0:2}/${layer}/data
done
  • The resulting registry storage directory is as follows
docker
└── registry
    └── v2
        ├── blobs
        │   └── sha256
        │       ├── 41
        │       │   └── 4167d3e149762ea326c26fc2fd4e36fdeb7d4e639408ad30f37b8f25ac285a98
        │       │       └── data
        │       ├── af
        │       │   └── af341ccd2df8b0e2d67cf8dd32e087bfda4e5756ebd1c76bbf3efa0dc246590e
        │       │       └── data
        │       └── de
        │           └── de78803598bc4c940fc4591d412bffe488205d5d953f94751c6308deeaaa7eb8
        │               └── data
        └── repositories
            └── alpine
                ├── _layers
                │   └── sha256
                │       ├── 4167d3e149762ea326c26fc2fd4e36fdeb7d4e639408ad30f37b8f25ac285a98
                │       │   └── link
                │       └── af341ccd2df8b0e2d67cf8dd32e087bfda4e5756ebd1c76bbf3efa0dc246590e
                │           └── link
                └── _manifests
                    ├── revisions
                    │   └── sha256
                    │       └── de78803598bc4c940fc4591d412bffe488205d5d953f94751c6308deeaaa7eb8
                    │           └── link
                    └── tags
                        └── 3.10.4
                            ├── current
                            │   └── link
                            └── index
                                └── sha256
                                    └── de78803598bc4c940fc4591d412bffe488205d5d953f94751c6308deeaaa7eb8
                                        └── link
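
Since every blob is addressed by the sha256 of its content, the converted directory can be sanity-checked before a registry is started on top of it. A minimal sketch, assuming sha256sum is available; verify_blobs is a hypothetical helper name:

```shell
# verify_blobs BLOBS_DIR
# BLOBS_DIR is the .../blobs/sha256 directory. Returns non-zero and reports
# the path if a data file does not hash to the digest in its directory name.
verify_blobs() {
    status=0
    for data in "$1"/*/*/data; do
        [ -f "$data" ] || continue
        dir=${data%/data}
        expected=${dir##*/}
        actual=$(sha256sum "$data" | awk '{print $1}')
        if [ "$actual" != "$expected" ]; then
            echo "digest mismatch: $data" >&2
            status=1
        fi
    done
    return "$status"
}
```

For the tree above, the call would be verify_blobs docker/registry/v2/blobs/sha256. A mismatch on the manifest blob would suggest that manifest.json was modified after it was copied.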
  • To test the result, start a registry container locally with docker run, mount the converted registry storage directory at /var/lib/registry in the container, pull the image with docker pull, and run it with docker run. If that works, the conversion is correct 😊.
root@debian:/root # docker pull localhost/alpine:3.10.4
3.10.4: Pulling from alpine
4167d3e14976: Pull complete
Digest: sha256:de78803598bc4c940fc4591d412bffe488205d5d953f94751c6308deeaaa7eb8
Status: Downloaded newer image for localhost/alpine:3.10.4
root@debian:/root # docker run --rm -it localhost/alpine:3.10.4 cat /etc/os-release
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.10.4
PRETTY_NAME="Alpine Linux v3.10"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://bugs.alpinelinux.org/"
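
The registry container used for the test above is not shown in the transcript. It could be started roughly as follows; this is a sketch under assumptions: the container name patch-registry is made up, the official registry:2 image is used, and host port 80 is chosen so that the name localhost/alpine:3.10.4 resolves without a port suffix:

```shell
# start_test_registry STORAGE_PARENT [HOST_PORT]
# STORAGE_PARENT is the directory that contains the converted "docker" directory.
# Set DOCKER=echo to print the docker arguments instead of running them.
start_test_registry() {
    storage_parent=$1
    host_port=${2:-80}
    # The registry image reads its data from /var/lib/registry/docker/registry/v2,
    # so the converted "docker" directory is mounted at /var/lib/registry/docker.
    ${DOCKER:-docker} run -d --name patch-registry \
        -p "${host_port}:5000" \
        -v "${storage_parent}/docker:/var/lib/registry/docker" \
        registry:2
}
```

With the converted directory at /root/docker, start_test_registry /root followed by the docker pull shown above should exercise the whole path. The container can be removed afterwards with docker rm -f patch-registry.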
  • Combine the above steps into a single shell script:
#!/bin/bash
set -eo pipefail

IMAGES_DIR="images"
REGISTRY_DIR="docker"

rm -rf ${REGISTRY_DIR}
BLOBS_PATH="${REGISTRY_DIR}/registry/v2/blobs"
REPO_PATH="${REGISTRY_DIR}/registry/v2/repositories"

for image in $(find ${IMAGES_DIR} -type f | sed -n 's|/manifest.json||p' | sort -u); do
    image_name=$(echo ${image%%:*} | sed "s|${IMAGES_DIR}/||g")
    image_tag=${image##*:}; mfs="${image}/manifest.json"
    mfs_sha256=$(sha256sum ${image}/manifest.json | awk '{print $1}')
    mkdir -p ${BLOBS_PATH}/sha256/${mfs_sha256:0:2}/${mfs_sha256}
    ln -f ${mfs} ${BLOBS_PATH}/sha256/${mfs_sha256:0:2}/${mfs_sha256}/data

    # make image repositories dir
    mkdir -p ${REPO_PATH}/${image_name}/{_layers,_manifests/revisions}/sha256
    mkdir -p ${REPO_PATH}/${image_name}/_manifests/revisions/sha256/${mfs_sha256}
    mkdir -p ${REPO_PATH}/${image_name}/_manifests/tags/${image_tag}/{current,index/sha256}
    mkdir -p ${REPO_PATH}/${image_name}/_manifests/tags/${image_tag}/index/sha256/${mfs_sha256}

    # create image tag manifest link file
    echo -n "sha256:${mfs_sha256}" > ${REPO_PATH}/${image_name}/_manifests/tags/${image_tag}/current/link
    echo -n "sha256:${mfs_sha256}" > ${REPO_PATH}/${image_name}/_manifests/revisions/sha256/${mfs_sha256}/link
    echo -n "sha256:${mfs_sha256}" > ${REPO_PATH}/${image_name}/_manifests/tags/${image_tag}/index/sha256/${mfs_sha256}/link

    # link image layers file to registry blobs file
    for layer in $(grep -Eo "\b[a-f0-9]{64}\b" ${mfs}); do
        mkdir -p ${BLOBS_PATH}/sha256/${layer:0:2}/${layer}
        mkdir -p ${REPO_PATH}/${image_name}/_layers/sha256/${layer}
        echo -n "sha256:${layer}" > ${REPO_PATH}/${image_name}/_layers/sha256/${layer}/link
        ln -f ${image}/${layer} ${BLOBS_PATH}/sha256/${layer:0:2}/${layer}/data
    done
done
  • After converting all images in the images directory with this script, the final registry storage size is 8.3G, which is 2.7G less than before, and the number of files drops from 1264 to 1046.
root@debian:/root # du -sh docker
8.3G	docker
root@debian:/root # find docker -type f -name "data" | wc -l
1046

Revert back to Dir format

After the above steps, the total size of the image files in the patch package is indeed reduced, but another problem is introduced: skopeo cannot use the registry storage format directly. So we need to do another conversion to restore the image from the registry storage format back to the dir format supported by skopeo.

#!/bin/bash
REGISTRY_DOMAIN="harbor.k8s.li"
REGISTRY_PATH="/var/lib/registry"

# Change to the registry storage root directory
cd ${REGISTRY_PATH} || exit 1

gen_skopeo_dir() {
    # Define the registry blobs and repositories directories for later use
    BLOB_DIR="docker/registry/v2/blobs/sha256"
    REPO_DIR="docker/registry/v2/repositories"
    # Define the output directory for the skopeo dir-format images
    SKOPEO_DIR="docker/skopeo"
    # Every tag has exactly one "current" directory, so finding them yields all tagged images
    for image in $(find ${REPO_DIR} -type d -name "current"); do
        # Derive the image name (project/image:tag) from the tag path
        name=$(echo ${image} | awk -F '/' '{print $5"/"$6":"$9}')
        link=$(cat ${image}/link | sed 's/sha256://')
        mfs="${BLOB_DIR}/${link:0:2}/${link}/data"
        # Create the directory that will hold the image's hard links
        mkdir -p "${SKOPEO_DIR}/${name}"
        # Hard-link the manifest blob to manifest.json in the image directory
        ln ${mfs} ${SKOPEO_DIR}/${name}/manifest.json
        # Extract all sha256 values with a regex, then sort and de-duplicate them
        layers=$(grep -Eo "\b[a-f0-9]{64}\b" ${mfs} | sort -u)
        for layer in ${layers}; do
            # Hard-link the layer and image config blobs from registry storage into the image's dir directory
            ln ${BLOB_DIR}/${layer:0:2}/${layer}/data ${SKOPEO_DIR}/${name}/${layer}
        done
    done
}

sync_image() {
    # Use skopeo sync to push the dir-format images to harbor
    for project in $(ls ${SKOPEO_DIR}); do
        skopeo sync --insecure-policy --src-tls-verify=false --dest-tls-verify=false \
        --src dir --dest docker ${SKOPEO_DIR}/${project} ${REGISTRY_DOMAIN}/${project}
    done
}

gen_skopeo_dir
sync_image
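
One detail of the script above: the awk expression takes fixed path fields, so it assumes repository names of the form project/image. A repository with a single-level name such as alpine, or with more than two levels, would yield a wrong name. A sketch of a depth-independent alternative using only parameter expansion; image_ref_from_current is a hypothetical helper name:

```shell
# image_ref_from_current REPO_DIR CURRENT_PATH
# Turn <REPO_DIR>/<repo>/_manifests/tags/<tag>/current into "<repo>:<tag>",
# however many path components <repo> has.
image_ref_from_current() {
    rel=${2#"$1"/}                  # <repo>/_manifests/tags/<tag>/current
    repo=${rel%/_manifests/tags/*}  # <repo>
    tag=${rel#*/_manifests/tags/}   # <tag>/current
    tag=${tag%/current}             # <tag>
    printf '%s:%s\n' "$repo" "$tag"
}
```

Inside the loop it would be called as name=$(image_ref_from_current "${REPO_DIR}" "${image}"). Note that sync_image iterates over the top-level directories of the skopeo directory, so single-level names would still need separate handling there.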