Registry

Docker Distribution

Docker Distribution was the first tool to implement the packaging, storage and distribution of images, acting as a Docker registry. (Distribution has since been donated to the CNCF.) The specification that originated in Docker Distribution later became the OCI distribution-spec. Docker Distribution can be assumed to implement most of the OCI distribution specification, and the two are largely compatible. OCI’s guiding philosophy is to let industry practice come first and then distill that practice into technical specifications, so although the OCI distribution-spec has not yet been officially released (the current version is v1.0.0-rc1), image repositories based on Docker Distribution have already become a widely adopted solution, and the Docker registry HTTP API V2 has become the de facto standard.

Harbor

Harbor also uses Docker Distribution (docker registry) as its back-end image storage service. In versions prior to Harbor 2.0, most image-related functions were handled by Docker Distribution, and the metadata for images and OCI artifacts was extracted from the docker registry by Harbor’s components. Since Harbor 2.0, the metadata for images and OCI artifacts is maintained by Harbor itself and is written to Harbor’s database when these artifacts are pushed. Thanks to this, Harbor is no longer just a service for storing and managing images, but a cloud-native repository service that can store and manage a wide range of OCI-compliant artifacts such as Helm Charts, CNAB, OPA Bundles, etc.

docker registry to harbor

Well, with the background concepts out of the way, let’s get back to the problem this article sets out to solve: how do you migrate images from docker registry to harbor?

Suppose there are two machines in an intranet environment. One runs docker registry under the domain name registry.k8s.li; the other runs harbor under the domain name harbor.k8s.li. The docker registry currently holds 5,000 images, while harbor has just been deployed and contains no images yet. Assuming disk and network are not limiting factors, how can the images in the docker registry be migrated to harbor efficiently?

Get a list of all images in the registry

First of all, before migrating we need a list of the images in the docker registry, so that we can verify afterwards that no image was lost. In the registry storage directory, each tag of an image is pointed to by a current/link file, so traversing the current directories yields all tags of all images, and thus the list of all images in the registry. Note that this only finds images that have tags; untagged images are not listed.

A list of images can be obtained from the registry storage directory with the following command.

# First change into the root of the registry storage directory
cd  /var/lib/registry
find docker -type d -name "current" | sed 's|docker/registry/v2/repositories/||g;s|/_manifests/tags/|:|g;s|/current||g' > images.list
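
The list doubles as a checklist once the migration is done. A minimal sketch, assuming a second list of the images found on the destination has been produced in the same NAME:TAG format; the helper name missing_images and the file name harbor.list are hypothetical and not part of the original procedure.

```shell
# missing_images SOURCE_LIST DEST_LIST
# Prints every NAME:TAG line that appears in SOURCE_LIST but not in DEST_LIST.
# (Hypothetical helper; both files are assumed to hold one NAME:TAG per line.)
missing_images() {
    local src="$1" dst="$2"
    # -F: fixed strings, -x: whole-line match, -v: keep lines NOT found in dst
    grep -vxFf "$dst" "$src" | sort -u
}
```

For example, `missing_images images.list harbor.list` prints nothing when every image in the source list is also present on the destination.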

Create the projects on harbor

A new harbor deployment contains only the default library project, so the projects used in the docker registry have to be created on harbor manually. The project names correspond to the top-level directories under `repositories` in the registry storage directory.

Once we have the list of images and the corresponding projects created on harbor, we are ready to do the actual migration. Depending on the scenario, the following options can be used.
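
The projects that need to exist can be read off the image list generated earlier. A minimal sketch, assuming images.list holds one NAME:TAG entry per line; the helper name list_projects is hypothetical, and creating the projects themselves is left to whatever method you prefer.

```shell
# list_projects IMAGE_LIST
# Prints the unique first path component of every NAME:TAG line, i.e. the
# project each image would belong to on harbor. Entries without a "/" have
# no project component and are skipped.
list_projects() {
    grep '/' "$1" | cut -d/ -f1 | sort -u
}
```

Running `list_projects images.list` prints one project name per line; library will usually be among them and already exists on a fresh harbor.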

Option 1: docker retag

Option 1 is probably the first approach most people think of, and it is also the simplest and most brute-force one: use docker on one machine to pull every image from the docker registry, then docker tag it, and then docker push it to harbor.

# Assume one of the images is library/alpine:latest

docker pull registry.k8s.li/library/alpine:latest

docker tag registry.k8s.li/library/alpine:latest harbor.k8s.li/library/alpine:latest

docker push harbor.k8s.li/library/alpine:latest
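
Applied to the whole image list, the three commands become a loop. This is only a sketch under the article's assumptions (registry.k8s.li as source, harbor.k8s.li as destination, images.list as generated earlier). The function name retag_all and the DOCKER override variable are hypothetical conveniences of this sketch, not docker features.

```shell
SRC_REGISTRY="registry.k8s.li"
DST_REGISTRY="harbor.k8s.li"
# DOCKER can be overridden (for example DOCKER=echo) to get a dry run.
DOCKER="${DOCKER:-docker}"

# retag_all IMAGE_LIST
# Pulls, tags and pushes every NAME:TAG line found in IMAGE_LIST.
retag_all() {
    while IFS= read -r image; do
        # Skip empty lines
        [ -n "$image" ] || continue
        "$DOCKER" pull "${SRC_REGISTRY}/${image}" &&
        "$DOCKER" tag "${SRC_REGISTRY}/${image}" "${DST_REGISTRY}/${image}" &&
        "$DOCKER" push "${DST_REGISTRY}/${image}"
    done < "$1"
}
```

Calling `retag_all images.list` with DOCKER=echo prints the commands that would be run without touching either registry.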

This approach is rather wasteful, because the docker pull -> docker tag -> docker push process decompresses the image’s layers on pull and compresses them again on push. For simply copying images from one registry to another, docker does a lot of unnecessary work along the way. We won’t go into the details here.

So, for the sake of efficiency, we will not use the docker retag approach; let’s move on to option 2.

Option 2: skopeo

You can use skopeo copy to copy an image’s raw blobs directly from one registry to another, without decompressing any image layers in the process. In terms of performance and time taken, it is much better than using docker 😂.

  • Using skopeo copy
skopeo copy --insecure-policy --src-tls-verify=false --dest-tls-verify=false docker://registry.k8s.li/library/alpine:latest docker://harbor.k8s.li/library/alpine:latest
  • Using skopeo sync
skopeo sync --insecure-policy --src-tls-verify=false --dest-tls-verify=false --src docker --dest docker registry.k8s.li/library/alpine:latest harbor.k8s.li/library
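
For 5,000 images the copy has to be driven from the image list, and it helps to record failures so they can be retried. A minimal sketch under the article's assumptions (same host names, same skopeo flags as above); the function name copy_all, the failed-list file and the SKOPEO override variable are hypothetical.

```shell
SRC_REGISTRY="registry.k8s.li"
DST_REGISTRY="harbor.k8s.li"
# SKOPEO can be overridden to point at a different binary or a stub.
SKOPEO="${SKOPEO:-skopeo}"

# copy_all IMAGE_LIST FAILED_LIST
# Copies every NAME:TAG line in IMAGE_LIST and appends the ones that fail
# to FAILED_LIST. Returns 0 only if no copy failed.
copy_all() {
    local list="$1" failed="$2" image
    : > "$failed"    # start with an empty failure log
    while IFS= read -r image; do
        [ -n "$image" ] || continue
        # </dev/null keeps the command from consuming the list on stdin
        "$SKOPEO" copy --insecure-policy --src-tls-verify=false --dest-tls-verify=false \
            "docker://${SRC_REGISTRY}/${image}" "docker://${DST_REGISTRY}/${image}" \
            < /dev/null || echo "$image" >> "$failed"
    done < "$list"
    [ ! -s "$failed" ]
}
```

After `copy_all images.list failed.list`, the failed list can be fed back into the same function to retry only the images that did not make it.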

But can we do better? Both docker and skopeo ultimately download and upload images through the registry’s HTTP API, which still involves a large number of HTTP requests. Is there a way to avoid them?

Option 3: Migrate the storage directory

As mentioned at the beginning of the article, harbor’s back-end image storage also uses the docker registry. For any registry based on Docker Distribution V2, the back-end storage directory structure is exactly the same. So why not pack up the registry storage directory and unpack it into the harbor registry storage directory? This way you can be sure that all the images are migrated and none is left behind.

For harbor 1.x, migrate the docker registry storage directly to harbor’s registry storage, delete harbor’s redis data (because harbor’s redis caches the images’ metadata), restart harbor, and you’re done. After the restart, harbor calls the back-end registry to extract the images’ metadata and stores it in redis. This completes the migration.

Back up the registry storage directory on the docker registry machine

# Change to the docker registry storage directory
cd  /var/lib/registry

# Note: no compression is needed for the backup, because the image layers in the registry are already compressed
tar -cpf docker.tar docker

After the backup is complete, scp the docker.tar to the harbor machine and restore the registry storage directory on the harbor machine

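# Change to harbor's storage directory
cd /data/harbor

# Unpack the backed-up docker directory into harbor's registry directory; make sure the directory levels line up
tar -xpf docker.tar -C ./registry

# Delete harbor's redis data; it is rebuilt after harbor restarts
rm -f redis/dump.rdb

# Change to harbor's installation directory and restart harbor
cd /opt/harbor
docker-compose restart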

After this migration, you may find that images cannot be pushed to harbor. The registry storage directory in the docker registry container is owned by root, whereas in the harbor registry container it is owned by 10000:10000; because of this mismatch, harbor’s registry cannot write to the migrated directory. Therefore, change the owner and group of the harbor registry directory once the migration is complete.

# Change to harbor's storage directory
cd /data/harbor

# Change the owner and group of the registry storage directory to 10000
chown -R 10000:10000 ./registry

Option 4: skopeo dir

For harbor 2.x things are different, because harbor has taken over metadata management for artifacts: metadata is written to harbor’s own database when an artifact is pushed or synced to harbor. From harbor’s point of view, if the database holds no manifest information for an artifact or layer, that artifact or layer does not exist, and harbor returns a 404 error. Unpacking the docker registry storage directory directly into harbor’s registry storage directory, as in option 3, therefore does not work: even though the images are present in the storage of the harbor registry container, harbor considers them absent because its database has no record of them. So it seems that the only choice is to push the images to harbor one by one with skopeo, as in option 2.

But in some scenarios there is no running docker registry HTTP service as option 2 requires, only an archive of the docker registry storage directory. How do you then migrate the images from the docker registry storage directory to harbor 2.0?

The image formats (transports) supported by skopeo are as follows.

IMAGE NAMES           Example
containers-storage:   containers-storage:
dir:                  dir:/PATH
docker://             docker://k8s.gcr.io/kube-apiserver:v1.17.5
docker-daemon:        docker-daemon:alpine:latest
docker-archive:       docker-archive:alpine.tar (docker save)
oci:                  oci:alpine:latest

For example, docker:// is an image in a registry; docker-daemon: is an image in the local docker daemon (for instance after a docker pull); docker-archive: is an image saved with docker save; and dir: is an image saved as a folder. The same image can exist in all of these forms, just as water can be gas, liquid, or solid: they all represent the same image, only in different forms.

Since the images already sit in the registry storage directory, reading them directly from the filesystem in dir format should in theory perform better than option 2. Although skopeo supports images in dir format, it does not currently support reading a registry storage directory directly, so each image in the docker registry storage directory first has to be converted into skopeo’s dir format.

skopeo dir

So let’s take a look at what skopeo dir looks like.

To test the feasibility of the solution, first pull an image from the docker hub and save it as a dir using the skopeo command as follows.

skopeo copy docker://alpine:latest dir:./alpine

Use the tree command to look at the directory structure of the alpine folder, as follows.

╭─root@sg-02 /var/lib/registry
╰─# tree -h alpine
alpine
├── [2.7M]  4c0d98bf9879488e0407f897d9dd4bf758555a78e39675e72b5124ccf12c2580
├── [1.4K]  e50c909a8df2b7c8b92a6e8730e210ebe98e5082871e66edd8ef4d90838cbd25
├── [ 528]  manifest.json
└── [  33]  version

0 directories, 4 files
╭─root@sg-02 /var/lib/registry
╰─# file alpine/e50c909a8df2b7c8b92a6e8730e210ebe98e5082871e66edd8ef4d90838cbd25
alpine/e50c909a8df2b7c8b92a6e8730e210ebe98e5082871e66edd8ef4d90838cbd25: ASCII text, with very long lines, with no line terminators

╭─root@sg-02 /var/lib/registry
╰─# file alpine/4c0d98bf9879488e0407f897d9dd4bf758555a78e39675e72b5124ccf12c2580
alpine/4c0d98bf9879488e0407f897d9dd4bf758555a78e39675e72b5124ccf12c2580: gzip compressed data

From the file names, the sizes, and the output of the file command, we can tell that manifest.json is the image’s manifest; the ASCII text file is the image config, which contains the image’s metadata; and the gzip compressed data file is an image layer compressed with gzip. A look at the contents of the manifest confirms this.

  • The config field of the image corresponds exactly to e50c909a8df2, and its media type, image.v1+json, indicates a JSON text file.
  • The layers field of the image corresponds exactly to 4c0d98bf9879, and its media type, .tar.gzip, indicates a gzip-compressed file.
╭─root@sg-02 /var/lib/registry
╰─# cat alpine/manifest.json
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
   "config": {
      "mediaType": "application/vnd.docker.container.image.v1+json",
      "size": 1471,
      "digest": "sha256:e50c909a8df2b7c8b92a6e8730e210ebe98e5082871e66edd8ef4d90838cbd25"
   },
   "layers": [
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 2811321,
         "digest": "sha256:4c0d98bf9879488e0407f897d9dd4bf758555a78e39675e72b5124ccf12c2580"
      }
   ]
}
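
Because every blob in a dir image is named after its sha256 digest, a dir image can be checked for completeness before it is pushed anywhere. The following is a minimal sketch, assuming GNU sha256sum is available; the helper name check_dir_image is hypothetical and not part of skopeo.

```shell
# check_dir_image DIR
# Returns 0 when every sha256 digest referenced in DIR/manifest.json exists
# as a file in DIR and the file's content hashes to that digest.
check_dir_image() {
    local dir="$1" digest actual
    [ -f "$dir/manifest.json" ] || { echo "no manifest.json in $dir" >&2; return 1; }
    for digest in $(grep -Eo 'sha256:[a-f0-9]{64}' "$dir/manifest.json" | cut -d: -f2 | sort -u); do
        [ -f "$dir/$digest" ] || { echo "missing blob: $digest" >&2; return 1; }
        # sha256sum prints "<hash>  <file>"; keep only the hash
        actual=$(sha256sum "$dir/$digest" | cut -d' ' -f1)
        [ "$actual" = "$digest" ] || { echo "digest mismatch: $digest" >&2; return 1; }
    done
    return 0
}
```

For the folder created above, `check_dir_image ./alpine` should succeed; a truncated or missing layer makes it fail with a message on stderr.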

Retrieve the image from the registry storage directory

Now comes the interesting part of this article: how to get an image out of the registry storage and into the dir format supported by skopeo.

  • The first thing to do is to get the image’s manifest file, from which all of the image’s blob files can be found. For example, the library/alpine:latest image is stored in the registry storage directory like this.
╭─root@sg-02 /var/lib/registry/docker/registry/v2
╰─# tree
.
├── blobs
│   └── sha256
│       ├── 21
│       │   └── 21c83c5242199776c232920ddb58cfa2a46b17e42ed831ca9001c8dbc532d22d
│       │       └── data
│       ├── a1
│       │   └── a143f3ba578f79e2c7b3022c488e6e12a35836cd4a6eb9e363d7f3a07d848590
│       │       └── data
│       └── be
│           └── be4e4bea2c2e15b403bb321562e78ea84b501fb41497472e91ecb41504e8a27c
│               └── data
└── repositories
    └── library
        └── alpine
            ├── _layers
            │   └── sha256
            │       ├── 21c83c5242199776c232920ddb58cfa2a46b17e42ed831ca9001c8dbc532d22d
            │       │   └── link
            │       └── be4e4bea2c2e15b403bb321562e78ea84b501fb41497472e91ecb41504e8a27c
            │           └── link
            ├── _manifests
            │   ├── revisions
            │   │   └── sha256
            │   │       └── a143f3ba578f79e2c7b3022c488e6e12a35836cd4a6eb9e363d7f3a07d848590
            │   │           └── link
            │   └── tags
            │       └── latest
            │           ├── current
            │           │   └── link
            │           └── index
            │               └── sha256
            │                   └── a143f3ba578f79e2c7b3022c488e6e12a35836cd4a6eb9e363d7f3a07d848590
            │                       └── link
            └── _uploads

26 directories, 8 files
  1. Get the sha256 value of the manifest for the latest tag of the alpine image from the repositories/library/alpine/_manifests/tags/latest/current/link file, then look up the manifest file under blobs;
╭─root@sg-02 /var/lib/registry/docker/registry/v2/repositories/library/alpine/_manifests/tags/latest/current/
╰─# cat link
sha256:39eda93d15866957feaee28f8fc5adb545276a64147445c64992ef69804dbf01#
  2. Find the corresponding file in the blobs directory using the sha256 value from the current/link file. Here the manifest file is blobs/sha256/39/39eda93d15866957feaee28f8fc5adb545276a64147445c64992ef69804dbf01/data;
╭─root@sg-02 /var/lib/registry/docker/registry/v2/repositories/library/alpine/_manifests/tags/latest/current
╰─# cat /var/lib/registry/docker/registry/v2/blobs/sha256/39/39eda93d15866957feaee28f8fc5adb545276a64147445c64992ef69804dbf01/data
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
   "config": {
      "mediaType": "application/vnd.docker.container.image.v1+json",
      "size": 1507,
      "digest": "sha256:f70734b6a266dcb5f44c383274821207885b549b75c8e119404917a61335981a"
   },
   "layers": [
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 2813316,
         "digest": "sha256:cbdbe7a5bc2a134ca8ec91be58565ec07d037386d1f1d8385412d224deafca08"
      }
   ]
}
  3. Use a regular expression to extract all sha256 values from the manifest file; these values correspond to the image config file and the image layer files in the blobs directory;
╭─root@sg-02 /var/lib/registry/docker/registry/v2/repositories/library/alpine/_manifests/tags/latest/current
╰─# grep -Eo "\b[a-f0-9]{64}\b" /var/lib/registry/docker/registry/v2/blobs/sha256/39/39eda93d15866957feaee28f8fc5adb545276a64147445c64992ef69804dbf01/data
f70734b6a266dcb5f44c383274821207885b549b75c8e119404917a61335981a
cbdbe7a5bc2a134ca8ec91be58565ec07d037386d1f1d8385412d224deafca08
  4. With the manifest file you can locate all of the image’s layer and config files in the blobs directory and assemble them into a dir-format image. Here they are copied out of the registry storage directory with cp, as follows.
# First create a folder; to preserve the image's name and tag, the folder is named NAME:TAG
╭─root@sg-02 /var/lib/registry/docker
╰─# mkdir -p skopeo/library/alpine:latest

# Copy the image's manifest file (the dir format expects it to be named manifest.json)
╭─root@sg-02 /var/lib/registry/docker
╰─# cp /var/lib/registry/docker/registry/v2/blobs/sha256/39/39eda93d15866957feaee28f8fc5adb545276a64147445c64992ef69804dbf01/data skopeo/library/alpine:latest/manifest.json

# Copy the image's blob files
╰─# cp /var/lib/registry/docker/registry/v2/blobs/sha256/f7/f70734b6a266dcb5f44c383274821207885b549b75c8e119404917a61335981a/data skopeo/library/alpine:latest/f70734b6a266dcb5f44c383274821207885b549b75c8e119404917a61335981a
╰─# cp /var/lib/registry/docker/registry/v2/blobs/sha256/cb/cbdbe7a5bc2a134ca8ec91be58565ec07d037386d1f1d8385412d224deafca08/data skopeo/library/alpine:latest/cbdbe7a5bc2a134ca8ec91be58565ec07d037386d1f1d8385412d224deafca08

The final image format obtained is as follows.

╭─root@sg-02 /var/lib/registry/docker
╰─# tree skopeo/library/alpine:latest
skopeo/library/alpine:latest
├── cbdbe7a5bc2a134ca8ec91be58565ec07d037386d1f1d8385412d224deafca08
├── f70734b6a266dcb5f44c383274821207885b549b75c8e119404917a61335981a
└── manifest.json

0 directories, 3 files

Compared with the dir folder produced by skopeo copy above, everything is identical except for the insignificant version file.

  5. As an optimization, replace the cp operation in step 4 with hard links, which greatly reduces disk IO. Note that hard links cannot span partitions, so the dir folder must be on the same partition as the registry storage directory.
╭─root@sg-02 /var/lib/registry/docker
╰─# ln /var/lib/registry/docker/registry/v2/blobs/sha256/39/39eda93d15866957feaee28f8fc5adb545276a64147445c64992ef69804dbf01/data skopeo/library/alpine:latest/manifest.json
╰─# ln /var/lib/registry/docker/registry/v2/blobs/sha256/f7/f70734b6a266dcb5f44c383274821207885b549b75c8e119404917a61335981a/data skopeo/library/alpine:latest/f70734b6a266dcb5f44c383274821207885b549b75c8e119404917a61335981a
╰─# ln /var/lib/registry/docker/registry/v2/blobs/sha256/cb/cbdbe7a5bc2a134ca8ec91be58565ec07d037386d1f1d8385412d224deafca08/data skopeo/library/alpine:latest/cbdbe7a5bc2a134ca8ec91be58565ec07d037386d1f1d8385412d224deafca08
╭─root@sg-02 /var/lib/registry/docker
╰─# tree skopeo/library/alpine:latest
skopeo/library/alpine:latest
├── cbdbe7a5bc2a134ca8ec91be58565ec07d037386d1f1d8385412d224deafca08
├── f70734b6a266dcb5f44c383274821207885b549b75c8e119404917a61335981a
└── manifest.json

0 directories, 3 files
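
Since a hard link fails when the target lies on a different partition, a script can fall back to a plain copy in that case. A minimal sketch; the helper name link_or_copy is hypothetical.

```shell
# link_or_copy SRC DST
# Hard-links SRC to DST when possible and falls back to a copy otherwise
# (for example when DST is on a different partition).
link_or_copy() {
    # Nothing to do if DST already is the same file as SRC
    [ "$1" -ef "$2" ] && return 0
    ln "$1" "$2" 2>/dev/null || cp -p "$1" "$2"
}
```

The fallback trades the IO saving for robustness: the copy is slower and uses extra disk space, but the resulting dir image is equally valid.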

Then use skopeo copy or skopeo sync to push the retrieved image to harbor

  • Using skopeo copy
skopeo copy  --insecure-policy --src-tls-verify=false --dest-tls-verify=false \
dir:skopeo/library/alpine:latest docker://harbor.k8s.li/library/alpine:latest
  • Using skopeo sync

Note that skopeo sync works at the project level: the name and tag of each image are taken from the name of its directory.

skopeo sync --insecure-policy --src-tls-verify=false --dest-tls-verify=false \
--src dir --dest docker skopeo/library/ harbor.k8s.li/library/

Shell Script

#!/bin/bash
REGISTRY_DOMAIN="harbor.k8s.li"
REGISTRY_PATH="/var/lib/registry"

# Change to the root of the registry storage directory
cd "${REGISTRY_PATH}" || exit 1

gen_skopeo_dir() {
    # The registry's blob and repositories directories, defined once for later use
    BLOB_DIR="docker/registry/v2/blobs/sha256"
    REPO_DIR="docker/registry/v2/repositories"
    # Output directory for the generated dir images (also used by sync_image)
    SKOPEO_DIR="docker/skopeo"
    # Every tag has exactly one current directory, so finding them yields all tagged images
    for image in $(find "${REPO_DIR}" -type d -name "current"); do
        # Derive NAME:TAG from the path (assumes two-level names of the form PROJECT/NAME)
        name=$(echo "${image}" | awk -F '/' '{print $5"/"$6":"$9}')
        # The link file holds the digest of the image's manifest
        link=$(sed 's/sha256://' "${image}/link")
        mfs="${BLOB_DIR}/${link:0:2}/${link}/data"
        # Create the directory that will hold the image's hard links
        mkdir -p "${SKOPEO_DIR}/${name}"
        # Hard-link the manifest blob as the directory's manifest.json
        ln "${mfs}" "${SKOPEO_DIR}/${name}/manifest.json"
        # Extract all sha256 values with a regular expression, sorted and de-duplicated
        layers=$(grep -Eo "\b[a-f0-9]{64}\b" "${mfs}" | sort -u)
        for layer in ${layers}; do
            # Hard-link the image layers and image config from the registry storage into the dir
            ln "${BLOB_DIR}/${layer:0:2}/${layer}/data" "${SKOPEO_DIR}/${name}/${layer}"
        done
    done
}

sync_image() {
    # Use skopeo sync to push the dir-format images to harbor, one project at a time
    for project in $(ls "${SKOPEO_DIR}"); do
        skopeo sync --insecure-policy --src-tls-verify=false --dest-tls-verify=false \
        --src dir --dest docker "${SKOPEO_DIR}/${project}" "${REGISTRY_DOMAIN}/${project}"
    done
}

gen_skopeo_dir
sync_image
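
The awk expression in the script picks fixed path fields, so it only handles names of the form PROJECT/NAME. If the registry contains repositories nested more or less deeply, the name can be derived with shell parameter expansion instead. A minimal sketch; the helper name current_to_ref is hypothetical, and it assumes the path contains a single repositories/ component.

```shell
# current_to_ref PATH
# Converts .../repositories/NAME/_manifests/tags/TAG/current into NAME:TAG,
# where NAME may consist of any number of path components.
current_to_ref() {
    local path="$1" name tag
    path="${path#*repositories/}"       # drop everything up to repositories/
    path="${path%/current}"             # drop the trailing /current
    name="${path%/_manifests/tags/*}"   # NAME is what precedes /_manifests/tags/
    tag="${path##*/_manifests/tags/}"   # TAG is what follows it
    printf '%s:%s\n' "$name" "$tag"
}
```

Inside the loop this would replace the awk call, for example `name=$(current_to_ref "${image}")`.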

In fact, with some modifications to skopeo’s source code it should be possible to support registry storage directories natively, which I am currently looking into 😃.

Comparison

Option                              Scope of application                               Notes
1 docker retag                      Synchronizing images between two registries        -
2 skopeo                            Synchronizing images between two registries        -
3 Unpacking the storage directory   Registry storage directory to another registry     Only applicable to harbor 1.x
4 skopeo dir                        Registry storage directory to another registry     Applicable to harbor 2.x

Compare and summarize the above options.

  • Option 1: low barrier to entry; suitable when the number of images is relatively small and you do not want to install skopeo. The disadvantage is poor performance.
  • Option 2: suitable for synchronizing or copying images between two registries, for example copying public images from Docker Hub to a company’s intranet image repository.
  • Option 3: suitable for migrating between image repositories, with the best performance of all the options; note, however, that it cannot be used if the destination repository is harbor 2.x.
  • Option 4: a compromise version of option 3 that works with harbor 2.0; because the images have to be pushed to harbor again, its performance is worse than option 3.