This article focuses on how to upgrade a K8S cluster using kubeadm.

1. Overview

The upgrade of a K8S cluster can be divided into three main steps.

  1. upgrade a primary control-plane node
  2. upgrade the rest of the control plane nodes
  3. upgrade the remaining worker nodes

The cluster to be upgraded has three control plane nodes and three worker nodes, and uses Cilium as the CNI and containerd as the container runtime. The K8S cluster is currently at version 1.25.4 and is scheduled to be upgraded to 1.26.0.

$ kubectl get nodes -o wide
NAME                                       STATUS   ROLES           AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION               CONTAINER-RUNTIME
k8s-cilium-master-10-31-80-1.tinychen.io   Ready    control-plane   15d   v1.25.4   10.31.80.1    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-master-10-31-80-2.tinychen.io   Ready    control-plane   15d   v1.25.4   10.31.80.2    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-master-10-31-80-3.tinychen.io   Ready    control-plane   15d   v1.25.4   10.31.80.3    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-worker-10-31-80-4.tinychen.io   Ready    <none>          15d   v1.25.4   10.31.80.4    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-worker-10-31-80-5.tinychen.io   Ready    <none>          15d   v1.25.4   10.31.80.5    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-worker-10-31-80-6.tinychen.io   Ready    <none>          15d   v1.25.4   10.31.80.6    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11

2. Preparation work

Before starting the upgrade we do some preparatory work.

  1. Read the release notes carefully: focus on the changes between the current version and the target version, especially when the upgrade spans several versions.
  2. The K8S cluster's control plane must be deployed as static Pods managed by kubeadm.
  3. etcd must either be deployed as static Pods or be an external etcd cluster.
  4. Back up important data and services: although a kubeadm upgrade only touches the internal components of K8S, it is better to have backups of important services and application-level stateful data.
  5. Disable swap on every node of the cluster (a few quick pre-upgrade checks are sketched after this list).
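
A few of these prerequisites can be verified quickly from the command line. The following is only a minimal sketch, assuming a default kubeadm layout under /etc/kubernetes and that etcdctl is available on the control plane node; the snapshot path is just an example.

# Control plane and etcd should be running as static Pods managed by the kubelet:
$ ls /etc/kubernetes/manifests/
etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml

# Swap should be disabled on every node (all values should be 0):
$ free -h | grep -i swap

# Optional: take an etcd snapshot before the upgrade as an extra safety net:
$ ETCDCTL_API=3 etcdctl snapshot save /root/etcd-backup-$(date +%F).db \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key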

Some notes.

  • Before any kubelet minor-version upgrade, be sure to drain the node first (evict all workloads on it), so that important workloads such as CoreDNS are not left stranded and the stability of the whole cluster is not affected.
  • Because the spec hash of the containers changes during a cluster upgrade, all containers will be restarted once the upgrade is complete.
  • You can use systemctl status kubelet or journalctl -xeu kubelet to view the kubelet's logs and determine whether its upgrade was successful.
  • It is not recommended to pass --config to kubeadm upgrade to reconfigure the cluster while it is being upgraded; if you need to update the cluster configuration, refer to the official tutorial on reconfiguring a kubeadm cluster (see the example after this list).
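
For reference, the cluster-wide configuration that kubeadm reads during an upgrade is stored in ConfigMaps in the kube-system namespace; the official reconfiguration procedure edits those objects directly instead of passing --config. A quick way to inspect them (the ConfigMap names below follow kubeadm's defaults):

# ClusterConfiguration used by kubeadm (also referenced in the FYI lines of the upgrade output):
$ kubectl -n kube-system get cm kubeadm-config -o yaml

# Cluster-wide kubelet configuration:
$ kubectl -n kube-system get cm kubelet-config -o yaml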

3. Upgrade kubeadm

Before upgrading the cluster we need to upgrade kubeadm on every node in the cluster; here we use yum to upgrade it to the target version, 1.26.0.

# View all available versions of kubeadm.
$ yum list --showduplicates kubeadm --disableexcludes=kubernetes

# Then upgrade the version of kubeadm to 1.26.0
$ yum install -y kubeadm-1.26.0-0 --disableexcludes=kubernetes
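
The --disableexcludes=kubernetes flag assumes a yum repository defined roughly like the sketch below, where the kube* packages are excluded from routine updates so that they are only upgraded deliberately. The baseurl and gpgkey shown are only an example of the repository commonly used at the time; adjust them to your own mirror.

$ cat /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl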

Once this is done we can check the kubeadm version information; if output similar to the following appears, the upgrade was successful.

$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.0", GitCommit:"b46a3f887ca979b1a5d14fd39cb1af43e7e5d12d", GitTreeState:"clean", BuildDate:"2022-12-08T19:57:06Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}

4. Upgrading control plane nodes

4.1 Upgrading K8S components

First we will select one of the three control plane nodes to upgrade, here we will start with 10.31.80.1.

Next we look at the upgrade plan, which lists the components and APIs involved in the upgrade, the versions before and after, and whether any of them must be upgraded manually.

$ kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.25.4
[upgrade/versions] kubeadm version: v1.26.0
[upgrade/versions] Target version: v1.26.0
[upgrade/versions] Latest version in the v1.25 series: v1.25.5

W1223 17:23:27.554231   15530 configset.go:177] error unmarshaling configuration schema.GroupVersionKind{Group:"kubeproxy.config.k8s.io", Version:"v1alpha1", Kind:"KubeProxyConfiguration"}: strict decoding error: unknown field "udpIdleTimeout"
Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       TARGET
kubelet     6 x v1.25.4   v1.25.5

Upgrade to the latest version in the v1.25 series:

COMPONENT                 CURRENT   TARGET
kube-apiserver            v1.25.4   v1.25.5
kube-controller-manager   v1.25.4   v1.25.5
kube-scheduler            v1.25.4   v1.25.5
kube-proxy                v1.25.4   v1.25.5
CoreDNS                   v1.9.3    v1.9.3
etcd                      3.5.5-0   3.5.6-0

You can now apply the upgrade by executing the following command:

        kubeadm upgrade apply v1.25.5

_____________________________________________________________________

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       TARGET
kubelet     6 x v1.25.4   v1.26.0

Upgrade to the latest stable version:

COMPONENT                 CURRENT   TARGET
kube-apiserver            v1.25.4   v1.26.0
kube-controller-manager   v1.25.4   v1.26.0
kube-scheduler            v1.25.4   v1.26.0
kube-proxy                v1.25.4   v1.26.0
CoreDNS                   v1.9.3    v1.9.3
etcd                      3.5.5-0   3.5.6-0

You can now apply the upgrade by executing the following command:

        kubeadm upgrade apply v1.26.0

_____________________________________________________________________


The table below shows the current state of component configs as understood by this version of kubeadm.
Configs that have a "yes" mark in the "MANUAL UPGRADE REQUIRED" column require manual config upgrade or
resetting to kubeadm defaults before a successful upgrade can be performed. The version to manually
upgrade to is denoted in the "PREFERRED VERSION" column.

API GROUP                 CURRENT VERSION   PREFERRED VERSION   MANUAL UPGRADE REQUIRED
kubeproxy.config.k8s.io   v1alpha1          v1alpha1            no
kubelet.config.k8s.io     v1beta1           v1beta1             no
_____________________________________________________________________

As this upgrade does not span many versions, there is not much that needs to change and we can apply it directly.

The kubeadm upgrade command also renews the cluster’s certificates during the upgrade process, so you can add the parameter --certificate-renewal=false if you don’t want to renew them.

For more information see the certificate management guide.
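
If you want to confirm what the upgrade does to the certificates, you can check their expiry dates before and after, and skip the renewal explicitly if that is what you prefer (a short sketch):

# Show the expiry date of every kubeadm-managed certificate:
$ kubeadm certs check-expiration

# Upgrade without renewing certificates (renewal is on by default):
$ kubeadm upgrade apply v1.26.0 --certificate-renewal=false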

$ kubeadm upgrade apply v1.26.0
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W1223 17:30:04.493109   17731 configset.go:177] error unmarshaling configuration schema.GroupVersionKind{Group:"kubeproxy.config.k8s.io", Version:"v1alpha1", Kind:"KubeProxyConfiguration"}: strict decoding error: unknown field "udpIdleTimeout"
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade/version] You have chosen to change the cluster version to "v1.26.0"
[upgrade/versions] Cluster version: v1.25.4
[upgrade/versions] kubeadm version: v1.26.0
[upgrade] Are you sure you want to proceed? [y/N]: y
[upgrade/prepull] Pulling images required for setting up a Kubernetes cluster
[upgrade/prepull] This might take a minute or two, depending on the speed of your internet connection
[upgrade/prepull] You can also perform this action in beforehand using 'kubeadm config images pull'
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.26.0" (timeout: 5m0s)...
[upgrade/etcd] Upgrading to TLS for etcd
[upgrade/staticpods] Preparing for "etcd" upgrade
[upgrade/staticpods] Renewing etcd-server certificate
[upgrade/staticpods] Renewing etcd-peer certificate
[upgrade/staticpods] Renewing etcd-healthcheck-client certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/etcd.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2022-12-23-17-31-20/etcd.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
[apiclient] Found 3 Pods for label selector component=etcd
[upgrade/staticpods] Component "etcd" upgraded successfully!
[upgrade/etcd] Waiting for etcd to become available
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests3224088652"
[upgrade/staticpods] Preparing for "kube-apiserver" upgrade
[upgrade/staticpods] Renewing apiserver certificate
[upgrade/staticpods] Renewing apiserver-kubelet-client certificate
[upgrade/staticpods] Renewing front-proxy-client certificate
[upgrade/staticpods] Renewing apiserver-etcd-client certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2022-12-23-17-31-20/kube-apiserver.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
[apiclient] Found 3 Pods for label selector component=kube-apiserver
[upgrade/staticpods] Component "kube-apiserver" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-controller-manager" upgrade
[upgrade/staticpods] Renewing controller-manager.conf certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-controller-manager.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2022-12-23-17-31-20/kube-controller-manager.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
[apiclient] Found 3 Pods for label selector component=kube-controller-manager
[upgrade/staticpods] Component "kube-controller-manager" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-scheduler" upgrade
[upgrade/staticpods] Renewing scheduler.conf certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-scheduler.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2022-12-23-17-31-20/kube-scheduler.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
[apiclient] Found 3 Pods for label selector component=kube-scheduler
[upgrade/staticpods] Component "kube-scheduler" upgraded successfully!
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[addons] Applied essential addon: CoreDNS
W1223 17:33:49.448558   17731 endpoint.go:57] [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[addons] Applied essential addon: kube-proxy

[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.26.0". Enjoy!

[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.

Seeing a similar message output at the end means that the node has been upgraded successfully.

[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.26.0". Enjoy!

[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.

But we are not done yet: at this point only one node has been upgraded, and we still need to do much the same for the remaining two control plane nodes.

Note that the output of the kubeadm upgrade plan command on the second node is not the same as before, because the cluster's kubeadm-config ConfigMap was updated when the first control plane node was upgraded.
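
You can confirm this by looking at the kubernetesVersion field stored in the kubeadm-config ConfigMap, which was bumped to v1.26.0 by the first node's kubeadm upgrade apply (a quick check, assuming kubeadm's default ConfigMap name):

$ kubectl -n kube-system get cm kubeadm-config -o yaml | grep kubernetesVersion
  kubernetesVersion: v1.26.0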

$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.0", GitCommit:"b46a3f887ca979b1a5d14fd39cb1af43e7e5d12d", GitTreeState:"clean", BuildDate:"2022-12-08T19:57:06Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}

$ kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.26.0
[upgrade/versions] kubeadm version: v1.26.0
[upgrade/versions] Target version: v1.26.0
[upgrade/versions] Latest version in the v1.26 series: v1.26.0

So on the remaining control plane nodes, the corresponding upgrade command is simply the following.

$ kubeadm upgrade node

When you see a similar output message, the upgrade has been successful.

[upgrade] The configuration for this node was successfully updated!
[upgrade] Now you should go ahead and upgrade the kubelet package using your package manager.

4.2 Upgrading kubelet and kubectl

After the steps above we have only upgraded the relevant Pods in the K8S cluster, not the kubelet, so the node version shown here is still 1.25.4.

$ kubectl get nodes -o wide
NAME                                       STATUS   ROLES           AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION               CONTAINER-RUNTIME
k8s-cilium-master-10-31-80-1.tinychen.io   Ready    control-plane   15d   v1.25.4   10.31.80.1    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-master-10-31-80-2.tinychen.io   Ready    control-plane   15d   v1.25.4   10.31.80.2    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-master-10-31-80-3.tinychen.io   Ready    control-plane   15d   v1.25.4   10.31.80.3    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-worker-10-31-80-4.tinychen.io   Ready    <none>          15d   v1.25.4   10.31.80.4    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-worker-10-31-80-5.tinychen.io   Ready    <none>          15d   v1.25.4   10.31.80.5    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-worker-10-31-80-6.tinychen.io   Ready    <none>          15d   v1.25.4   10.31.80.6    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11

Before upgrading the kubelet, we need to drain the node, evicting all workloads on it except DaemonSet-managed Pods.

$ kubectl drain k8s-cilium-master-10-31-80-1.tinychen.io --ignore-daemonsets
node/k8s-cilium-master-10-31-80-1.tinychen.io cordoned
Warning: ignoring DaemonSet-managed Pods: kube-system/cilium-gj4vm, kube-system/kube-proxy-r7pj8, kube-system/kube-router-szdml
node/k8s-cilium-master-10-31-80-1.tinychen.io drained

You can then use yum to update kubelet and kubectl.

$ yum install -y kubelet-1.26.0-0 kubectl-1.26.0-0 --disableexcludes=kubernetes
$ systemctl daemon-reload
$ systemctl restart kubelet

# Check the logs to see if the relevant services are working.
$ systemctl status kubelet -l
$ journalctl -xeu kubelet

At this point you can check the status of the node and see that it has been successfully upgraded and the version information has been updated to 1.26.0.

$ kubectl get nodes -o wide
NAME                                       STATUS                     ROLES           AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION               CONTAINER-RUNTIME
k8s-cilium-master-10-31-80-1.tinychen.io   Ready,SchedulingDisabled   control-plane   15d   v1.26.0   10.31.80.1    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-master-10-31-80-2.tinychen.io   Ready                      control-plane   15d   v1.25.4   10.31.80.2    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-master-10-31-80-3.tinychen.io   Ready                      control-plane   15d   v1.25.4   10.31.80.3    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-worker-10-31-80-4.tinychen.io   Ready                      <none>          15d   v1.25.4   10.31.80.4    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-worker-10-31-80-5.tinychen.io   Ready                      <none>          15d   v1.25.4   10.31.80.5    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-worker-10-31-80-6.tinychen.io   Ready                      <none>          15d   v1.25.4   10.31.80.6    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11

Once the node is confirmed to be healthy, we can make it schedulable again.

$ kubectl uncordon k8s-cilium-master-10-31-80-1.tinychen.io
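
For the remaining two control plane nodes the whole sequence boils down to the following condensed sketch, where <node-name> is a placeholder for the actual node name:

# On the control plane node being upgraded:
$ kubeadm upgrade node

# From any machine with kubectl access to the cluster:
$ kubectl drain <node-name> --ignore-daemonsets

# Back on the node being upgraded:
$ yum install -y kubelet-1.26.0-0 kubectl-1.26.0-0 --disableexcludes=kubernetes
$ systemctl daemon-reload
$ systemctl restart kubelet

# From any machine with kubectl access to the cluster:
$ kubectl uncordon <node-name>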

After repeating this for the remaining two nodes, we can see that the entire control plane of the cluster has been upgraded.

$ kubectl get nodes -o wide
NAME                                       STATUS   ROLES           AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION               CONTAINER-RUNTIME
k8s-cilium-master-10-31-80-1.tinychen.io   Ready    control-plane   15d   v1.26.0   10.31.80.1    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-master-10-31-80-2.tinychen.io   Ready    control-plane   15d   v1.26.0   10.31.80.2    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-master-10-31-80-3.tinychen.io   Ready    control-plane   15d   v1.26.0   10.31.80.3    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-worker-10-31-80-4.tinychen.io   Ready    <none>          15d   v1.25.4   10.31.80.4    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-worker-10-31-80-5.tinychen.io   Ready    <none>          15d   v1.25.4   10.31.80.5    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-worker-10-31-80-6.tinychen.io   Ready    <none>          15d   v1.25.4   10.31.80.6    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11

5. Upgrading the worker node

The worker node upgrade is much simpler: since there are no control plane components on it, kubeadm only needs to update the local kubelet configuration.

$ kubeadm upgrade node
[upgrade] Reading configuration from the cluster...
[upgrade] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks
[preflight] Skipping prepull. Not a control plane node.
[upgrade] Skipping phase. Not a control plane node.
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[upgrade] The configuration for this node was successfully updated!
[upgrade] Now you should go ahead and upgrade the kubelet package using your package manager.

The next step is to repeat the drain node --> upgrade kubelet --> check service --> uncordon node sequence described above.

Usually kubectl is not installed on the worker node, so there is no need to upgrade kubectl.

# Execute on the control plane node using kubectl.
$ kubectl drain k8s-cilium-worker-10-31-80-4.tinychen.io --ignore-daemonsets

# Executed on the worker node.
$ yum install -y kubelet-1.26.0-0 --disableexcludes=kubernetes
$ systemctl daemon-reload
$ systemctl restart kubelet

# Execute on the control plane node using kubectl.
$ kubectl uncordon k8s-cilium-worker-10-31-80-4.tinychen.io

Then repeat the above for all worker nodes to complete the upgrade of the entire K8S cluster.

# Execute on the worker node
$ kubeadm upgrade node

# Execute on the control plane node using kubectl.
$ kubectl drain <node-name> --ignore-daemonsets

# Execute on the worker node
$ yum install -y kubelet-1.26.0-0 --disableexcludes=kubernetes
$ systemctl daemon-reload
$ systemctl restart kubelet

# Execute on the control plane node using kubectl.
$ kubectl uncordon <node-name>

Finally, checking the cluster status shows that all nodes have been upgraded to 1.26.0.

$ kubectl get nodes -o wide
NAME                                       STATUS   ROLES           AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION               CONTAINER-RUNTIME
k8s-cilium-master-10-31-80-1.tinychen.io   Ready    control-plane   15d   v1.26.0   10.31.80.1    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-master-10-31-80-2.tinychen.io   Ready    control-plane   15d   v1.26.0   10.31.80.2    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-master-10-31-80-3.tinychen.io   Ready    control-plane   15d   v1.26.0   10.31.80.3    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-worker-10-31-80-4.tinychen.io   Ready    <none>          15d   v1.26.0   10.31.80.4    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-worker-10-31-80-5.tinychen.io   Ready    <none>          15d   v1.26.0   10.31.80.5    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
k8s-cilium-worker-10-31-80-6.tinychen.io   Ready    <none>          15d   v1.26.0   10.31.80.6    <none>        CentOS Linux 7 (Core)   6.0.11-1.el7.elrepo.x86_64   containerd://1.6.11
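
As a final sanity check, it is worth verifying that the API server reports the new version and that all system Pods are healthy (a rough sketch; adapt it to your own add-ons):

# Client and server versions should both report v1.26.0:
$ kubectl version

# Control plane, kube-proxy, CoreDNS and CNI Pods should all be Running:
$ kubectl get pods -n kube-system -o wide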

6. Upgrade failure

The upgrade failure scenario we consider here is a control plane node failing to upgrade. The worker node upgrade process is comparatively simple: if the workloads on a worker node are evicted before the upgrade, the worst case is that the node has to be removed from the cluster, reinstalled, and re-joined, which has little impact on the cluster as a whole. If a control plane node fails during the upgrade, however, the whole cluster may well go down with it, so here are the officially recommended ways of dealing with upgrade failures.

  • Automatic rollback of a failed upgrade: if a node fails to upgrade with kubeadm, it should in theory roll back automatically to the version it was running before the upgrade; if the rollback succeeds, the cluster should still be working normally at that point.
  • Unexpected interruption of the upgrade: if the upgrade is interrupted by network problems or other issues, simply run the previous upgrade command again; kubeadm operations are documented as idempotent, so you only need to verify that the cluster is healthy once the upgrade completes.
  • Manual rollback of a failed upgrade: before the upgrade, kubeadm backs up the static Pod manifests and the etcd data under the /etc/kubernetes/tmp directory; in the worst case we have to restore these files manually and restart the services (a minimal restore sketch follows this list).
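
As a rough illustration of that last point, the sketch below shows how the backed-up manifests could be put back in place. The timestamped directory name is taken from the backup messages printed by kubeadm upgrade apply earlier in this article; the exact names on your system will differ.

# List the backups that kubeadm created before the upgrade:
$ ls /etc/kubernetes/tmp/
kubeadm-backup-etcd-2022-12-23-17-31-20  kubeadm-backup-manifests-2022-12-23-17-31-20

# Restore the old static Pod manifests; the kubelet will recreate the control plane Pods:
$ cp /etc/kubernetes/tmp/kubeadm-backup-manifests-2022-12-23-17-31-20/*.yaml /etc/kubernetes/manifests/

# Watch the control plane containers come back up:
$ crictl ps | grep -E 'kube-apiserver|kube-controller-manager|kube-scheduler|etcd'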