This document records the fix for a cluster failure caused by a host IP change. Two clusters were involved: a single-node (all-in-one) cluster and a four-node cluster (3 masters, 1 worker node).

1. Update Etcd certificate

  • Back up the Etcd certificates on each Etcd node.

    cp -R /etc/ssl/etcd/ssl /etc/ssl/etcd/ssl-bak
    
  • View the domains and IPs in the existing Etcd certificate

    openssl x509 -in /etc/ssl/etcd/ssl/node-node1.pem -noout -text|grep DNS
    
                    DNS:etcd, DNS:etcd.kube-system, DNS:etcd.kube-system.svc, DNS:etcd.kube-system.svc.cluster.local, DNS:localhost, DNS:node1, IP Address:127.0.0.1, IP Address:0:0:0:0:0:0:0:1, IP Address:x.x.x.1
    

    All DNS and IP values need to be recorded and used to generate new certificates.
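
    To record them for every node in one pass, the backed-up certificates can be looped over (a minimal sketch, using the ssl-bak copy made in the first step):

    # print the SANs of every backed-up node certificate (skip the key files)
    for f in $(ls /etc/ssl/etcd/ssl-bak/node-*.pem | grep -v -- '-key'); do
        echo "== ${f} =="
        openssl x509 -in "${f}" -noout -text | grep DNS
    done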

  • Clean up old Etcd certificates at each Etcd node

    rm -f /etc/ssl/etcd/ssl/*
    
  • Generate the Etcd certificate configuration on one Etcd node.

    vim /etc/ssl/etcd/ssl/openssl.conf
    
    [req]
    req_extensions = v3_req
    distinguished_name = req_distinguished_name
    
    [req_distinguished_name]
    
    [ v3_req ]
    basicConstraints = CA:FALSE
    keyUsage = nonRepudiation, digitalSignature, keyEncipherment
    subjectAltName = @alt_names
    
    [ ssl_client ]
    extendedKeyUsage = clientAuth, serverAuth
    basicConstraints = CA:FALSE
    subjectKeyIdentifier=hash
    authorityKeyIdentifier=keyid,issuer
    subjectAltName = @alt_names
    
    [ v3_ca ]
    basicConstraints = CA:TRUE
    keyUsage = nonRepudiation, digitalSignature, keyEncipherment
    subjectAltName = @alt_names
    authorityKeyIdentifier=keyid:always,issuer
    
    [alt_names]
    DNS.1 = localhost
    DNS.2 = etcd.kube-system.svc.cluster.local
    DNS.3 = etcd.kube-system.svc
    DNS.4 = etcd.kube-system
    DNS.5 = etcd
    DNS.6 = xxx
    IP.1 = 127.0.0.1
    IP.2 = x.x.x.x
    

    The hostnames and new IP addresses of all deployed Etcd nodes need to be included.
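
    For example, for a three-node Etcd cluster whose hosts are node1, node2 and node3 with the new addresses used later in this article (an illustrative sketch; substitute the values recorded from the old certificate), the [alt_names] section would be:

    [alt_names]
    DNS.1 = localhost
    DNS.2 = etcd.kube-system.svc.cluster.local
    DNS.3 = etcd.kube-system.svc
    DNS.4 = etcd.kube-system
    DNS.5 = etcd
    DNS.6 = node1
    DNS.7 = node2
    DNS.8 = node3
    IP.1 = 127.0.0.1
    IP.2 = x.x.10.1
    IP.3 = x.x.10.2
    IP.4 = x.x.10.3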

  • Generate the Etcd CA certificate on one Etcd node

    cd /etc/ssl/etcd/ssl
    openssl genrsa -out ca-key.pem 2048
    openssl req -x509 -new -nodes -key ca-key.pem -days 3650 -out ca.pem -subj "/CN=etcd-ca"
    
  • Generate an Etcd admin certificate for each node (run on one Etcd node).

    Generate a certificate for each node by setting a different environment variable, e.g. export host=node1. Here node1 is the hostname; keep it the same as before, otherwise the certificate will not be found because of the name change.

    openssl genrsa -out admin-${host}-key.pem 2048
    openssl req -new -key admin-${host}-key.pem -out admin-${host}.csr -subj "/CN=etcd-admin-${host}"
    openssl x509 -req -in admin-${host}.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out admin-${host}.pem -days 3650 -extensions ssl_client  -extfile openssl.conf
    
  • Generate an Etcd member certificate for each node (run on the same Etcd node).

    Switch the target node with export host=node1 (then node2, node3, ...) and generate a certificate for each node; a combined loop is sketched after this step.

    openssl genrsa -out member-${host}-key.pem 2048
    openssl req -new -key member-${host}-key.pem -out member-${host}.csr -subj "/CN=etcd-member-${host}" -config openssl.conf
    openssl x509 -req -in member-${host}.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out member-${host}.pem -days 3650 -extensions ssl_client -extfile openssl.conf
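
    A combined loop over the admin and member steps (a sketch, assuming the three hosts are node1, node2 and node3 and the working directory is /etc/ssl/etcd/ssl):

    for host in node1 node2 node3; do
        # admin certificate for ${host}
        openssl genrsa -out admin-${host}-key.pem 2048
        openssl req -new -key admin-${host}-key.pem -out admin-${host}.csr -subj "/CN=etcd-admin-${host}"
        openssl x509 -req -in admin-${host}.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out admin-${host}.pem -days 3650 -extensions ssl_client -extfile openssl.conf
        # member certificate for ${host}
        openssl genrsa -out member-${host}-key.pem 2048
        openssl req -new -key member-${host}-key.pem -out member-${host}.csr -subj "/CN=etcd-member-${host}" -config openssl.conf
        openssl x509 -req -in member-${host}.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out member-${host}.pem -days 3650 -extensions ssl_client -extfile openssl.conf
    done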
    
  • Distribute the generated certificates to each Etcd node

    The certificates under /etc/ssl/etcd/ssl/ need to be distributed to each Etcd node.
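
    For example (a sketch, assuming the other Etcd nodes are reachable over SSH as node2 and node3):

    for h in node2 node3; do
        scp /etc/ssl/etcd/ssl/* ${h}:/etc/ssl/etcd/ssl/
    done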

  • View etcd configuration in an Etcd node

    Here etcd is started as a binary, so the location of the etcd configuration file can be found in the systemd unit.

    cat /etc/systemd/system/etcd.service
    
    ...
    EnvironmentFile=/etc/etcd.env
    
  • Replace the IPs on each Etcd node

    Since there are multiple Etcd nodes, several pairs of old and new IPs need to be replaced; here is an example with three nodes.

    export oldip1=x.x.x.1 
    export newip1=x.x.10.1 
    
    export oldip2=x.x.x.2
    export newip2=x.x.10.2 
    
    export oldip3=x.x.x.3 
    export newip3=x.x.10.3 
    
    sed -i "s/$oldip1/$newip1/" /etc/etcd.env
    sed -i "s/$oldip2/$newip2/" /etc/etcd.env
    sed -i "s/$oldip3/$newip3/" /etc/etcd.env
    

    /etc/hosts also needs its IPs replaced, as the hostname is sometimes used in the configuration file.

    sed -i "s/$oldip1/$newip1/" /etc/hosts
    sed -i "s/$oldip2/$newip2/" /etc/hosts
    sed -i "s/$oldip3/$newip3/" /etc/hosts
    

    If you have a regular backup task, you will also need to replace the relevant IP.

    sed -i "s/$oldip1/$newip1/" /usr/local/bin/kube-scripts/etcd-backup.sh
    sed -i "s/$oldip2/$newip2/" /usr/local/bin/kube-scripts/etcd-backup.sh
    sed -i "s/$oldip3/$newip3/" /usr/local/bin/kube-scripts/etcd-backup.sh
    
  • Each Etcd node restores Etcd data from a backup

    This step can be skipped if Etcd is a single node. After the node IPs change, the Etcd cluster is no longer operational: member information is stored in the on-disk data, so modifying the configuration file alone does not help, and a multi-node Etcd has to be recovered from backup data.

    Distribute the Etcd backup file snapshot.db to each Etcd node.

    Execute the following command on each node:

    rm -rf /var/lib/etcd
    
    etcdctl snapshot restore snapshot.db --name etcd-node1 \
            --initial-cluster "etcd-node1=https://x.x.10.1:2380,etcd-node2=https://x.x.10.2:2380,etcd-node3=https://x.x.10.3:2380" \
            --initial-cluster-token k8s_etcd \
            --initial-advertise-peer-urls https://x.x.10.1:2380 \
            --data-dir=/var/lib/etcd
    

    Note that the --name value (etcd-node1 here) and the --initial-advertise-peer-urls parameter vary on each node.
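
    For example, on the second node the restore would look like this (same node names and new IPs as above):

    etcdctl snapshot restore snapshot.db --name etcd-node2 \
            --initial-cluster "etcd-node1=https://x.x.10.1:2380,etcd-node2=https://x.x.10.2:2380,etcd-node3=https://x.x.10.3:2380" \
            --initial-cluster-token k8s_etcd \
            --initial-advertise-peer-urls https://x.x.10.2:2380 \
            --data-dir=/var/lib/etcd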

  • Restart etcd for each Etcd node

    systemctl restart etcd
    
  • View etcd status per Etcd node

    systemctl status etcd
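
    Beyond the systemd status, cluster health can also be checked with etcdctl (a sketch, assuming the certificate paths below; adjust the endpoints and file names to your nodes, and note that older etcdctl releases may need ETCDCTL_API=3):

    etcdctl --endpoints=https://x.x.10.1:2379,https://x.x.10.2:2379,https://x.x.10.3:2379 \
            --cacert=/etc/ssl/etcd/ssl/ca.pem \
            --cert=/etc/ssl/etcd/ssl/admin-node1.pem \
            --key=/etc/ssl/etcd/ssl/admin-node1-key.pem \
            endpoint health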
    

2. Update K8s certificate

  • Back up the certificates

    cp -R /etc/kubernetes/ /etc/kubernetes-bak

  • Replace the IP addresses in the associated files on each Kubernetes node

    # master
    export oldip1=x.x.x.1 
    export newip1=x.x.10.1 
    
    export oldip2=x.x.x.2
    export newip2=x.x.10.2 
    
    export oldip3=x.x.x.3 
    export newip3=x.x.10.3 
    
    # node
    export oldip4=x.x.x.4
    export newip4=x.x.10.4
    
    find /etc/kubernetes -type f | xargs sed -i "s/$oldip1/$newip1/"
    find /etc/kubernetes -type f | xargs sed -i "s/$oldip2/$newip2/"
    find /etc/kubernetes -type f | xargs sed -i "s/$oldip3/$newip3/"
    find /etc/kubernetes -type f | xargs sed -i "s/$oldip4/$newip4/"
    
    sed -i "s/$oldip1/$newip1/" /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
    sed -i "s/$oldip2/$newip2/" /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
    sed -i "s/$oldip3/$newip3/" /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
    sed -i "s/$oldip4/$newip4/" /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
    
    sed -i "s/$oldip1/$newip1/" /etc/kubernetes/kubeadm-config.yaml
    sed -i "s/$oldip2/$newip2/" /etc/kubernetes/kubeadm-config.yaml
    sed -i "s/$oldip3/$newip3/" /etc/kubernetes/kubeadm-config.yaml
    sed -i "s/$oldip4/$newip4/" /etc/kubernetes/kubeadm-config.yaml
    
    sed -i "s/$oldip1/$newip1/" /etc/hosts
    sed -i "s/$oldip2/$newip2/" /etc/hosts
    sed -i "s/$oldip3/$newip3/" /etc/hosts
    sed -i "s/$oldip4/$newip4/" /etc/hosts
    
  • Generate new certificates on a master node

    rm -f /etc/kubernetes/pki/apiserver*
    
    kubeadm init phase certs all --config /etc/kubernetes/kubeadm-config.yaml
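
    The regenerated apiserver certificate can then be inspected to confirm the new IPs are present:

    openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A1 'Subject Alternative Name'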
    
  • Distribute the generated certificates to each Kubernetes node

    The worker node does not need the key, only the crt.
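
    For example (a sketch, assuming the certificates were regenerated on node1, the other masters are node2 and node3, and node4 is the worker):

    # other master nodes get the full pki directory
    for h in node2 node3; do
        scp -r /etc/kubernetes/pki ${h}:/etc/kubernetes/
    done
    # the worker node only needs the certificate (e.g. ca.crt), not the private keys
    scp /etc/kubernetes/pki/ca.crt node4:/etc/kubernetes/pki/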
    

3. Update the conf files of the cluster components

  • Generate a new configuration file in a master node

    cd /etc/kubernetes
    rm -f admin.conf kubelet.conf controller-manager.conf scheduler.conf
    
    kubeadm init phase kubeconfig all --config /etc/kubernetes/kubeadm-config.yaml
    
  • Distribute the new configuration files to each Kubernetes node

    Each node needs /etc/kubernetes/kubelet.conf and each master node needs /etc/kubernetes/controller-manager.conf and /etc/kubernetes/scheduler.conf.
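
    For example (a sketch, again assuming the files were generated on node1, node2/node3 are the other masters and node4 is the worker):

    for h in node2 node3; do
        scp /etc/kubernetes/{kubelet.conf,controller-manager.conf,scheduler.conf} ${h}:/etc/kubernetes/
    done
    scp /etc/kubernetes/kubelet.conf node4:/etc/kubernetes/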

  • Configure user access credentials on the node that needs to use kubectl

    mkdir -p $HOME/.kube
    cp /etc/kubernetes/admin.conf $HOME/.kube/config
    
  • Restart the kubelet for each Kubernetes node

    systemctl daemon-reload
    systemctl restart kubelet
    
  • View kubelet status per Kubernetes node

    systemctl status kubelet
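
    Once the kubelets are back up, the cluster can be verified from any node with kubectl configured:

    kubectl get nodes -o wide
    kubectl -n kube-system get pods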
    

4. Fix ConfigMap

  • Replace IP

    kubectl -n kube-system edit cm kube-proxy
    

    The kube-proxy configuration affects node communication, so the old apiserver IP in it needs to be replaced. If you use an LB or a domain name as the apiserver entry point, this step can be skipped. As for kubeadm-config, it was already replaced automatically in the steps above, so no additional processing is needed.
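
    After editing the ConfigMap, the kube-proxy pods need to pick up the new configuration; one way is to restart them (a sketch, assuming kubeadm's default kube-proxy DaemonSet):

    kubectl -n kube-system rollout restart daemonset kube-proxy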

5. Summary

It is strongly recommended not to change the IP addresses of cluster hosts. If a host IP change is planned in advance, it is better to rebuild the cluster via backup and restore.

If it is an unintended host IP change, it is recommended to fix it in the above order:

  1. Etcd
  2. K8s certificate
  3. Core components on the K8s master and worker nodes
  4. Cluster ConfigMap configuration

The above documents the repair process. In practice, things get messy when repairing multiple master nodes: containers kept restarting and reporting port conflicts, and at one point I also rebooted a machine. The documentation of the process may not be perfect, but if you follow the sequence and repair one component at a time, it should not be a big problem.