1. Introduction

When building a Kubernetes cluster, the first thing to get right is the Etcd cluster. Although kubeadm can build an Etcd cluster automatically, which by default is bound to the master nodes, I personally always build the Etcd cluster manually on the hosts. It is simply that important: it is no exaggeration to say that we can recover from a crash of any other Kubernetes component given some time, but once the Etcd cluster is gone, the Kubernetes cluster is really gone too.

A long time ago I created the edep tool to assist with deploying Etcd clusters. Later, because our underlying systems were tied to Ubuntu, I created the etcd-deb project to build deb packages automatically so that Etcd could be installed directly. Recently, while browsing Kubernetes-related projects, I found etcdadm, a project similar to my edep, and after trying it out for a while I was won over.

2. Installation

The etcdadm project is written in Go, so you only need to download the binary to use it:

wget https://github.com/kubernetes-sigs/etcdadm/releases/download/v0.1.3/etcdadm-linux-amd64
chmod +x etcdadm-linux-amd64
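
If you would rather call the tool as plain etcdadm from anywhere, the binary can be copied onto the PATH. The following is only a minimal sketch: the helper name install_etcdadm and the default destination /usr/local/bin are assumptions, not something the project prescribes.

```shell
# Sketch: copy the downloaded binary onto the PATH under the name "etcdadm".
# install_etcdadm and the default destination are illustrative assumptions.
install_etcdadm() {
  src="$1"                        # e.g. ./etcdadm-linux-amd64
  dest_dir="${2:-/usr/local/bin}"
  # install(1) copies the file and sets its mode in one step.
  install -m 0755 "$src" "$dest_dir/etcdadm"
}

# Usage (needs write access to the destination directory):
#   install_etcdadm ./etcdadm-linux-amd64
```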

3. Usage

3.1. Start the first node

Like kubeadm, etcdadm starts the first node and then joins the subsequent nodes to it; the first node is started by simply running the etcdadm init command:

k1.node ➜  ~ ./etcdadm-linux-amd64 init
INFO[0000] [install] extracting etcd archive /var/cache/etcdadm/etcd/v3.3.8/etcd-v3.3.8-linux-amd64.tar.gz to /tmp/etcd664686683
INFO[0001] [install] verifying etcd 3.3.8 is installed in /opt/bin/
INFO[0001] [certificates] creating PKI assets
INFO[0001] creating a self signed etcd CA certificate and key files
[certificates] Generated ca certificate and key.
INFO[0001] creating a new server certificate and key files for etcd
[certificates] Generated server certificate and key.
[certificates] server serving cert is signed for DNS names [k1.node] and IPs [127.0.0.1 172.16.10.21]
INFO[0002] creating a new certificate and key files for etcd peering
[certificates] Generated peer certificate and key.
[certificates] peer serving cert is signed for DNS names [k1.node] and IPs [172.16.10.21]
INFO[0002] creating a new client certificate for the etcdctl
[certificates] Generated etcdctl-etcd-client certificate and key.
INFO[0002] creating a new client certificate for the apiserver calling etcd
[certificates] Generated apiserver-etcd-client certificate and key.
[certificates] valid certificates and keys now exist in "/etc/etcd/pki"
INFO[0006] [health] Checking local etcd endpoint health
INFO[0006] [health] Local etcd endpoint is healthy
INFO[0006] To add another member to the cluster, copy the CA cert/key to its certificate dir and run:
INFO[0006]      etcdadm join https://172.16.10.21:2379

The output shows the log lines for the different phases that etcdadm goes through. You can pass parameters to the init command to override the default behavior, such as the etcd version or the installation directory:

k1.node ➜  ~ ./etcdadm-linux-amd64 init --help
Initialize a new etcd cluster

Usage:
  etcdadm init [flags]

Flags:
      --certs-dir string                    certificates directory (default "/etc/etcd/pki")
      --disk-priorities stringArray         Setting etcd disk priority (default [Nice=-10,IOSchedulingClass=best-effort,IOSchedulingPriority=2])
      --download-connect-timeout duration   Maximum time in seconds that you allow the connection to the server to take. (default 10s)
  -h, --help                                help for init
      --install-dir string                  install directory (default "/opt/bin/")
      --name string                         etcd member name
      --release-url string                  URL used to download etcd (default "https://github.com/coreos/etcd/releases/download")
      --server-cert-extra-sans strings      optional extra Subject Alternative Names for the etcd server signing cert, can be multiple comma separated DNS names or IPs
      --skip-hash-check                     Ignore snapshot integrity hash value (required if copied from data directory)
      --snapshot string                     Etcd v3 snapshot file used to initialize member
      --version string                      etcd version (default "3.3.8")

Global Flags:
  -l, --log-level string   set log level for output, permitted values debug, info, warn, error, fatal and panic (default "info")
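
As an illustration of overriding the defaults, the sketch below assembles an init command line from a few of the flags listed in the help output. It only prints the command so it can be reviewed first. The variable ETCDADM_BIN, the helper init_command, and the sample values in the usage line are assumptions for the example.

```shell
# Sketch: print an `etcdadm init` command line with a few defaults overridden.
# The flags come from the help output above; ETCDADM_BIN, init_command and
# the sample values in the usage line are assumptions.
ETCDADM_BIN="${ETCDADM_BIN:-./etcdadm-linux-amd64}"

init_command() {
  version="$1"; install_dir="$2"; name="$3"; extra_sans="$4"
  printf '%s init --version %s --install-dir %s --name %s --server-cert-extra-sans %s\n' \
    "$ETCDADM_BIN" "$version" "$install_dir" "$name" "$extra_sans"
}

# Usage: review the printed command, then run it on the first node.
#   init_command 3.3.8 /opt/bin/ k1.node etcd.example.internal,172.16.10.100
```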

3.2. Join other nodes

After the first node has started, copy the cluster CA certificate and key to the other nodes and run etcdadm join ENDPOINT_ADDRESS on each of them:

# Copy ca certificate
k1.node ➜  ~ rsync -avR /etc/etcd/pki/ca.* 172.16.10.22:/
root@172.16.10.22's password:
sending incremental file list
/etc/etcd/
/etc/etcd/pki/
/etc/etcd/pki/ca.crt
/etc/etcd/pki/ca.key

sent 2,932 bytes  received 67 bytes  856.86 bytes/sec
total size is 2,684  speedup is 0.89

# Run join
k2.node ➜  ~ ./etcdadm-linux-amd64 join https://172.16.10.21:2379
INFO[0000] [certificates] creating PKI assets
INFO[0000] creating a self signed etcd CA certificate and key files
[certificates] Using the existing ca certificate and key.
INFO[0000] creating a new server certificate and key files for etcd
[certificates] Generated server certificate and key.
[certificates] server serving cert is signed for DNS names [k2.node] and IPs [172.16.10.22 127.0.0.1]
INFO[0000] creating a new certificate and key files for etcd peering
[certificates] Generated peer certificate and key.
[certificates] peer serving cert is signed for DNS names [k2.node] and IPs [172.16.10.22]
INFO[0000] creating a new client certificate for the etcdctl
[certificates] Generated etcdctl-etcd-client certificate and key.
INFO[0001] creating a new client certificate for the apiserver calling etcd
[certificates] Generated apiserver-etcd-client certificate and key.
[certificates] valid certificates and keys now exist in "/etc/etcd/pki"
INFO[0001] [membership] Checking if this member was added
INFO[0001] [membership] Member was not added
INFO[0001] Removing existing data dir "/var/lib/etcd"
INFO[0001] [membership] Adding member
INFO[0001] [membership] Checking if member was started
INFO[0001] [membership] Member was not started
INFO[0001] [membership] Removing existing data dir "/var/lib/etcd"
INFO[0001] [install] extracting etcd archive /var/cache/etcdadm/etcd/v3.3.8/etcd-v3.3.8-linux-amd64.tar.gz to /tmp/etcd315786364
INFO[0003] [install] verifying etcd 3.3.8 is installed in /opt/bin/
INFO[0006] [health] Checking local etcd endpoint health
INFO[0006] [health] Local etcd endpoint is healthy
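
With more than one node to add, the copy and join steps above repeat for each node. The sketch below prints that sequence rather than running it. FIRST_NODE matches the walkthrough, while join_url, join_plan, the use of ssh, and the assumption that the etcdadm binary already sits in the home directory on each node are mine.

```shell
# Sketch: print the commands needed to join each remaining node.
# FIRST_NODE matches the walkthrough above; join_url, join_plan and the use
# of ssh are assumptions. Port 2379 is taken from the init output.
FIRST_NODE="${FIRST_NODE:-172.16.10.21}"

join_url() {
  printf 'https://%s:2379\n' "$1"
}

join_plan() {
  for node in "$@"; do
    # Step 1: run on the first node to copy the CA certificate and key.
    printf 'rsync -avR /etc/etcd/pki/ca.* %s:/\n' "$node"
    # Step 2: run the join on the new node (binary assumed to be present there).
    printf 'ssh %s ./etcdadm-linux-amd64 join %s\n' "$node" "$(join_url "$FIRST_NODE")"
  done
}

# Usage (the second address is a hypothetical third node):
#   join_plan 172.16.10.22 172.16.10.23
```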

4. Detail Analysis

4.1. Default Configuration

etcdadm does not yet support configuration files. All default configuration currently lives in constants.go, which contains the default installation location, systemd configuration, environment variable configuration, and so on. For reasons of space I will not go through it here, so please check the code yourself; the following briefly introduces the settings you need to know:

4.1.1. etcdctl

etcdctl is installed in the /opt/bin directory by default. You will also find an etcdctl.sh script in that directory, which automatically reads the etcdctl configuration file (/etc/etcd/etcdctl.env), so it is recommended to use this script instead of calling etcdctl directly.
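
For example, a quick cluster check through the wrapper might look like the sketch below. The helper name cluster_status is hypothetical, and the sketch assumes that the env file selects the etcdctl v3 API, since endpoint health is a v3 subcommand.

```shell
# Sketch: query cluster state through the wrapper script instead of etcdctl.
# ETCDCTL_SH defaults to the path mentioned above; cluster_status is a
# hypothetical helper name.
ETCDCTL_SH="${ETCDCTL_SH:-/opt/bin/etcdctl.sh}"

cluster_status() {
  if [ ! -x "$ETCDCTL_SH" ]; then
    echo "wrapper not found or not executable: $ETCDCTL_SH" >&2
    return 1
  fi
  # Assumes /etc/etcd/etcdctl.env selects the v3 API; the wrapper is expected
  # to pick up its connection settings from that file.
  "$ETCDCTL_SH" member list && "$ETCDCTL_SH" endpoint health
}

# Usage (on a cluster node):
#   cluster_status
```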

4.1.2. Data Directory

The default data directory is /var/lib/etcd. etcdadm does not yet provide a way to configure it, but you can of course change the source code yourself.

4.1.3. Configuration file

There are two configuration files: /etc/etcd/etcdctl.env, which is read by /opt/bin/etcdctl.sh, and /etc/etcd/etcd.env, which systemd reads to start the etcd server.
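
To make the env-file idea concrete, here is a minimal sketch of the general pattern: export every KEY=VALUE line from a file, then run a command with those settings. This is not the script that etcdadm generates. The helper name with_env_file is hypothetical, and the sketch assumes the file contains only simple KEY=VALUE lines that a shell can source.

```shell
# Sketch of the env-file pattern: export every KEY=VALUE line, then run a command.
# This is NOT the script etcdadm generates; with_env_file is a hypothetical name.
with_env_file() {
  env_file="$1"; shift
  (
    set -a            # auto-export every variable assigned while sourcing
    . "$env_file"
    set +a
    "$@"              # run the command with the exported settings
  )
}

# Usage: run plain etcdctl with the settings from the env file.
#   with_env_file /etc/etcd/etcdctl.env /opt/bin/etcdctl member list
```

Running the command in a subshell keeps the exported variables out of the calling shell.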

4.2. Join Process

In fact, because of the way I used to deploy, I held a misconception for a long time: I always thought the etcd server certificate had to contain the addresses of all servers. I do not know where this idea came from, but when I read the source code of the Join operation, it suddenly hit me: “Why should it contain all of them? Why not just include the current server?” I have always understood how HTTPS certificates work, so it is strange that I ever came up with this idea (haha, I find it unbelievable myself)… The join itself proceeds in three steps:

  • Since the CA certificate is copied in advance, etcdadm issues all the certificates it needs with this CA before the join starts.
  • Next, etcdadm creates a client with the etcdctl-etcd-client certificate and calls MemberAdd to add the new member to the cluster.
  • Finally, the usual download, install, and start steps are performed.
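
The MemberAdd call in the second step is the same cluster API that etcdctl v3 exposes as member add, so the membership part of a join can be approximated by hand. The sketch below only prints such a command. The helper member_add_command is hypothetical, and port 2380 is etcd's default peer port, assumed here because the walkthrough does not show the peer URL.

```shell
# Sketch: print the manual equivalent of the MemberAdd step.
# member_add_command is a hypothetical helper; 2380 is etcd's default peer
# port and is assumed here, since the walkthrough does not show it.
member_add_command() {
  name="$1"; ip="$2"
  printf '/opt/bin/etcdctl.sh member add %s --peer-urls=https://%s:2380\n' "$name" "$ip"
}

# Usage: run the printed command on an existing member before starting etcd
# on the new node.
#   member_add_command k2.node 172.16.10.22
```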

4.3. Current shortcomings

etcdadm is by now largely usable in production, but there are still some shortcomings:

  • No configuration file support, so many things cannot be customized.
  • Joining the cluster is done through the internal API and is not persisted to a configuration file on disk, so the node IPs may be forgotten when the cluster is later rebuilt.
  • Cluster certificates do not support automatic renewal; the default validity is 1 year, so they can easily expire.
  • The download step shells out to a system command (curl), which adds an external dependency.
  • The log format is a bit unfriendly, for example in how the level and date are shown.
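
Until auto-renewal exists, the certificate point can be mitigated with a periodic expiry check. The sketch below uses openssl x509 -checkend. The helper name expiring_certs and the 30-day threshold in the usage line are assumptions, as is the expectation that every certificate in /etc/etcd/pki ends in .crt the way ca.crt does.

```shell
# Sketch: list certificates that expire within a given number of days.
# expiring_certs is a hypothetical helper; CERT_DIR defaults to the
# certificate directory used in this article.
CERT_DIR="${CERT_DIR:-/etc/etcd/pki}"

expiring_certs() {
  days="$1"
  for crt in "$CERT_DIR"/*.crt; do
    [ -e "$crt" ] || continue   # the glob matched nothing
    # -checkend exits non-zero if the certificate expires within N seconds;
    # files that openssl cannot parse are listed as well.
    if ! openssl x509 -checkend "$((days * 86400))" -noout -in "$crt" >/dev/null; then
      echo "$crt"
    fi
  done
}

# Usage: run from cron and alert if anything is printed.
#   expiring_certs 30
```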

Reference https://mritd.com/2020/08/19/use-etcdadm-to-build-etcd-cluster-in-3-minutes/