Cluster Mesh is Cilium's multi-cluster implementation: it allows Cilium to connect and manage multiple Kubernetes clusters across data centers and VPCs. Cluster Mesh provides the following main features.

  1. Pod IP routing between multiple Kubernetes clusters through tunneling or direct routing, without any gateways or proxies.
  2. Service discovery across clusters using the standard Kubernetes service discovery mechanism.
  3. Network policies that span multiple clusters. Policies can be expressed as native Kubernetes NetworkPolicy resources or the extended CiliumNetworkPolicy CRD.
  4. Transparent encryption of all traffic between nodes, both within a cluster and across clusters.

Kubernetes Multi-Cluster

Let’s take a look at some specific usage scenarios for Cilium Cluster Mesh.

1 Usage Scenarios

1.1 High Availability

For most people, high availability is the most common scenario. Multiple Kubernetes clusters can be run across regions or availability zones, with replicas of the same services running in each cluster. When one cluster fails, requests can fail over to the other clusters.

High Availability

1.2 Shared Services

Certain public infrastructure services can be shared across clusters (e.g. key management, logging, monitoring or DNS services, etc.) to avoid additional resource overhead.

Shared Services

1.3 Splitting Stateful and Stateless Services

The operational complexity of running stateful and stateless services is very different. Stateless services are easy to scale, migrate and upgrade; running a cluster with only stateless services keeps the cluster flexible and agile. Stateful services (e.g. MySQL, Elasticsearch, etcd, etc.) introduce potentially complex dependencies, and migrating a stateful service usually also means migrating its storage. Running separate clusters for stateless and stateful services confines that dependency complexity to a smaller number of clusters.

Splitting Stateful and Stateless Services

2 Architecture

The architecture of Cilium Cluster Mesh is as follows.

  • Each Kubernetes cluster maintains its own etcd cluster that keeps the state of that cluster. State from multiple clusters is never mixed into another cluster's etcd.
  • Each cluster exposes its own etcd through a set of etcd proxies, and the Cilium agents running in other clusters connect to these proxies to watch for changes.
  • Cilium uses the clustermesh-apiserver Pod to establish the multi-cluster interconnection. The clustermesh-apiserver Pod runs two containers: the apiserver container writes multi-cluster information into the etcd container, and the etcd container (acting as the etcd proxy mentioned above) stores the Cluster Mesh related state (a quick way to list these containers is sketched after the diagram below).
  • Access from one cluster to another is always read-only. This ensures the failure domain remains unchanged, i.e. a failure in one cluster never propagates to other clusters.

architecture of Cilium Cluster Mesh
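Once Cluster Mesh is enabled in section 7, you can list the two containers described above straight from the Pod spec. A minimal check, assuming the k8s-app=clustermesh-apiserver label applied by the Cilium Helm chart:

# List the container names of the clustermesh-apiserver Pod (expect apiserver and etcd)
kubectl --context kind-c1 -n kube-system get pod -l k8s-app=clustermesh-apiserver \
  -o jsonpath='{.items[0].spec.containers[*].name}'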

3 Prerequisites

3.1 Kind

Kind (Kubernetes in Docker) is a tool for running local Kubernetes clusters using Docker containers. To make the experiment easy to reproduce, this article uses Kind to build the multi-cluster Kubernetes environment.
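If Kind is not installed yet, one option is to install it with Go (a sketch; the version below is illustrative, and prebuilt binaries or package managers from the Kind documentation work just as well):

# Install the kind CLI into $(go env GOPATH)/bin
go install sigs.k8s.io/kind@v0.14.0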

3.2 Environment Requirements

  • All Kubernetes worker nodes must be assigned unique IP addresses, and IP routing between nodes must be reachable.
  • Each cluster must be assigned a unique Pod CIDR.
  • Cilium must use etcd as its key-value store.
  • The networks of the clusters must be able to reach each other; see Firewall Rules for the specific ports used for communication.

The configuration files used in this experiment are available at: cluster_mesh.

4 Preparing the Kubernetes Environment

Prepare two Kind configuration files for building the Kubernetes clusters.

c1 Cluster configuration file.

# kind-config1.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
networking:
  disableDefaultCNI: true # disable the default CNI
  podSubnet: "10.10.0.0/16"
  serviceSubnet: "10.11.0.0/16"

c2 cluster configuration file.

# kind-config2.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
networking:
  disableDefaultCNI: true # disable the default CNI
  podSubnet: "10.20.0.0/16"
  serviceSubnet: "10.21.0.0/16"

Use the kind create cluster command to create two Kubernetes clusters.

kind create cluster --config kind-config1.yaml --name c1
kind create cluster --config kind-config2.yaml --name c2

kind create cluster

View the nodes of the two Kubernetes clusters.

kubectl get node --context kind-c1 -o wide
kubectl get node --context kind-c2 -o wide

kubectl get node
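One of the prerequisites from section 3.2 is that each cluster uses a unique Pod CIDR. A quick way to confirm it from the node specs (the jsonpath expression is just one option):

# Print the Pod CIDR assigned to each node; the two clusters should not overlap
kubectl --context kind-c1 get nodes -o jsonpath='{range .items[*]}{.spec.podCIDR}{"\n"}{end}'
kubectl --context kind-c2 get nodes -o jsonpath='{range .items[*]}{.spec.podCIDR}{"\n"}{end}'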

5 Installing Cilium

Add Helm Repo.

helm repo add cilium https://helm.cilium.io/

Install Cilium on the c1 cluster, using the --kube-context parameter to target the right cluster context. Each cluster must be assigned a unique name and cluster ID: cluster.id sets the cluster ID (an integer in the range 1-255) and cluster.name sets the cluster name.

helm install --kube-context kind-c1 cilium cilium/cilium --version 1.11.4 \
  --namespace kube-system \
  --set ipam.mode=kubernetes \
  --set cluster.id=1 \
  --set cluster.name=cluster1

Install Cilium on the c2 cluster.

helm install --kube-context kind-c2 cilium cilium/cilium --version 1.11.4 \
  --namespace kube-system \
  --set ipam.mode=kubernetes \
  --set cluster.id=2 \
  --set cluster.name=cluster2

View Cilium Pod status.

kubectl --context kind-c1 get pod -A
kubectl --context kind-c2 get pod -A

kubectl --context

View Cilium status.

cilium status --context kind-c1
cilium status --context kind-c2

cilium status
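The cluster name and ID set through Helm end up in the cilium-config ConfigMap; a quick way to double-check them (the key names assume Cilium 1.11 defaults):

# Print the cluster name and id that the Cilium agents are using
kubectl --context kind-c1 -n kube-system get configmap cilium-config \
  -o jsonpath="{.data['cluster-name']} {.data['cluster-id']}{'\n'}"
kubectl --context kind-c2 -n kube-system get configmap cilium-config \
  -o jsonpath="{.data['cluster-name']} {.data['cluster-id']}{'\n'}"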

6 Installing MetalLB (optional)

In the 7 Enabling Cluster Mesh section, a LoadBalancer-type Service is used to publish the clustermesh-apiserver service. In Kubernetes clusters provided by public clouds, LoadBalancer Services are usually backed by the cloud provider's load balancers (e.g. AWS ELB, Alibaba Cloud SLB); in a private environment, you can use MetalLB instead.

Prepare the MetalLB configuration files for both clusters. c1 cluster configuration file. Note that the assigned address range must be in the same subnet as the node IPs.

# metallb-config1.yaml
configInline:
  peers:
  address-pools:
  - name: default
    protocol: layer2
    addresses:
    - 172.22.0.50-172.22.0.100

c2 cluster configuration file.

# metallb-config2.yaml
configInline:
  peers:
  address-pools:
  - name: default
    protocol: layer2
    addresses:
    - 172.22.0.101-172.22.0.150
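The address pools above must fall inside the subnet used by the Kind nodes. Kind attaches its nodes to a Docker network named kind by default, so one way to check the subnet (adjust the pools if it differs from the 172.22.0.0/16 range assumed here):

# Show the subnet(s) of the Docker network that the Kind nodes are attached to
docker network inspect kind -f '{{range .IPAM.Config}}{{.Subnet}} {{end}}'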

Use the following commands to deploy MetalLB in the c1 and c2 clusters.

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install --kube-context kind-c1 metallb bitnami/metallb \
  --namespace kube-system \
  -f metallb-config1.yaml
  
helm install --kube-context kind-c2 metallb bitnami/metallb \
  --namespace kube-system \
  -f metallb-config2.yaml

View the MetalLB Pod status.

Metallb Pod status

7 Enabling Cluster Mesh

Use the cilium clustermesh enable command to enable Cluster Mesh on the c1 cluster.

  • The --create-ca parameter automatically creates a CA certificate, which needs to be shared between clusters so that mTLS works across clusters.
  • The --service-type parameter specifies how the clustermesh-apiserver service is published, and supports the following three options.
    • LoadBalancer (recommended): publish the service with a LoadBalancer-type Service, which provides a stable LoadBalancer IP and is usually the best choice.
    • NodePort : publish the service with a NodePort-type Service; if a node disappears, Cluster Mesh has to reconnect to another node, which may cause a brief network outage.
    • ClusterIP : publish the service with a ClusterIP-type Service, which requires the ClusterIP to be routable between clusters.
cilium clustermesh enable --create-ca --context kind-c1 --service-type LoadBalancer

Executing the command deploys the clustermesh-apiserver service in the cluster and generates the necessary certificates.

cilium clustermesh enable

The created CA is stored in the cilium-ca Secret under the kube-system Namespace.

$ kubectl --context kind-c1 get secret -n kube-system cilium-ca -o yaml
apiVersion: v1
data:
  ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUNFekNDQWJxZ0F3SUJBZ0lVZmVPNHlYbVZSSU1ZZVppSjZyODJ6L05FejBVd0NnWUlLb1pJemowRUF3SXcKYURFTE1Ba0dBMVVFQmhNQ1ZWTXhGakFVQmdOVkJBZ1REVk5oYmlCR2NtRnVZMmx6WTI4eEN6QUpCZ05WQkFjVApBa05CTVE4d0RRWURWUVFLRXdaRGFXeHBkVzB4RHpBTkJnTlZCQXNUQmtOcGJHbDFiVEVTTUJBR0ExVUVBeE1KClEybHNhWFZ0SUVOQk1CNFhEVEl5TURVd09UQXpNemt3TUZvWERUSTNNRFV3T0RBek16a3dNRm93YURFTE1Ba0cKQTFVRUJoTUNWVk14RmpBVUJnTlZCQWdURFZOaGJpQkdjbUZ1WTJselkyOHhDekFKQmdOVkJBY1RBa05CTVE4dwpEUVlEVlFRS0V3WkRhV3hwZFcweER6QU5CZ05WQkFzVEJrTnBiR2wxYlRFU01CQUdBMVVFQXhNSlEybHNhWFZ0CklFTkJNRmt3RXdZSEtvWkl6ajBDQVFZSUtvWkl6ajBEQVFjRFFnQUVTQVNHRERDdnhsUmpYNTEwMEpCQnoxdXIKb29sMktUNVh6MUNYS1paVk5Pc1M5ZmVrOEJUOTRqTXpZcHpsZW5hZXdwczVDZGhWckkvSU9mK2RtaTR3UjZOQwpNRUF3RGdZRFZSMFBBUUgvQkFRREFnRUdNQThHQTFVZEV3RUIvd1FGTUFNQkFmOHdIUVlEVlIwT0JCWUVGTlVwCjBBRVROZ0JHd2ZEK0paRDFWV2w2elNvVk1Bb0dDQ3FHU000OUJBTUNBMGNBTUVRQ0lHZUszUklreUJzQnFxL0MKdzRFTU9nMjk1T244WDFyYVM5QVZMZmlzS2JJVEFpQW5Da3NQTm9BYmZVZ1lyMkVGaFZZaDU0bjlZMVlyU0NlZAprOEZ3Nnl2MWNBPT0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
  ca.key: LS0tLS1CRUdJTiBFQyBQUklWQVRFIEtFWS0tLS0tCk1IY0NBUUVFSU9uWG9WTmhIdEJ0TTFaMFFlTWE5UWlLV1QvdXVNMk9jUXNmU252bXEwL2RvQW9HQ0NxR1NNNDkKQXdFSG9VUURRZ0FFU0FTR0REQ3Z4bFJqWDUxMDBKQkJ6MXVyb29sMktUNVh6MUNYS1paVk5Pc1M5ZmVrOEJUOQo0ak16WXB6bGVuYWV3cHM1Q2RoVnJJL0lPZitkbWk0d1J3PT0KLS0tLS1FTkQgRUMgUFJJVkFURSBLRVktLS0tLQo=
kind: Secret
metadata:
  creationTimestamp: "2022-05-09T03:44:03Z"
  name: cilium-ca
  namespace: kube-system
  resourceVersion: "20625"
  uid: 7e4b2f21-815d-4191-974b-316c629e325c
type: Opaque
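To verify what was generated, you can decode the CA certificate and inspect it; a small sketch assuming openssl is available locally:

# Decode the CA certificate from the Secret and print its subject and validity period
kubectl --context kind-c1 -n kube-system get secret cilium-ca \
  -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -subject -dates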

Import the Cilium CA certificate of cluster c1 to cluster c2.

# Export the Cilium CA certificate from the c1 cluster
kubectl get secret --context kind-c1 -n kube-system cilium-ca -o yaml > cilium-ca.yaml
# Import the CA certificate into the c2 cluster
kubectl apply -f cilium-ca.yaml --context kind-c2

Enable Cluster Mesh on the c2 cluster.

cilium clustermesh enable --context kind-c2 --service-type LoadBalancer

Enable Cluster Mesh on the c2 cluster

View the clustermesh-apiserver Pod status for clusters c1 and c2.

Looking at the clustermesh-apiserver Service in clusters c1 and c2, you can see that the Service type is LoadBalancer and that the external IP address was assigned by MetalLB.

kubectl --context kind-c1 get svc -A 
kubectl --context kind-c2 get svc -A 

kubectl --context

View Cilium status.

cilium status --context kind-c1
cilium status --context kind-c2

View Cilium status

Check the Cluster Mesh status of the c1 and c2 clusters. Cluster Mesh is now enabled on both clusters, but they are not yet connected to each other.

cilium clustermesh status --context kind-c1
cilium clustermesh status --context kind-c2

cilium clustermesh status

8 Connecting the Clusters

Execute the cilium clustermesh connect command on the c1 cluster to connect to the c2 cluster. This only needs to be executed on one cluster.

cilium clustermesh connect --context kind-c1 --destination-context kind-c2

cilium clustermesh connect

Check the Cilium Cluster Mesh status again; at this point the c1 and c2 clusters have established a Cluster Mesh connection.

cilium clustermesh status --context kind-c1
cilium clustermesh status --context kind-c2

cilium clustermesh status
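Under the hood, cilium clustermesh connect writes the remote cluster's etcd endpoint and TLS material into a Secret that is mounted by the Cilium agents. To peek at it (the cilium-clustermesh Secret name and per-cluster keys assume Cilium 1.11 defaults):

# The Data section should contain one set of keys per connected remote cluster
kubectl --context kind-c1 -n kube-system describe secret cilium-clustermesh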

Now that we have successfully established the interconnection between clusters, let’s verify the load balancing and network policies in Cluster Mesh mode.

9 Load balancing

9.1 Global Load Balancing

Deploy two applications in each cluster: x-wing is the client and rebel-base is the server, and we want the rebel-base service to be load balanced globally. The rebel-base Service must have the same name and live in the same namespace in each cluster; adding the annotation io.cilium/global-service: "true" declares it as a global service, so Cilium automatically load balances requests across the Pods in both clusters.

apiVersion: v1
kind: Service
metadata:
  name: rebel-base
  annotations:
    io.cilium/global-service: "true" # enable global load balancing
spec:
  type: ClusterIP
  ports:
  - port: 80
  selector:
    name: rebel-base

Create application services in c1 and c2 clusters.

kubectl apply -f cluster1.yaml --context kind-c1
kubectl apply -f cluster2.yaml --context kind-c2

Check the Pods and Services.

kubectl --context kind-c1 get pod
kubectl --context kind-c1 get svc
kubectl --context kind-c2 get pod
kubectl --context kind-c2 get svc

Check out the service

Accessing the rebel-base service from either cluster, you can see that traffic is distributed to both clusters.

for i in {1..10}; do kubectl exec --context kind-c1 -ti deployment/x-wing -- curl rebel-base; done

kubectl
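To confirm at the datapath level that the global service now has backends from both clusters, you can also list the services known to a Cilium agent; a sketch (exec'ing against the DaemonSet picks an arbitrary agent Pod; if your kubectl version does not resolve ds/cilium, exec into any cilium-xxxxx Pod instead):

# The rebel-base ClusterIP should show backend Pod IPs from both the 10.10.0.0/16 and 10.20.0.0/16 ranges
kubectl --context kind-c1 -n kube-system exec ds/cilium -- cilium service list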

9.2 Disable global service sharing

By default, a global service is load balanced across the backends in all clusters. If you want to stop the backends in the current cluster from being shared with other clusters, you can set the io.cilium/shared-service: "false" annotation.

kubectl annotate service rebel-base \
io.cilium/shared-service="false" --overwrite --context kind-c1
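You can confirm that the annotation landed on the Service before testing (plain kubectl, nothing Cilium-specific):

# Print the annotations of the rebel-base Service in the c1 cluster
kubectl --context kind-c1 get service rebel-base -o jsonpath='{.metadata.annotations}{"\n"}'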

The c1 cluster can still reach the rebel-base backends in both clusters.

for i in {1..10}; do kubectl exec --context kind-c1 -ti deployment/x-wing -- curl rebel-base; done

rebel-base service

But the c2 cluster can now only reach the rebel-base backends in its own cluster.

for i in {1..10}; do kubectl exec --context kind-c2 -ti deployment/x-wing -- curl rebel-base; done

c2 cluster

Remove the io.cilium/shared-service annotation from the rebel-base service in the c1 cluster.

kubectl annotate service rebel-base io.cilium/shared-service- --context kind-c1

The c2 cluster can now reach the rebel-base backends in both clusters again.

for i in {1..10}; do kubectl exec --context kind-c2 -ti deployment/x-wing -- curl rebel-base; done

rebel-base service

10 Network Policy

Create a CiliumNetworkPolicy that only allows Pods with the x-wing label in cluster c1 to access Pods with the rebel-base label in cluster c2. The cluster names were set in the 5 Installing Cilium section with the cluster.name parameter and can also be found in the cilium-config ConfigMap. Besides the traffic between the application services, take care to also allow DNS traffic; otherwise the service cannot be accessed by its Service name.

# networkpolicy.yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: "allow-dns"
spec:
  endpointSelector: {}
  egress:
    - toEndpoints:
        - matchLabels:
            io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
          rules:
            dns:
              - matchPattern: "*"
---
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "allow-cross-cluster"
spec:
  description: "Allow x-wing in cluster1 to contact rebel-base in cluster2"
  endpointSelector:
    matchLabels:
      name: x-wing
      io.cilium.k8s.policy.cluster: cluster1
  egress:
  - toEndpoints:
    - matchLabels:
        name: rebel-base
        io.cilium.k8s.policy.cluster: cluster2

Network policies are not automatically distributed to all clusters; you need to apply the NetworkPolicy or CiliumNetworkPolicy in each cluster.

kubectl --context kind-c1 apply -f networkpolicy.yaml
kubectl --context kind-c2 apply -f networkpolicy.yaml
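Before testing, you can confirm that the policies exist in both clusters (cnp is the short name for the CiliumNetworkPolicy resource):

# Both clusters should list the allow-dns and allow-cross-cluster policies
kubectl --context kind-c1 get cnp
kubectl --context kind-c2 get cnp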

When you access the rebel-base service from cluster c1, you can see that only the requests that are distributed to cluster c2 receive a successful response.

kubectl exec --context kind-c1 -ti deployment/x-wing -- curl rebel-base

kubectl exec

11 Troubleshooting

The following exception was encountered while enabling Cluster Mesh.

cilium clustermesh status --context kind-c1

cilium clustermesh status

Checking the Pod information reveals that the pulled image does not exist.

kubectl --context kind-c1 describe pod -n kube-system  clustermesh-apiserver-754c5479dd-zsg8t

Checking the Pod information

Checking the Cilium image repository showed that the sha256 digest appended to the image reference did not match any published image.

image repository

Edit the image of the clustermesh-apiserver Deployment and remove the sha256 digest that follows the image tag.

kubectl edit --context kind-c1 deployment -n kube-system clustermesh-apiserver
kubectl edit --context kind-c2 deployment -n kube-system clustermesh-apiserver
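If you prefer not to edit the Deployment interactively, you can patch the image in place; a sketch assuming the apiserver container name described in section 2 and the v1.11.4 tag installed earlier (adjust the image reference to whatever your Deployment currently uses, minus the digest):

# Replace the image reference with a plain tag, dropping the sha256 digest
kubectl --context kind-c1 -n kube-system set image deployment/clustermesh-apiserver \
  apiserver=quay.io/cilium/clustermesh-apiserver:v1.11.4
kubectl --context kind-c2 -n kube-system set image deployment/clustermesh-apiserver \
  apiserver=quay.io/cilium/clustermesh-apiserver:v1.11.4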

Edit the image of clustermesh-apiserver Deployment