Service discovery is an important feature of K8s, and there are two ways to do it: either by injecting a Service's ClusterIP into Pods as environment variables, or by using DNS. CoreDNS has replaced kube-dns as the built-in DNS server since version 1.13. In this article, we will briefly analyze CoreDNS.

K8s DNS Policies

There are four types of DNS policies for Pods in Kubernetes.

  1. Default: the Pod inherits the DNS configuration of the node it runs on.
  2. ClusterFirst: the default in K8s; queries go to the cluster's CoreDNS first, and fall back to the upstream nameservers inherited from the node if no match is found.
  3. ClusterFirstWithHostNet: for Pods running with hostNetwork, the DNS configuration rules are the same as ClusterFirst.
  4. None: ignore the DNS configuration of the K8s environment and use only the Pod's dnsConfig settings (see the sketch below).
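
As a minimal sketch of the last policy (the Pod name, nameserver, and search domain here are hypothetical placeholders, not values from this cluster), a Pod with dnsPolicy: None supplies its entire DNS configuration through dnsConfig:

apiVersion: v1
kind: Pod
metadata:
  name: dns-example            # hypothetical Pod name
spec:
  dnsPolicy: "None"            # ignore the cluster DNS entirely
  dnsConfig:
    nameservers:
      - 1.2.3.4                # hypothetical upstream DNS server
    searches:
      - my.dns.search.suffix   # hypothetical search domain
    options:
      - name: ndots
        value: "2"
  containers:
  - name: app
    image: nginx

The three dnsConfig fields map one-to-one onto the nameserver, search and options lines of the resolv.conf file discussed below.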

Here’s a quick overview of how CoreDNS resolves domain names.

resolv.conf file analysis

When a Pod that uses the cluster's DNS is deployed, the kubelet writes the cluster's DNS resolution configuration into the Pod when it starts the pause container.

For example, if I create a deployment called my-nginx, the resolv.conf file in the pod is as follows.

[root@localhost ~]# kubectl exec -it my-nginx-b67c7f44-hsnpv cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

When Pods in the cluster access each other by Service name, the name is resolved according to the DNS configuration in the resolv.conf file. The following is a breakdown of the process.

The process of domain name resolution

The Pod's resolv.conf file has three main directives, nameserver, search and options, which can be filled in by K8s or customized via the pod.spec.dnsConfig field.

nameserver

The first line of the resolv.conf file, nameserver, specifies the IP of the DNS service, in this case the ClusterIP of CoreDNS.

[root@localhost ~]# kubectl -n kube-system get svc |grep dns
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   32d

This means that all domain names, whether internal or external to Kubernetes, are resolved through 10.96.0.10, the virtual IP of CoreDNS.
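
You can see this from inside any Pod; a quick illustrative check (the output below is a sketch: the kubernetes Service address is an assumption for this cluster, and busybox's nslookup formats output slightly differently across versions):

[root@localhost ~]# kubectl exec -it busybox-5bbb5d7ff7-dh68j sh
/ # nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local

Whatever the query, the answering server is always 10.96.0.10.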

search domain

The second line of the resolv.conf file specifies the DNS search domains. When resolving a name, the resolver appends each search domain in turn to the name being looked up and performs a DNS query for the result.

For example, if I want to access a service with the domain name your-nginx in the pod, the order of the DNS lookups will be as follows.

your-nginx.default.svc.cluster.local. -> your-nginx.svc.cluster.local. -> your-nginx.cluster.local.    

stopping at the first name that resolves successfully; if none of the search domains yield an answer, the bare name your-nginx. is finally tried as-is.

options

The third line of the resolv.conf file specifies additional options, the most common of which is ndots. ndots:n means that if a queried name contains fewer than n dots “.”, the search domains are appended and tried first, and the name itself is tried as an absolute name last; if the name contains n or more dots, it is tried as-is first. The default configuration in K8s is 5.

In other words, if I visit a.b.c.e.f.g, then the domain name lookup order is as follows.

a.b.c.e.f.g. -> a.b.c.e.f.g.default.svc.cluster.local. -> a.b.c.e.f.g.svc.cluster.local. -> a.b.c.e.f.g.cluster.local.

If I visit a.b.c.e, then the domain lookup order is as follows.

a.b.c.e.default.svc.cluster.local. -> a.b.c.e.svc.cluster.local. -> a.b.c.e.cluster.local. -> a.b.c.e.
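
A practical aside (the external name here is a hypothetical illustration): because ndots is 5, even an ordinary external name with few dots is first expanded through all three search domains, producing wasted queries before the real lookup succeeds. A name written with a trailing dot is treated as fully qualified, so the search list is skipped entirely:

/ # wget my-external-site.com     # expands through all three search domains first
/ # wget my-external-site.com.    # trailing dot: fully qualified, search list skipped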

Communication between pods

After understanding the domain resolution process, let’s take a look at the communication between pods.

Access via svc

As we all know, when Pods in K8s access each other via a Service, the Service name goes through DNS resolution to obtain an IP first. The full form of a K8s Service domain is "<service-name>.<namespace>.svc.cluster.local", but we usually use just the Service name as the domain to access a Pod in the same namespace, which is easy to understand from the resolution process above.

Let’s look at an example. There are two deployments: one called busybox in the namespace default, and one called your-nginx in the namespace hdls, each exposed by a Service of the same name. We try to access your-nginx from busybox.

[root@localhost ~]# kubectl get po
NAME                           READY   STATUS    RESTARTS   AGE
busybox-5bbb5d7ff7-dh68j       1/1     Running   0          8m35s
[root@localhost ~]#
[root@localhost ~]# kubectl exec -it busybox-5bbb5d7ff7-dh68j sh
/ # wget your-nginx
wget: bad address 'your-nginx'
/ #
/ # wget your-nginx.hdls
Connecting to your-nginx.hdls (10.100.3.148:80)
saving to 'index.html'
index.html           100% |*****************************************************|   612  0:00:00 ETA
'index.html' saved
/ #
[root@localhost ~]# kubectl -n hdls get svc
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
your-nginx   ClusterIP   10.100.3.148   <none>        80/TCP    14m

As you can see, accessing your-nginx directly fails with bad address: busybox lives in default, so after all of its search domains are tried, none of them matches a Service that lives in hdls. Accessing your-nginx.hdls works: through the search domain svc.cluster.local it expands to your-nginx.hdls.svc.cluster.local and resolves to 10.100.3.148, the ClusterIP of your-nginx.

So, when accessing a Service in a different namespace, you need to append .<namespace> to the Service name.
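
Equivalently, the fully qualified name works from any namespace; an illustrative check against the same Service (output sketched from the addresses above):

/ # wget your-nginx.hdls.svc.cluster.local
Connecting to your-nginx.hdls.svc.cluster.local (10.100.3.148:80)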

Hostname and subdomain of pod

In K8s, if you don’t specify a Pod’s hostname, it defaults to metadata.name and can be customized via the spec.hostname field; you can also set a Pod’s subdomain via the spec.subdomain field. For example:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    name: nginx
spec:
  hostname: domain-test
  subdomain: subdomain-test
  containers:
  - image: nginx
    name: nginx
---
apiVersion: v1
kind: Service
metadata:
  name: subdomain-test
spec:
  selector:
    name: nginx
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP

You can check the hostname and hosts file of this pod.

[root@localhost ~]# kubectl get po -owide
NAME                           READY   STATUS    RESTARTS   AGE     IP             NODE           NOMINATED NODE   READINESS GATES
busybox-5bbb5d7ff7-dh68j       1/1     Running   0          112m    10.244.1.246   172-16-105-2   <none>           <none>
nginx                          1/1     Running   0          2m      10.244.1.253   172-16-105-2   <none>           <none>
[root@localhost ~]# kubectl exec -it nginx bash
root@domain-test:/# cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
fe00::0	ip6-mcastprefix
fe00::1	ip6-allnodes
fe00::2	ip6-allrouters
10.244.1.253	domain-test.subdomain-test.default.svc.cluster.local	domain-test
root@domain-test:/#

Access this pod in the busybox container.

[root@localhost ~]# kubectl exec -it busybox-5bbb5d7ff7-dh68j sh
/ # wget domain-test.subdomain-test
Connecting to domain-test.subdomain-test (10.244.1.253:80)
saving to 'index.html'
index.html           100% |*****************************************************|   612  0:00:00 ETA
'index.html' saved
/ #
/ # wget subdomain-test
Connecting to subdomain-test (10.108.213.70:80)
wget: can't open 'index.html': File exists
/ #

As you can see, domain-test.subdomain-test resolves to 10.244.1.253, the Pod IP of nginx rather than a ClusterIP, via the <hostname>.<subdomain>.<namespace>.svc.cluster.local record created for the Pod; while subdomain-test resolves to 10.108.213.70, the ClusterIP, i.e. the normal Service-name route.
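
To confirm, you can resolve the Pod's full record from busybox; an illustrative sketch reusing the addresses shown above (nslookup output format varies by busybox version):

/ # nslookup domain-test.subdomain-test.default.svc.cluster.local
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      domain-test.subdomain-test.default.svc.cluster.local
Address 1: 10.244.1.253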

The CoreDNS Corefile

CoreDNS is built as a chain of plugins: users select the plugins they need and compile them into a single executable. Its configuration file takes the form of a Corefile. Here is an example from a CoreDNS ConfigMap.

[root@localhost ~]# kubectl -n kube-system get cm coredns -oyaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }    
kind: ConfigMap
metadata:
  creationTimestamp: "2019-06-10T03:19:01Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "3380134"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: 7e845ca2-8b2e-11e9-b4eb-005056b40224

Corefile analysis

Part 1.

kubernetes cluster.local in-addr.arpa ip6.arpa {
   pods insecure
   upstream
   fallthrough in-addr.arpa ip6.arpa
}

Domain names with the cluster.local suffix are internal Kubernetes domains. The kubernetes plugin watches the API server for Service changes to maintain the name-to-IP mappings, so cluster.local domains (and the reverse zones in-addr.arpa and ip6.arpa) are resolved here.
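
Because the reverse zone in-addr.arpa is listed too, reverse lookups of Service IPs are answered by the same plugin; an illustrative sketch using the kube-dns Service IP from earlier (exact output varies by resolver):

/ # nslookup 10.96.0.10
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local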

Part 2.

forward . /etc/resolv.conf

forward means that if no record is found in CoreDNS, the query is forwarded to the nameservers in /etc/resolv.conf; the /etc/resolv.conf in the CoreDNS container is inherited from the host. The practical effect is that domains that are not internal to K8s are resolved by the host's default DNS servers, and the answer is returned to the requester. (Older Corefiles use the proxy plugin here; newer versions use forward, as in the ConfigMap above.)
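
If you would rather pin the upstreams than inherit the host's, forward also accepts explicit server addresses; a sketch (the upstream IPs are placeholders):

forward . 8.8.8.8 1.1.1.1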

Part 3.

prometheus: exposes CoreDNS metrics at http://localhost:9153/metrics in Prometheus format.

cache: enables caching of DNS responses; here answers are cached for up to 30 seconds.

loop: detects simple forwarding loops and halts the CoreDNS process if one is found.

reload: allows the Corefile configuration to be reloaded automatically; changes take effect about two minutes after the ConfigMap is modified.

loadbalance: a round-robin DNS load balancer that randomizes the order of A, AAAA and MX records in the answer.

Specifying hosts

Sometimes the service behind a domain lives outside the cluster and needs to be reachable from inside it. We can achieve this by specifying hosts in the Corefile: add a hosts block containing the domain name and its IP, with fallthrough so that unmatched queries continue to the other plugins, as follows.

hosts {
    10.244.1.245 other-company.com
    fallthrough
}

Here 10.244.1.245 is the Pod IP of your-nginx, standing in for an external service. The domain other-company.com can then be accessed from the busybox Pod above, as follows.

[root@localhost ~]# kubectl exec -it busybox-5bbb5d7ff7-dh68j sh
/ # wget other-company.com
Connecting to other-company.com (10.244.1.245:80)
saving to 'index.html'
index.html           100% |*****************************************************|   612  0:00:00 ETA
'index.html' saved
/ #