A few days ago we used NodeLocal DNSCache to solve the 5-second timeout problem with CoreDNS, and cluster DNS resolution performance improved significantly. Today, however, we hit a major pitfall. In our DevOps experiments the tools use custom domains, so we needed custom domain name resolution for them to reach each other. We could solve this by adding hostAliases to Pods, but Jenkins’ Kubernetes plugin does not support this parameter directly; it has to be defined through YAML, which is a bit of a pain. So we decided to add an A record via CoreDNS instead.

Normally we just need to add the hosts plugin to the ConfigMap of CoreDNS and it will work.

```
hosts {
    10.151.30.11 git.k8s.local
    fallthrough
}
```

However, after the configuration is complete, the custom domain name never resolves.

```shell
$ kubectl run -it --image busybox:1.28.4 test --restart=Never --rm /bin/sh
If you don't see a command prompt, try pressing enter.
/ # nslookup git.k8s.local
Server:    169.254.20.10
Address 1: 169.254.20.10

nslookup: can't resolve 'git.k8s.local'
```

This is a bit strange. Doesn’t the hosts plugin work this way? After some checking, I was convinced that this was the right way to configure it. Then I turned on CoreDNS logging and filtered the resolution logs for the domain name above. The logs show that the query walked through the search domains but never got the correct result, which is a bit puzzling. After some more digging, it occurred to me that we have NodeLocal DNSCache enabled in the cluster. Could this be the cause of the problem? Isn’t this the component that forwards queries to CoreDNS on a cache miss? To verify this, let’s test the resolution directly using the CoreDNS address:

```shell
/ # nslookup git.k8s.local 10.96.0.10
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      git.k8s.local
Address 1: 10.151.30.11 git.k8s.local
```

This resolves correctly, which means there is nothing wrong with the CoreDNS configuration and the problem must be caused by NodeLocal DNSCache. Resolving directly against the LocalDNS address (169.254.20.10) does indeed fail:

```shell
/ # nslookup git.k8s.local 169.254.20.10
Server:    169.254.20.10
Address 1: 169.254.20.10

nslookup: can't resolve 'git.k8s.local'
```

At this point it’s time to look at the LocalDNS Pod logs:

```shell
$ kubectl logs -f node-local-dns-bb84m -n kube-system
......
2020/05/14 05:30:21 [INFO] Updated Corefile with 0 custom stubdomains and upstream servers /etc/resolv.conf
2020/05/14 05:30:21 [INFO] Using config file:
cluster.local:53 {
    errors
    cache {
        success 9984 30
        denial 9984 5
    }
    reload
    loop
    bind 169.254.20.10 10.96.0.10
    forward . 10.96.207.156 {
        force_tcp
    }
    prometheus :9253
    health 169.254.20.10:8080
}
in-addr.arpa:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10 10.96.0.10
    forward . 10.96.207.156 {
        force_tcp
    }
    prometheus :9253
}
ip6.arpa:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10 10.96.0.10
    forward . 10.96.207.156 {
        force_tcp
    }
    prometheus :9253
}
.:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10 10.96.0.10
    forward . /etc/resolv.conf {
        force_tcp
    }
    prometheus :9253
}
......
[INFO] plugin/reload: Running configuration MD5 = 3e3833f9361872f1d34bc97155f952ca
CoreDNS-1.6.7
linux/amd64, go1.11.13, 
```

Analyzing the LocalDNS configuration above: 10.96.0.10 is the Service ClusterIP of CoreDNS, 169.254.20.10 is the IP address of LocalDNS, and 10.96.207.156 is the ClusterIP of a new Service created by LocalDNS. This Service is associated with the same list of Endpoints as the CoreDNS Service.

A closer look reveals that cluster.local, in-addr.arpa and ip6.arpa are forwarded to 10.96.207.156, i.e. to CoreDNS for resolution, while everything else is handled by forward . /etc/resolv.conf, that is, sent to the nameservers listed in the resolv.conf file, which reads as follows.

```
nameserver 169.254.20.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
```
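To make the effect of these settings concrete, here is a minimal sketch that lists the names a resolver would try for git.k8s.local and maps each one to the server block of the NodeLocal DNSCache configuration shown in the log. It assumes glibc-style search-list handling (names with fewer dots than ndots go through the search list first), and `candidates` and `zone_for` are hypothetical helpers written for this illustration, not part of CoreDNS or kubectl.

```shell
#!/bin/sh
# Zones with their own server block in the NodeLocal DNSCache config above.
# Anything else falls through to the catch-all ".:53" block.
ZONES="cluster.local in-addr.arpa ip6.arpa"

# candidates NAME NDOTS SEARCH...
# Print the names a glibc-style stub resolver would try, in order (assumption).
candidates() (
    name="$1"; ndots="$2"; shift 2
    case "$name" in
        *.)
            # A trailing dot marks the name as fully qualified: no search list.
            printf '%s\n' "$name"
            ;;
        *)
            only_dots=$(printf '%s' "$name" | tr -cd '.')
            dots=${#only_dots}
            # Enough dots: try the name as-is before the search list.
            if [ "$dots" -ge "$ndots" ]; then printf '%s.\n' "$name"; fi
            for domain in "$@"; do printf '%s.%s.\n' "$name" "$domain"; done
            # Too few dots: the name as-is is tried last.
            if [ "$dots" -lt "$ndots" ]; then printf '%s.\n' "$name"; fi
            ;;
    esac
)

# zone_for NAME
# Print the most specific zone from ZONES that NAME belongs to, or "." if none.
zone_for() (
    name="${1%.}"   # drop the trailing dot, if any
    best="."
    for zone in $ZONES; do
        case "$name" in
            "$zone"|*."$zone")
                if [ "${#zone}" -gt "${#best}" ]; then best="$zone"; fi
                ;;
        esac
    done
    printf '%s\n' "$best"
)

# Values below are copied from the resolv.conf and the log output in this post.
for q in $(candidates git.k8s.local 5 default.svc.cluster.local svc.cluster.local cluster.local); do
    zone=$(zone_for "$q")
    if [ "$zone" = "." ]; then
        target="/etc/resolv.conf"
    else
        target="10.96.207.156 (CoreDNS)"
    fi
    printf '%s -> zone %s -> forward to %s\n' "$q" "$zone" "$target"
done
```

The three search-list candidates end in cluster.local and go to CoreDNS, while the bare name git.k8s.local. only matches the catch-all block and is forwarded through /etc/resolv.conf.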

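So when we resolve git.k8s.local, the resolver first walks through the search domains. Those candidates end in cluster.local and are forwarded directly to CoreDNS, which naturally has no records for them. The bare name git.k8s.local, on the other hand, only matches the .:53 block and is forwarded through /etc/resolv.conf instead of to CoreDNS, so the hosts entry we added there is never consulted. So isn’t it natural to think that we can just configure the hosts plugin on the LocalDNS side? This should be exactly the right idea: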

```shell
$ kubectl edit cm node-local-dns -n kube-system
......
    .:53 {
        errors
        hosts {  # add an A record
            10.151.30.11 git.k8s.local
            fallthrough
        }
        cache 30
        reload
        loop
        bind 169.254.20.10 10.96.0.10
        forward . __PILLAR__UPSTREAM__SERVERS__ {
            force_tcp
        }
        prometheus :9253
    }
......
```

After the update is complete, we manually rebuild the NodeLocalDNS Pod, only to find that it fails to start with the following error message.

```
no action found for directive 'hosts' with server type 'dns'
```

It turns out that NodeLocalDNS does not support the hosts plugin at all. So we have to let CoreDNS do the resolution after all: this time we change forward . /etc/resolv.conf to forward . 10.96.207.156, so that these queries go to CoreDNS as well. In the NodeLocalDNS ConfigMap this means replacing __PILLAR__UPSTREAM__SERVERS__ with __PILLAR__CLUSTER__DNS__:

```shell
$ kubectl edit cm node-local-dns -n kube-system
......
    .:53 {
        errors
        cache 30
        reload
        loop
        bind 169.254.20.10 10.96.0.10
        forward . __PILLAR__CLUSTER__DNS__ {
            force_tcp
        }
        prometheus :9253
    }
......
```

As before, once the change is made, the NodeLocalDNS Pods need to be rebuilt for it to take effect.

The __PILLAR__CLUSTER__DNS__ and __PILLAR__UPSTREAM__SERVERS__ placeholders are filled in automatically in image version 1.15.6 and above; the corresponding values are derived from the kube-dns ConfigMap and the custom upstream server addresses.
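As a rough illustration of what this substitution amounts to, the sketch below mimics it with sed on a Corefile fragment. The values 10.96.207.156 and /etc/resolv.conf are taken from the log output earlier in this post; `render_corefile` is a hypothetical helper, and the real replacement is performed by the node-local-dns image itself, not by this script.

```shell
#!/bin/sh
# render_corefile CLUSTER_DNS UPSTREAM
# Read a Corefile template on stdin and print it with both placeholders filled in.
# "|" is used as the sed delimiter because the upstream value may contain slashes.
render_corefile() (
    cluster_dns="$1"; upstream="$2"
    sed -e "s|__PILLAR__CLUSTER__DNS__|$cluster_dns|g" \
        -e "s|__PILLAR__UPSTREAM__SERVERS__|$upstream|g"
)

# The forward line before and after the change made in this post.
printf 'forward . __PILLAR__UPSTREAM__SERVERS__ {\n' | render_corefile 10.96.207.156 /etc/resolv.conf
printf 'forward . __PILLAR__CLUSTER__DNS__ {\n' | render_corefile 10.96.207.156 /etc/resolv.conf
```

Seen this way, switching the placeholder in the .:53 block is what redirects the catch-all zone from the resolv.conf nameservers to CoreDNS.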

Now let’s go back and test that the custom domain name resolves properly.

```shell
/ # nslookup git.k8s.local
Server:    169.254.20.10
Address 1: 169.254.20.10

Name:      git.k8s.local
Address 1: 10.151.30.11 git.k8s.local
```

If you use NodeLocalDNS, be aware of this issue: if the hosts or rewrite plugins in CoreDNS appear not to work, this is most likely the cause. And the best way to troubleshoot a problem is always to analyze it through the logs.
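The comparison used in this post (query the LocalDNS address, then the CoreDNS address) can be wrapped in a small helper to run from a test Pod. This is only a sketch: `lookup` and `diagnose` are hypothetical names, and it assumes that the nslookup available in the Pod exits non-zero when resolution fails.

```shell
#!/bin/sh
# lookup NAME SERVER
# Succeed if SERVER returns an answer for NAME.
# Assumption: nslookup exits non-zero on a failed lookup.
lookup() {
    nslookup "$1" "$2" >/dev/null 2>&1
}

# diagnose NAME LOCAL_DNS CORE_DNS
# Report where resolution of NAME breaks down.
diagnose() (
    name="$1"; local_dns="$2"; core_dns="$3"
    if lookup "$name" "$local_dns"; then
        echo "ok: $name resolves through NodeLocalDNS"
    elif lookup "$name" "$core_dns"; then
        echo "check NodeLocalDNS forwarding: $name only resolves through CoreDNS"
    else
        echo "check CoreDNS config: $name does not resolve through CoreDNS either"
    fi
)

# Example call with the addresses from this post:
#   diagnose git.k8s.local 169.254.20.10 10.96.0.10
```

If the second message is printed, the record exists in CoreDNS but the query never reaches it, which is the situation described above.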