Redundant DNS lookups

Applications running in a container sometimes need to resolve external domain names. If we capture DNS traffic (UDP port 53) in the container’s network namespace, we may find that several redundant lookups are made before the name finally resolves.

Here are the packets I captured in the container’s network namespace while running ping google.com.

sudo nsenter -t 3885 -n tcpdump -i eth0 udp port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
10:09:11.917900 IP 10.244.2.202.38697 > 10.96.0.10.domain: 11858+ A? google.com.default.svc.cluster.local. (54)
10:09:11.918847 IP 10.96.0.10.domain > 10.244.2.202.38697: 11858 NXDomain*- 0/1/0 (147)
10:09:11.922468 IP 10.244.2.202.38697 > 10.96.0.10.domain: 15573+ AAAA? google.com.default.svc.cluster.local. (54)
10:09:11.923001 IP 10.96.0.10.domain > 10.244.2.202.38697: 15573 NXDomain*- 0/1/0 (147)
10:09:11.923248 IP 10.244.2.202.43230 > 10.96.0.10.domain: 62042+ A? google.com.svc.cluster.local. (46)
10:09:11.923828 IP 10.96.0.10.domain > 10.244.2.202.43230: 62042 NXDomain*- 0/1/0 (139)
10:09:11.924005 IP 10.244.2.202.43230 > 10.96.0.10.domain: 54769+ AAAA? google.com.svc.cluster.local. (46)
10:09:11.924494 IP 10.96.0.10.domain > 10.244.2.202.43230: 54769 NXDomain*- 0/1/0 (139)
10:09:11.924704 IP 10.244.2.202.36252 > 10.96.0.10.domain: 20727+ A? google.com.cluster.local. (42)
10:09:11.925154 IP 10.96.0.10.domain > 10.244.2.202.36252: 20727 NXDomain*- 0/1/0 (135)
10:09:11.925316 IP 10.244.2.202.36252 > 10.96.0.10.domain: 13066+ AAAA? google.com.cluster.local. (42)
10:09:11.925758 IP 10.96.0.10.domain > 10.244.2.202.36252: 13066 NXDomain*- 0/1/0 (135)
10:09:11.925929 IP 10.244.2.202.35582 > 10.96.0.10.domain: 38821+ A? google.com.lan. (32)
10:09:11.927244 IP 10.244.2.202.35582 > 10.96.0.10.domain: 4430+ AAAA? google.com.lan. (32)
10:09:11.927416 IP 10.96.0.10.domain > 10.244.2.202.35582: 38821 NXDomain 0/0/0 (32)
10:09:11.928600 IP 10.96.0.10.domain > 10.244.2.202.35582: 4430 NXDomain 0/0/0 (32)
10:09:11.928839 IP 10.244.2.202.45290 > 10.96.0.10.domain: 45577+ A? google.com. (28)
10:09:11.929129 IP 10.244.2.202.45290 > 10.96.0.10.domain: 37586+ AAAA? google.com. (28)
10:09:11.929303 IP 10.96.0.10.domain > 10.244.2.202.45290: 45577 1/0/0 A 172.217.160.78 (54)
10:09:11.929541 IP 10.96.0.10.domain > 10.244.2.202.45290: 37586 1/0/0 AAAA 2404:6800:4008:801::200e (66)

As the capture shows, the following names were queried in turn before the final, correct resolution (the last four lines), and each name was queried for both IPv4 (A) and IPv6 (AAAA) records.

  • google.com.default.svc.cluster.local.
  • google.com.svc.cluster.local.
  • google.com.cluster.local.
  • google.com.lan.

All 8 of these queries failed with NXDomain because no such domains exist.

Kubernetes container domain name resolution

To explain this behavior, we need to start with how containers on Kubernetes resolve domain names.

Containers running on Kubernetes resolve domain names based on the /etc/resolv.conf file, just as Linux does in general. Here is the content of this file in the container.

nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local lan
options ndots:5

The nameserver is the Service (ClusterIP) address of kube-dns in the Kubernetes cluster. By default, every container in the cluster uses kube-dns as its nameserver.

kubectl get svc -n kube-system
NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
kube-dns         ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP   236d
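
To make the three directives in the resolv.conf above concrete, here is a minimal sketch in Go that reads them into a struct. parseResolvConf and resolvConf are hypothetical names of my own, not part of glibc or Kubernetes, and the sketch only understands nameserver, search, and the ndots option.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// resolvConf holds only the settings discussed in this article.
type resolvConf struct {
	Nameservers []string
	Search      []string
	Ndots       int
}

// parseResolvConf is a hypothetical helper, not the real resolver's parser.
// Unknown directives and comment lines are ignored.
func parseResolvConf(text string) resolvConf {
	conf := resolvConf{Ndots: 1} // default documented in resolv.conf(5)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue
		}
		switch fields[0] {
		case "nameserver":
			conf.Nameservers = append(conf.Nameservers, fields[1])
		case "search":
			conf.Search = fields[1:]
		case "options":
			for _, opt := range fields[1:] {
				if !strings.HasPrefix(opt, "ndots:") {
					continue
				}
				n, err := strconv.Atoi(strings.TrimPrefix(opt, "ndots:"))
				if err != nil {
					continue
				}
				if n > 15 {
					n = 15 // the man page says the value is capped at 15
				}
				conf.Ndots = n
			}
		}
	}
	return conf
}

func main() {
	conf := parseResolvConf("nameserver 10.96.0.10\n" +
		"search default.svc.cluster.local svc.cluster.local cluster.local lan\n" +
		"options ndots:5\n")
	fmt.Printf("%+v\n", conf)
}
```

The result carries one nameserver, four search domains, and ndots set to 5, which are the inputs for everything that follows.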

So what do search and ndots do?

search and ndots

Before explaining search and ndots, there is a concept that needs to be understood: FQDN (Fully qualified domain name).

An FQDN is the complete domain name. Generally speaking, a domain name that ends with a . is an FQDN: google.com. is an FQDN, but google.com is not.

For an FQDN, the OS queries the DNS server directly. What about a non-FQDN? This is where search and ndots come into play.

ndots is a threshold on the number of . characters in a name. If the name contains at least ndots dots, the OS treats it as an absolute name and queries it directly first; if it contains fewer, the OS first tries it with each domain in the search list appended, in order.

In the example above, ndots is 5. The name google.com does not end with a . and contains fewer than 5 dots, so the OS appends each of default.svc.cluster.local, svc.cluster.local, cluster.local, and lan in turn. The first 3 search domains are injected by Kubernetes; the last one, lan, comes from the host’s own search list.

The default value of ndots is 1, which means that as long as the name contains a ., the OS treats it as an absolute name and queries it directly first.
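
The lookup order can be sketched in a few lines of Go. This is a simplified model of a glibc-style stub resolver, not its actual implementation: candidateNames is a hypothetical helper, and it ignores retries, timeouts, and the fact that the resolver stops at the first name that resolves.

```go
package main

import (
	"fmt"
	"strings"
)

// candidateNames returns the names a stub resolver would try, in order,
// for a given search list and ndots value (simplified model).
func candidateNames(name string, search []string, ndots int) []string {
	// A trailing dot marks an FQDN: query it as-is and skip the search list.
	if strings.HasSuffix(name, ".") {
		return []string{name}
	}
	absolute := name + "."
	// With at least ndots dots, the name is tried as absolute first.
	absoluteFirst := strings.Count(name, ".") >= ndots

	var names []string
	if absoluteFirst {
		names = append(names, absolute)
	}
	for _, domain := range search {
		names = append(names, name+"."+domain+".")
	}
	if !absoluteFirst {
		// Otherwise the absolute name is only the last resort.
		names = append(names, absolute)
	}
	return names
}

func main() {
	search := []string{"default.svc.cluster.local", "svc.cluster.local", "cluster.local", "lan"}
	for _, n := range candidateNames("google.com", search, 5) {
		fmt.Println(n)
	}
}
```

With ndots set to 5, google.com is expanded through all four search domains before it is tried as google.com., which is the same order seen in the packet capture. Each candidate is queried for both A and AAAA records, hence the 8 failed queries.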

The upper limit of ndots is 15.

ndots:n
    Sets a threshold for the number of dots which must appear in a name given to res_query(3) (see resolver(3)) before an initial absolute query will be made. The default for n is 1, meaning that if there are any dots in a name, the name will be tried first as an absolute name before any search list elements are appended to it. The value for this option is silently capped to 15.

Why Kubernetes uses search domains

Why? Let’s look at the code first.

var (
    // The default dns opt strings.
    defaultDNSOptions = []string{"ndots:5"}
)
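// generateSearchesForDNSClusterFirst puts the namespace, svc, and cluster
// search domains ahead of the host's search list and drops duplicates.
func (c *Configurer) generateSearchesForDNSClusterFirst(hostSearch []string, pod *v1.Pod) []string {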
    if c.ClusterDomain == "" {
        return hostSearch
    }

    nsSvcDomain := fmt.Sprintf("%s.svc.%s", pod.Namespace, c.ClusterDomain)
    svcDomain := fmt.Sprintf("svc.%s", c.ClusterDomain)
    clusterSearch := []string{nsSvcDomain, svcDomain, c.ClusterDomain}

    return omitDuplicates(append(clusterSearch, hostSearch...))
}
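// GetPodDNS builds the pod's DNS config. In the cluster DNS case shown here it
// uses the cluster DNS servers, the generated search list, and the default options.
func (c *Configurer) GetPodDNS(pod *v1.Pod) (*runtimeapi.DNSConfig, error) {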
    ...
    case podDNSCluster:
        if len(c.clusterDNS) != 0 {
            dnsConfig.Servers = []string{}
            for _, ip := range c.clusterDNS {
                dnsConfig.Servers = append(dnsConfig.Servers, ip.String())
            }
            dnsConfig.Searches = c.generateSearchesForDNSClusterFirst(dnsConfig.Searches, pod)
            dnsConfig.Options = defaultDNSOptions
            break
        }
    ...
    if utilfeature.DefaultFeatureGate.Enabled(features.CustomPodDNS) && pod.Spec.DNSConfig != nil {
        dnsConfig = appendDNSConfig(dnsConfig, pod.Spec.DNSConfig)
    }
}

Kubernetes search domains

As the function generateSearchesForDNSClusterFirst shows, Kubernetes adds three search domains: nsSvcDomain, svcDomain, and clusterDomain.

Kubernetes sets up these search domains to make it easier for users to access services.

For example, Pod a in the default namespace can access service b in the same namespace simply as b. This works through the nsSvcDomain search domain, default.svc.cluster.local.

Similarly, a service in a different namespace can be accessed as ${service name}.${namespace name}, which works through the svcDomain search domain.

The clusterDomain search domain is meant to make it easier to access non-Kubernetes names under the same domain. For example, if you set the Kubernetes cluster domain to ieevee.com, you can reach s.ieevee.com simply as s, provided, of course, that no service named s exists in the current namespace (yes, the search domains are tried in priority order).
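
To see which search list this produces for a given pod, here is a standalone sketch of the same logic with plain strings instead of the kubelet types. clusterSearchDomains and dedupe are hypothetical names; dedupe is my assumption of what omitDuplicates does (keep the first occurrence, preserve order), since its body is not shown above.

```go
package main

import "fmt"

// clusterSearchDomains mirrors generateSearchesForDNSClusterFirst using plain
// strings: namespace search domain, svc search domain, cluster domain, then
// whatever the host already had.
func clusterSearchDomains(namespace, clusterDomain string, hostSearch []string) []string {
	if clusterDomain == "" {
		return hostSearch
	}
	search := []string{
		fmt.Sprintf("%s.svc.%s", namespace, clusterDomain), // nsSvcDomain
		fmt.Sprintf("svc.%s", clusterDomain),               // svcDomain
		clusterDomain,
	}
	return dedupe(append(search, hostSearch...))
}

// dedupe keeps the first occurrence of each entry and preserves order
// (assumed behavior of omitDuplicates).
func dedupe(in []string) []string {
	seen := make(map[string]bool, len(in))
	out := make([]string, 0, len(in))
	for _, s := range in {
		if !seen[s] {
			seen[s] = true
			out = append(out, s)
		}
	}
	return out
}

func main() {
	search := clusterSearchDomains("default", "cluster.local", []string{"lan"})
	fmt.Println(search)

	// How the short names from the text are completed by each search domain.
	fmt.Println("b." + search[0])             // service b in the pod's own namespace
	fmt.Println("b.kube-system." + search[1]) // service b in namespace kube-system
}
```

For a pod in the default namespace with cluster domain cluster.local and a host search list of lan, the result is the same four entries as the search line of the resolv.conf shown earlier.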

ndots default value

The default value of ndots is hard-coded to 5.

Why is it 5?

This is explained by thockin in issue 33554 and is summarized as follows.

  1. Kubernetes needs to support short names for services in the same namespace, e.g. name, so ndots>=1, corresponding to the search domain $namespace.svc.$zone
  2. Kubernetes needs to support short names for services across namespaces, e.g. kubernetes.default, so ndots>=2, corresponding to the search domain svc.$zone
  3. Kubernetes needs to support short names for non-service names in the same namespace and across namespaces, e.g. name.namespace.svc, so ndots>=3, corresponding to the search domain $zone
  4. Kubernetes needs to support access to each pod in a StatefulSet, e.g. mysql-0.mysql.default.svc, so ndots>=4
  5. Kubernetes needs to support SRV records (_$port._$proto.$service.$namespace.svc.$zone), so ndots>=5
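
The arithmetic behind this list is just dot counting: a name goes through the search list first only while it has fewer dots than ndots. A small Go sketch, where minNdots is a hypothetical helper and the SRV entry uses placeholder labels in place of the $port, $proto, $service, and $namespace variables:

```go
package main

import (
	"fmt"
	"strings"
)

// minNdots returns the smallest ndots value for which name is still expanded
// through the search list first: one more than the number of dots it contains.
func minNdots(name string) int {
	return strings.Count(name, ".") + 1
}

func main() {
	names := []string{
		"name",
		"kubernetes.default",
		"name.namespace.svc",
		"mysql-0.mysql.default.svc",
		"_port._proto.service.namespace.svc", // placeholder SRV name without the zone
	}
	for _, n := range names {
		fmt.Printf("%-36s needs ndots >= %d\n", n, minNdots(n))
	}
}
```

The five names need ndots of at least 1, 2, 3, 4, and 5 respectively, which is where the default of 5 comes from. The price is that an external name such as google.com, with a single dot, also goes through the whole search list first.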