1. Background of requirements


The business requirement is to isolate the services in a namespace: workloads in the bar namespace must be denied access, while users must still be able to reach the service through the Load Balancer (LB) via NodePort. At first glance, such a network policy is easy to write.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: foo
spec:
  podSelector:
    matchLabels: {}
  policyTypes:
  - Ingress
  ingress:
  - from:
    - ipBlock:
        cidr: 10.2.3.4/32
    - namespaceSelector:
        matchExpressions:
        - key: region
          operator: NotIn
          values:
          - bar

However, with this policy in place, traffic coming in from the LB is blocked entirely, which is not what we want. The usual answer found in the technical community is that Kubernetes NetworkPolicy mainly targets in-cluster access, and external traffic no longer matches the policy because its source IP has been changed by SNAT.

The exact configuration varies from network plugin to network plugin and from mode to mode. This article only offers one approach, using the common Calico IPIP mode as an example, to configure an access policy for NodePort traffic.

2. Pre-requisite knowledge points

2.1 NetworkPolicy in Kubernetes

NetworkPolicy is the Kubernetes object used to describe network isolation policies. It relies on the network plugin for enforcement; currently, plugins such as Calico, Cilium, and Weave Net support network isolation.

2.2 Several modes of operation for Calico

  • BGP mode

In BGP mode, the BGP clients in the cluster peer with each other pairwise in a full mesh to synchronize routing information.

  • Route Reflector mode

In BGP mode, the number of client connections grows as N * (N - 1), where N is the number of nodes. This limits the cluster size, and the community recommends no more than about 100 nodes.

In Route Reflector mode, BGP clients no longer synchronize routing information pairwise; instead, they synchronize it with a few designated Route Reflectors. Each BGP client only needs to establish connections to the Route Reflectors, so the number of connections grows linearly with the number of nodes.

  • IPIP Mode

Unlike BGP mode, IPIP mode establishes IP-in-IP tunnels between nodes and sends cross-node Pod traffic through these tunnels (via the tunl0 interface on each node).

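For reference, the Route Reflector peering described above is usually declared through Calico's BGPPeer resource. The sketch below is only illustrative: the route-reflector=true node label is an arbitrary choice, and the designated reflector nodes would additionally need a routeReflectorClusterID set in their Calico node spec.

apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: peer-with-route-reflectors
spec:
  # Every node peers only with the nodes labeled as route reflectors
  nodeSelector: all()
  peerSelector: route-reflector == 'true'

The IPIP behavior itself is controlled on Calico's IPPool resource. A minimal sketch of checking it, assuming calicoctl is installed and the default pool is named default-ipv4-ippool:

calicoctl get ippool default-ipv4-ippool -o yaml

# For a pool running in IPIP mode, the returned spec contains something like:
#   ipipMode: Always
#   natOutgoing: true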

3. Why the network policy does not work


In Cluster mode, if node-2:nodeport is accessed while the service Pod runs on node-1, the traffic is forwarded from node-2 to node-1.


In Local mode, if node-2:nodeport is accessed but the Pod is not on node-2, the traffic is not forwarded to another node and the request gets no response.

We usually keep the default Cluster mode, which performs SNAT (source address translation) when forwarding traffic to another node. As a result, the request no longer matches the network policy, which is easily mistaken for the policy not being in effect.

Here we try two solutions.

  1. Add the post-SNAT source address to the access whitelist.
  2. Use Local mode. Since the LB performs health checks, it forwards traffic only to the nodes running the service Pod, so the source address is preserved (a patch sketch follows below).
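
For reference, a minimal sketch of applying solution 2 with a one-line patch, using the Service from the test environment in section 4:

# Switch the Service's external traffic policy to Local
kubectl -n tekton-pipelines patch svc tekton-dashboard \
  -p '{"spec":{"externalTrafficPolicy":"Local"}}'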

4. NetworkPolicy configuration under NodePort

4.1 Test environment

  • Kubernetes version

v1.19.8

  • kube-proxy forwarding mode

IPVS

  • Node information
kubectl get node -o wide

NAME    STATUS   ROLES           AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION    CONTAINER-RUNTIME
node1   Ready    master,worker   34d   v1.19.8   10.102.123.117   <none>        CentOS Linux 7 (Core)   3.10.0-1127.el7.x86_64   docker://20.10.6
node2   Ready    worker          34d   v1.19.8   10.102.123.104   <none>        CentOS Linux 7 (Core)   3.10.0-1127.el7.x86_64   docker://20.10.6
node3   Ready    worker          34d   v1.19.8   10.102.123.143   <none>        CentOS Linux 7 (Core)   3.10.0-1127.el7.x86_64   docker://20.10.6
  • The tested workload
kubectl -n tekton-pipelines get pod -o wide

NAME                                          READY   STATUS    RESTARTS   AGE   IP         NODE    NOMINATED NODE   READINESS GATES
tekton-dashboard-75c65d785b-xbgk6             1/1     Running   0          14h   10.233.96.32    node2   <none>           <none>

The workload runs on node node2.

  • The tested service
kubectl -n tekton-pipelines get svc

NAME                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                              AGE
tekton-dashboard              NodePort    10.233.5.155    <none>        9097:31602/TCP                       10m

4.2 How NodePort traffic is forwarded to Pods

There are two main cases to consider here.

  1. Accessing node node1, which does not run the workload's Pod

    • Service forwarding rules
    
    ipvsadm  -L
    
    TCP  node1:31602 rr
    -> 10.233.96.32:9097            Masq    1      0          0
    
    TCP  node1:31602 rr
    -> 10.233.96.32:9097            Masq    1      0          0
    
    TCP  node1.cluster.local:31602 rr
    -> 10.233.96.32:9097            Masq    1      0          0
    
    TCP  node1:31602 rr
    -> 10.233.96.32:9097            Masq    1      0          0
    
    TCP  localhost:31602 rr
    -> 10.233.96.32:9097            Masq    1      0          0
    

    You can see that traffic accessing node1:31602 is forwarded to 10.233.96.32:9097, which is the IP address and port of the Service Pod.

    • IP routing rules

    Next, look at the routing rules. Traffic to the 10.233.96.0/24 network segment is routed to tunl0, tunneled to node2, and then delivered to the Pod.

    
    route
    
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    10.233.92.0     node3.cluster.l 255.255.255.0   UG    0      0        0 tunl0
    10.233.96.0     node2.cluster.l 255.255.255.0   UG    0      0        0 tunl0
    
  2. Accessing node node2, which runs the workload's Pod

    • Service forwarding rules
    
    ipvsadm  -L
    
    TCP  node2:31602 rr
    -> 10.233.96.32:9097            Masq    1      0          0
    
    TCP  node2:31602 rr
    -> 10.233.96.32:9097            Masq    1      0          0
    
    TCP  node2.cluster.local:31602 rr
    -> 10.233.96.32:9097            Masq    1      0          1
    
    TCP  node2:31602 rr
    -> 10.233.96.32:9097            Masq    1      0          0
    
    TCP  localhost:31602 rr
    -> 10.233.96.32:9097            Masq    1      0          0
    

    As with node1, access to the NodePort service on node2 is forwarded to the IP address and port of the service pod.

    • IP routing rules

    The routing rules, however, are different. Packets with a destination address of 10.233.96.32 are sent to cali73daeaf4b12, which forms a veth pair with the NIC inside the Pod, so the traffic is delivered directly to the service Pod.

    
    route
    
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    10.233.90.0     node1.cluster.l 255.255.255.0   UG    0      0        0 tunl0
    10.233.92.0     node3.cluster.l 255.255.255.0   UG    0      0        0 tunl0
    10.233.96.32    0.0.0.0         255.255.255.255 UH    0      0        0 cali73daeaf4b12
    

    From the output above we can conclude: when accessing a node that does not run the Pod, traffic is forwarded through tunl0; when accessing the node that runs the Pod, traffic is routed directly to the Pod without passing through tunl0.
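
As a side check, one way to confirm that cali73daeaf4b12 is indeed the host end of the Pod's veth pair is to compare interface indexes. A rough sketch, assuming the container image ships a shell and cat:

# Inside the Pod: print the interface index of eth0's veth peer
kubectl -n tekton-pipelines exec tekton-dashboard-75c65d785b-xbgk6 -- cat /sys/class/net/eth0/iflink

# On node2: replace <index> with the number printed above;
# the matching cali interface is the other end of the veth pair
ip -o link | grep '^<index>:'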

4.3 Option 1, add tunl0 to the network policy whitelist

  • Check the tunl0 information of each node

node1

ifconfig

tunl0: flags=193<UP,RUNNING,NOARP>  mtu 1440
        inet 10.233.90.0  netmask 255.255.255.255

node2

ifconfig

tunl0: flags=193<UP,RUNNING,NOARP>  mtu 1440
        inet 10.233.96.0  netmask 255.255.255.255

node3

ifconfig

tunl0: flags=193<UP,RUNNING,NOARP>  mtu 1440
        inet 10.233.92.0  netmask 255.255.255.255
  • Network policy configuration
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: foo
spec:
  podSelector:
    matchLabels: {}
  policyTypes:
  - Ingress
  ingress:
  - from:
    - ipBlock:
        cidr: 10.2.3.4/32
    - ipBlock:
        cidr: 10.233.90.0/32
    - ipBlock:
        cidr: 10.233.96.0/32
    - ipBlock:
        cidr: 10.233.92.0/32
    - namespaceSelector:
        matchExpressions:
        - key: region
          operator: NotIn
          values:
          - bar
  • Test verification

This does not meet expectations. All traffic that passes through tunl0 is now allowed: workloads in the bar namespace can still reach the service via node1:31602, node3:31602, or tekton-dashboard.tekton-pipelines.svc:9097 (any path that does not land directly on node2, where the Pod runs), so the traffic cannot be restricted.
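
For reference, a sketch of how this can be verified from the bar namespace; the curlimages/curl image and the 5-second timeout are arbitrary choices, and 10.102.123.117 is node1's address from the test environment:

# With option 1 in place, this request from the bar namespace still succeeds via node1
kubectl -n bar run curl-test --rm -it --restart=Never \
  --image=curlimages/curl -- curl -sS -m 5 http://10.102.123.117:31602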

4.4 Option 2, using Local mode

  • Modify svc’s externalTrafficPolicy to Local mode
kubectl -n tekton-pipelines get svc tekton-dashboard -o yaml

apiVersion: v1
kind: Service
metadata:
  name: tekton-dashboard
  namespace: tekton-pipelines
spec:
  clusterIP: 10.233.5.155
  externalTrafficPolicy: Local
...
  • Deny all ingress traffic
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: test-network-policy-deny-all
  namespace: foo
spec:
  podSelector:
    matchLabels: {}
  ingress: []
  • Add access whitelist
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: foo
spec:
  podSelector:
    matchLabels: {}
  policyTypes:
  - Ingress
  ingress:
  - from:
    - ipBlock:
        cidr: 10.2.3.4/32
  • Test verification

This meets expectations. With the network policies above, access from the bar namespace is blocked while external access to the NodePort via LB forwarding is still allowed, satisfying the business requirement.
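
A rough verification sketch under the same assumptions as in option 1 (curlimages/curl image, arbitrary timeout); 10.102.123.104 is node2's address, the only node that now answers on the NodePort in Local mode:

# From the bar namespace: the request should now be blocked and time out
kubectl -n bar run curl-test --rm -it --restart=Never \
  --image=curlimages/curl -- curl -sS -m 5 http://10.102.123.104:31602

# From the LB host (source 10.2.3.4 in the whitelist): the request should succeed,
# because node2 runs the Pod and Local mode preserves the client source address
curl -sS -m 5 http://10.102.123.104:31602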

5. Summary

Networking is a relatively difficult part of Kubernetes to master, yet it has a large and far-reaching impact on the business. It is therefore necessary and worthwhile to spend a little more time on networking.

This article started from a business requirement, dug into Calico's network modes, and solved the problem of a NetworkPolicy not meeting expectations because SNAT changes the source IP.

In Calico's IPIP mode, an access policy for a NodePort service needs the externalTrafficPolicy: Local forwarding mode, combined with the network policy best practice of denying all traffic first and then adding a whitelist.