Cause

A CNI component runs as a DaemonSet on every node. The CNI Pod converts its ServiceAccount token into a kubeconfig and writes it to a directory on the host. When the kubelet invokes the CNI plugin, the plugin uses this kubeconfig to query the cluster for information about Pods.

On k8s 1.24 a problem appears: once the CNI Pod restarts, requests that use the generated kubeconfig fail with an Unauthorized error, i.e. the APIServer no longer accepts the token.
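The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the component's actual code: the function name buildKubeconfig and the cluster, user, and context names are hypothetical, and the sketch prints the result instead of writing it to a host directory. The point to notice is that the token is copied into the file as a literal string, so the kubeconfig is a snapshot of the token at generation time.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net"
	"os"
	"strings"
)

// Paths where the service account files are mounted inside the Pod.
const (
	tokenPath = "/var/run/secrets/kubernetes.io/serviceaccount/token"
	caPath    = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
)

// kubeconfigTemplate is a minimal kubeconfig. The names "local", "cni",
// and "cni-context" are placeholders chosen for this sketch.
const kubeconfigTemplate = `apiVersion: v1
kind: Config
clusters:
- name: local
  cluster:
    server: %s
    certificate-authority-data: %s
users:
- name: cni
  user:
    token: %s
contexts:
- name: cni-context
  context:
    cluster: local
    user: cni
current-context: cni-context
`

// buildKubeconfig embeds the CA bundle (base64-encoded) and the token
// verbatim, so the result does not follow later token refreshes.
func buildKubeconfig(server string, caPEM []byte, token string) string {
	ca := base64.StdEncoding.EncodeToString(caPEM)
	return fmt.Sprintf(kubeconfigTemplate, server, ca, token)
}

func main() {
	token, err := os.ReadFile(tokenPath)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read token (not running in a Pod?):", err)
		return
	}
	caPEM, err := os.ReadFile(caPath)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read CA bundle:", err)
		return
	}
	// These environment variables are set for every container in a Pod.
	server := "https://" + net.JoinHostPort(
		os.Getenv("KUBERNETES_SERVICE_HOST"),
		os.Getenv("KUBERNETES_SERVICE_PORT"),
	)
	// A real component would write this to a host-mounted directory.
	fmt.Print(buildKubeconfig(server, caPEM, strings.TrimSpace(string(token))))
}
```

Because the file holds a copy of the token, whatever limits the token's validity also limits how long the kubeconfig keeps working.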

Reason

On k8s 1.24, the token generation logic for ServiceAccounts (abbreviated as SA) has changed: a token is no longer generated automatically for each SA and saved in a Secret, and a Pod that uses the SA no longer gets its token from a mounted Secret.

When a Pod uses an SA, the default behavior is as follows.

  1. After the Pod is created, the serviceaccount admission plugin adds a token volume to the Pod during the admission phase, mounted as before under /var/run/secrets/kubernetes.io/serviceaccount. The volume source, however, is now of type projected rather than secret.

    projected:
      defaultMode: 420
      sources:
      # source type is serviceAccountToken
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
    
  2. Once the Pod is scheduled to a Node, the projected volume mounter in the kubelet writes the corresponding files for the Pod according to the type of each source in the volume. When it finds a projected source of type serviceAccountToken, it calls the apiserver’s TokenRequest interface to request a temporary token bound to the current Pod. The token is requested with a validity of 3607s, and the kubelet refreshes it automatically so that the mounted token does not expire.

    case source.ServiceAccountToken != nil:
        tp := source.ServiceAccountToken

        // When FsGroup is set, we depend on SetVolumeOwnership to
        // change from 0600 to 0640.
        mode := *s.source.DefaultMode
        if mounterArgs.FsUser != nil || mounterArgs.FsGroup != nil {
            mode = 0600
        }

        var auds []string
        if len(tp.Audience) != 0 {
            auds = []string{tp.Audience}
        }
        // Request a token from the apiserver's TokenRequest API.
        tr, err := s.plugin.getServiceAccountToken(s.pod.Namespace, s.pod.Spec.ServiceAccountName, &authenticationv1.TokenRequest{
            Spec: authenticationv1.TokenRequestSpec{
                Audiences:         auds,
                ExpirationSeconds: tp.ExpirationSeconds,
                // Bind the token to this Pod by name and UID.
                BoundObjectRef: &authenticationv1.BoundObjectReference{
                    APIVersion: "v1",
                    Kind:       "Pod",
                    Name:       s.pod.Name,
                    UID:        s.pod.UID,
                },
            },
        })
        if err != nil {
            errlist = append(errlist, err)
            continue
        }
        // The token becomes the content of the projected file at tp.Path.
        payload[tp.Path] = volumeutil.FileProjection{
            Data:   []byte(tr.Status.Token),
            Mode:   mode,
            FsUser: mounterArgs.FsUser,
        }
    

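The benefit is that service accounts no longer have a permanent token by default. Instead, each Pod gets a temporary token, valid for 3607s by default, that the kubelet refreshes automatically and that becomes invalid when the Pod is deleted. This is a significant security improvement. It also explains the failure described under Cause: the kubeconfig on the host embeds a token bound to the old CNI Pod, so it stops being accepted once that Pod has been deleted or the token has expired.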
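As a rough illustration of the automatic refresh: the Kubernetes documentation describes the kubelet as rotating a projected token once it is older than 80% of its time-to-live or older than 24 hours. The sketch below applies that rule to plain Unix timestamps. The function name needsRefresh is hypothetical, and the sketch deliberately ignores details of the real kubelet logic such as random jitter.

```go
package main

import "fmt"

// maxTokenAgeSeconds is the 24-hour upper bound on token age.
const maxTokenAgeSeconds = 24 * 60 * 60

// needsRefresh reports whether a token issued at issuedAt (Unix seconds)
// with the requested ttlSeconds should be rotated at time now.
func needsRefresh(issuedAt, ttlSeconds, now int64) bool {
	age := now - issuedAt
	// age > 80% of the TTL, written with integers to avoid rounding.
	return age*10 > ttlSeconds*8 || age > maxTokenAgeSeconds
}

func main() {
	const issuedAt = int64(1673808168) // "iat" of the sample pod token
	const ttl = int64(3607)

	fmt.Println(needsRefresh(issuedAt, ttl, issuedAt+30*60)) // false: 1800s is below 80% of 3607s
	fmt.Println(needsRefresh(issuedAt, ttl, issuedAt+50*60)) // true: 3000s is above 80% of 3607s
}
```

A file on the host that was written once from the mounted token does not take part in this rotation, which is why a copied token goes stale while the mounted one stays fresh.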

Solution

To keep the component behaving as it did on earlier versions, the token must be persistent. The simplest solution is to manually create a token Secret for the service account, for example:

apiVersion: v1
kind: Secret
# marks this Secret as a service account token
type: kubernetes.io/service-account-token
metadata:
  name: mycontroller
  namespace: kube-system
  annotations:
    # service account name
    kubernetes.io/service-account.name: "mycontroller"

When the k8s tokens-controller sees this Secret, it finds that the ca.crt, namespace, and token fields are empty and populates them automatically. This gives us a permanent token that we can use to generate the kubeconfig.

func (e *TokensController) secretUpdateNeeded(secret *v1.Secret) (bool, bool, bool) {
    caData := secret.Data[v1.ServiceAccountRootCAKey]
    needsCA := len(e.rootCA) > 0 && !bytes.Equal(caData, e.rootCA)

    needsNamespace := len(secret.Data[v1.ServiceAccountNamespaceKey]) == 0

    tokenData := secret.Data[v1.ServiceAccountTokenKey]
    needsToken := len(tokenData) == 0

    return needsCA, needsNamespace, needsToken
}

How the token is authenticated

Service account tokens behave differently across versions, so how is a token actually authenticated?

A token is a JWT-compliant string.

A persistent token carries information about the service account:

{
  "iss": "kubernetes/serviceaccount",
  "kubernetes.io/serviceaccount/namespace": "kube-system",
  "kubernetes.io/serviceaccount/secret.name": "mycontroller",
  "kubernetes.io/serviceaccount/service-account.name": "mycontroller",
  "kubernetes.io/serviceaccount/service-account.uid": "2f0ab840-064c-4168-b9b2-932c361e13d6",
  "sub": "system:serviceaccount:kube-system:mycontroller"
}

When the apiserver receives the token, it verifies the signature and integrity of the content according to the JWT specification. Once verification passes, the request is authenticated as the service account named in the token.
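To inspect such a token yourself, the claims can be read without any key material, because a JWT is three base64url segments (header, payload, signature) joined by dots. The following is a minimal sketch for inspection only: it does not verify the signature, so it is not a substitute for the apiserver's check. The function name decodeClaims and the unsigned demo token are made up for this illustration.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// decodeClaims returns the payload of a JWT as a generic map.
// It does NOT verify the signature.
func decodeClaims(token string) (map[string]interface{}, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return nil, errors.New("token must have three dot-separated segments")
	}
	// JWT segments use base64url encoding without padding.
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return nil, fmt.Errorf("decode payload: %w", err)
	}
	claims := map[string]interface{}{}
	if err := json.Unmarshal(payload, &claims); err != nil {
		return nil, fmt.Errorf("parse payload: %w", err)
	}
	return claims, nil
}

func main() {
	// Build an unsigned demo token from two of the claims shown above.
	// "e30" is the base64url encoding of an empty JSON header "{}".
	payload := `{"iss":"kubernetes/serviceaccount","sub":"system:serviceaccount:kube-system:mycontroller"}`
	token := "e30." + base64.RawURLEncoding.EncodeToString([]byte(payload)) + ".signature"

	claims, err := decodeClaims(token)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(claims["iss"]) // kubernetes/serviceaccount
	fmt.Println(claims["sub"]) // system:serviceaccount:kube-system:mycontroller
}
```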

For the temporary (pod) token, the content is a little different.

{
  "aud": [
    "https://kubernetes.default.svc.cluster.local"
  ],
  "exp": 1705344168,
  "iat": 1673808168,
  "iss": "https://kubernetes.default.svc.cluster.local",
  "kubernetes.io": {
    "namespace": "kube-system",
    "pod": {
      "name": "mycontroller-lr99n",
      "uid": "f8a3c6c7-c41c-4a33-9329-f40d208a03e6"
    },
    "serviceaccount": {
      "name": "mycontroller",
      "uid": "2f0ab840-064c-4168-b9b2-932c361e13d6"
    },
    "warnafter": 1673811775
  },
  "nbf": 1673808168,
  "sub": "system:serviceaccount:kube-system:mycontroller"
}

The token contains not only the service account information but also the Pod information. Its validity is bounded by the Pod’s lifetime and by the nbf and exp claims: nbf stands for "not before" and exp for "expiration time", both stored as Unix timestamps. In this sample, exp - iat is 31536000s (365 days) while warnafter - iat is 3607s, which matches the behavior of the apiserver’s --service-account-extend-token-expiration option. The token is automatically invalidated once the Pod is deleted. Authentication is still performed as the service account.
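The time-based part of that check can be sketched with the numbers from the sample token. This is an illustration under assumptions: the type boundClaims and the function withinWindow are hypothetical names, and only the nbf/exp window is evaluated. The apiserver additionally verifies the signature and that the bound Pod still exists, which this sketch cannot do.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// boundClaims holds only the time-related claims of a pod-bound token.
type boundClaims struct {
	Exp        int64 `json:"exp"`
	Iat        int64 `json:"iat"`
	Nbf        int64 `json:"nbf"`
	Kubernetes struct {
		WarnAfter int64 `json:"warnafter"`
	} `json:"kubernetes.io"`
}

// withinWindow reports whether now (Unix seconds) satisfies nbf <= now < exp.
func withinWindow(c boundClaims, now int64) bool {
	return now >= c.Nbf && now < c.Exp
}

func main() {
	// Time-related claims copied from the sample token above.
	sample := `{"exp":1705344168,"iat":1673808168,"nbf":1673808168,
	            "kubernetes.io":{"warnafter":1673811775}}`

	var c boundClaims
	if err := json.Unmarshal([]byte(sample), &c); err != nil {
		fmt.Println("error:", err)
		return
	}

	fmt.Println(c.Exp - c.Iat)                  // 31536000 (365 days)
	fmt.Println(c.Kubernetes.WarnAfter - c.Iat) // 3607
	fmt.Println(withinWindow(c, c.Iat+60))      // true
	fmt.Println(withinWindow(c, c.Exp))         // false: exp itself is already too late
}
```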