The demos and examples in this article were validated on a v1.18.17 cluster.

Pod Security Policies

Pod Security Policies (hereafter referred to as psp or pod security policies) is a cluster-level global resource that provides fine-grained authorization control over pod creation and updates. Specifically, a psp object defines a set of security conditions that a pod’s spec field must meet, along with default values for the applicable fields, before the apiserver will accept the pod’s creation or update request.

The specific pod fields and security conditions can be found in the documentation what-is-a-pod-security-policy.

Enabling Pod Security Policies

Kubernetes does not enable pod security policies by default. Enabling them in a cluster involves roughly three steps:

  1. Grant users access to the security policy resources, usually to all service accounts in a namespace.
  2. Create the desired security policy resources in the cluster.
  3. Enable the admission controller plugin for the apiserver.

Note that the order of steps 1 and 2 does not matter, since neither has any practical effect until the plugin is enabled.

However, step 3 should come last. Otherwise, once the admission controller plugin is enabled, if no pod security policy exists in the cluster, or if the security policy resources have not been pre-authorized, all pod creation will be denied, including system management components under the kube-system namespace such as the apiserver (although, being kubelet-managed static pods, they will in fact keep running).

RBAC Authorization

  1. Create a ClusterRole with access to all security policy resources.

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: all-psp
    rules:
    - apiGroups: ['policy']
      resources: ['podsecuritypolicies']
      verbs:     ['use']

  2. Bind the created role to all service accounts under the specified namespaces via a ClusterRoleBinding (it can also be granted to a specific service account or user).

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: cluster-psp-bind
    roleRef:
      kind: ClusterRole
      name: all-psp
      apiGroup: rbac.authorization.k8s.io
    subjects:
    # Authorize all service accounts under the specified namespaces (recommended):
    - kind: Group
      apiGroup: rbac.authorization.k8s.io
      name: system:nodes
      namespace: kube-system
    - kind: Group
      apiGroup: rbac.authorization.k8s.io
      name: system:serviceaccounts:kube-system
    - kind: Group
      apiGroup: rbac.authorization.k8s.io
      name: system:serviceaccounts:security-test
    # It is also possible to authorize a specific service account or user (not recommended):
    - kind: ServiceAccount
      name: <authorized service account name>
      namespace: <authorized pod namespace>
    - kind: User
      apiGroup: rbac.authorization.k8s.io
      name: <authorized user name>
    # Authorize all service accounts:
    - kind: Group
      apiGroup: rbac.authorization.k8s.io
      name: system:serviceaccounts
    # Authorize all authenticated users:
    - kind: Group
      apiGroup: rbac.authorization.k8s.io
      name: system:authenticated

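As a cluster-free sketch of the naming convention used in the binding above (the helper below is hypothetical, not a kubectl feature): the RBAC group that covers every service account in a namespace always follows the pattern system:serviceaccounts:<namespace>.

```shell
# Hypothetical helper: print the RBAC group that covers all service
# accounts in a given namespace, matching the subjects used above.
psp_group_for_ns() {
  echo "system:serviceaccounts:$1"
}

psp_group_for_ns security-test   # prints system:serviceaccounts:security-test
```

Once the binding is applied to a real cluster, the grant can be checked with e.g. kubectl auth can-i use podsecuritypolicy/privileged --as=system:serviceaccount:security-test:default.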
Creating a Security Policy Resource

  1. Create a PodSecurityPolicy resource in the cluster. First, a permissive (loose-privilege) version.

    apiVersion: policy/v1beta1
    kind: PodSecurityPolicy
    metadata:
      name: privileged
      annotations:
        seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
    spec:
      privileged: true
      allowPrivilegeEscalation: true
      allowedCapabilities:
      - '*'
      volumes:
      - '*'
      hostNetwork: true
      hostPorts:
      - min: 0
        max: 65535
      hostIPC: true
      hostPID: true
      runAsUser:
        rule: 'RunAsAny'
      seLinux:
        rule: 'RunAsAny'
      supplementalGroups:
        rule: 'RunAsAny'
      fsGroup:
        rule: 'RunAsAny'

  2. Then a strict-permissions version.

    apiVersion: policy/v1beta1
    kind: PodSecurityPolicy
    metadata:
      name: restricted
      annotations:
        seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'docker/default,runtime/default'
        apparmor.security.beta.kubernetes.io/allowedProfileNames: 'runtime/default'
        apparmor.security.beta.kubernetes.io/defaultProfileName:  'runtime/default'
    spec:
      privileged: false
      # Required to prevent escalations to root.
      allowPrivilegeEscalation: false
      requiredDropCapabilities:
      - ALL
      # Allow core volume types.
      volumes:
      - 'configMap'
      - 'emptyDir'
      - 'projected'
      - 'secret'
      - 'downwardAPI'
      # Assume that ephemeral CSI drivers & persistentVolumes set up by the cluster admin are safe to use.
      - 'csi'
      - 'persistentVolumeClaim'
      - 'ephemeral'
      hostNetwork: false
      hostIPC: false
      hostPID: false
      runAsUser:
        # Require the container to run without root privileges.
        rule: 'MustRunAsNonRoot'
      seLinux:
        # This policy assumes the nodes are using AppArmor rather than SELinux.
        rule: 'RunAsAny'
      supplementalGroups:
        rule: 'MustRunAs'
        ranges:
        # Forbid adding the root group.
        - min: 1
          max: 65535
      fsGroup:
        rule: 'MustRunAs'
        ranges:
        # Forbid adding the root group.
        - min: 1
          max: 65535
      readOnlyRootFilesystem: false

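As a rough local approximation of what the restricted policy checks (a grep sketch under assumed manifest formatting, not how the admission controller actually evaluates pods), you can scan a manifest for the host-namespace and privilege fields that the policy above forbids:

```shell
# Write a sample pod spec to a temp file, then grep for fields the
# restricted policy rejects. A toy pre-check, not real admission logic.
f=$(mktemp)
cat > "$f" <<'EOF'
spec:
  hostNetwork: true
  containers:
  - name: nginx
    image: nginx
EOF
verdict="allowed"
if grep -Eq 'host(Network|IPC|PID): *true|privileged: *true' "$f"; then
  verdict="violates restricted"
fi
echo "$verdict"   # prints: violates restricted
```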
Enabling the admission controller plug-in

There are two ways to enable the PodSecurityPolicy admission controller plugin.

  1. Modify the static manifest file of the apiserver in an existing cluster: add PodSecurityPolicy to the apiserver's enable-admission-plugins startup parameter. kubelet will automatically detect the change and restart the apiserver. The following example replaces the original parameter using sed.

    $ sed -i 's/enable-admission-plugins=NodeRestriction/enable-admission-plugins=NodeRestriction,PodSecurityPolicy/' /etc/kubernetes/manifests/kube-apiserver.yaml
    
  2. Alternatively, add extra arguments to the kubeadm configuration file when initializing the cluster.

    apiVersion: kubeadm.k8s.io/v1beta2
    kind: ClusterConfiguration
    apiServer:
      extraArgs:
        enable-admission-plugins: "PodSecurityPolicy"
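The sed substitution from the first option can be rehearsed on a scratch copy of the relevant manifest line, so nothing under /etc/kubernetes is touched until you are confident (the flag line below assumes the formatting of a typical kubeadm-generated manifest):

```shell
# Rehearse the substitution on a temp file holding a sample flag line.
m=$(mktemp)
echo '    - --enable-admission-plugins=NodeRestriction' > "$m"
sed -i 's/enable-admission-plugins=NodeRestriction/enable-admission-plugins=NodeRestriction,PodSecurityPolicy/' "$m"
cat "$m"   # the flag line now also lists PodSecurityPolicy
rm -f "$m"
```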

Verifying psp security restrictions

We test in the security-test namespace authorized above to verify the psp restrictions on pods.

First ensure that the strict version of the psp resource has been applied to the cluster, then try to create a pod that requires hostNetwork via a deployment.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-hostnetwork
spec:
  selector:
    matchLabels:
      run: nginx
  template:
    metadata:
      labels:
        run: nginx
    spec:
      hostNetwork: true
      containers:
      - image: nginx
        imagePullPolicy: Always
        name: nginx-privileged

Create and view the results.

$ kubectl create -f hostnetwork-pod.yaml -n security-test
deployment.apps/nginx-hostnetwork created
$ kubectl get deploy -n security-test nginx-hostnetwork
NAME                READY   UP-TO-DATE   AVAILABLE   AGE
nginx-hostnetwork   0/1     0            0           17s
$ kubectl -n security-test get event | grep "pod security policy"
103s        Warning   FailedCreate             deployment/nginx-hostnetwork             

Limitations

If a pod violates the security policy, the only options are to adjust the pod's specification or to modify the pod security policy resource. psp resources are globally scoped and cannot set different security policy levels for different namespaces, which is an obvious limitation.

In addition, the authorization mechanism for psps is complicated: if no authorization or security policy has been created, all pods are rejected, which makes it difficult to enable the feature by default in a cluster.

Starting with Kubernetes v1.21, Pod Security Policy is deprecated, and it will be removed in v1.25. Kubernetes introduces Pod Security Admission as its replacement, which we explain in detail below.

Pod Security Admission

Why replace psp

KEP-2579 details three main reasons for replacing Pod Security Policy with Pod Security Admission:

  1. A model that binds policies to users or service accounts weakens security.
  2. The feature cannot be rolled out smoothly: it cannot be safely enabled in a cluster that has no security policies in place.
  3. The API is inconsistent and inflexible.

The new Pod Security Admission mechanism is much improved in ease of use and flexibility, with four significant differences from a usage perspective:

  1. It can be enabled by default in the cluster; as long as no constraints are added, it triggers no verification of pods.
  2. It works at the namespace level, and different security restrictions can be set for different namespaces by adding labels.
  3. Exemption rules can be set for specific users, namespaces, or runtime classes.
  4. Three security levels are predefined based on practice, so users do not need to set each security condition individually.

How it works

Pod Security Admission divides the security conditions of the original Pod Security Policy into three pre-defined levels of security.

  • privileged : Unrestricted, providing all available privileges to the pod.
  • baseline : A minimal restriction policy that prevents known privilege escalation.
  • restricted : A strict restriction policy that follows current best practices for pod hardening.

The three levels, ordered from lax to strict, impose different sets of security conditions suited to different pod workloads. A security level can also be pinned to a specific Kubernetes version, so that even if the cluster is upgraded and the new version changes the level's definition, pods remain bound by the old version's security conditions.

When a pod conflicts with the configured security level, there are three modes for handling the violation.

  • enforce : Allow only pods that meet the security level requirements and reject pods that conflict with the security level.
  • audit : Only log security level conflicts in the cluster event, no pods will be rejected.
  • warn : Returns a warning message to the user when there is a conflict with the security level, but does not reject the pod.

The audit and warn modes are independent; if you need both, you must set up both modes separately.

Applying a security policy no longer requires creating separate cluster resources; it only requires control labels on the namespace.

pod-security.kubernetes.io/<mode>: <level>
pod-security.kubernetes.io/<mode>-version: <version>
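As a cluster-free illustration of this label format (the helper below is hypothetical, not part of kubectl), the mode and level simply slot into the label key and value:

```shell
# Hypothetical helper: validate the mode, then print the namespace label
# in the pod-security.kubernetes.io/<mode>: <level> format shown above.
psa_label() {
  case "$1" in
    enforce|audit|warn) ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
  echo "pod-security.kubernetes.io/$1: $2"
}

psa_label enforce baseline   # prints pod-security.kubernetes.io/enforce: baseline
psa_label warn restricted    # prints pod-security.kubernetes.io/warn: restricted
```

On a real cluster, the equivalent label is applied with, e.g., kubectl label ns psa-test pod-security.kubernetes.io/enforce=baseline.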

A more complete example is provided below.

Enabling psa in older versions

While Pod Security Admission was introduced in Kubernetes v1.22, it can be enabled on older versions by installing the PodSecurity admission webhook. The steps are as follows.

$ git clone https://github.com/kubernetes/pod-security-admission.git
$ cd pod-security-admission/webhook
$ make certs
$ kubectl apply -k .

On v1.18.17, the above steps from the official documentation run into two compatibility issues; the specific problems and solutions are as follows.

  1. kubectl’s built-in kustomize version does not support the “replacements” field.

    $ kubectl apply -k .
    error: json: unknown field "replacements"
    

    Solution: install the latest version of kustomize and then execute the following command in the same directory.

    $ kustomize build . | kubectl apply -f -
    
  2. The seccompProfile field under Deployment.spec.template.spec.containers[0].securityContext in the manifest/50-deployment.yaml file was introduced in v1.19, so on v1.18 this field needs to be replaced with the equivalent seccomp annotation. See Seccomp for details.

    error: error validating "STDIN": error validating data: ValidationError(Deployment.spec.template.spec.containers[0].securityContext): unknown field "seccompProfile" in io.k8s.api.core.v1.SecurityContext; if you choose to ignore these errors, turn validation off with --validate=false
    
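A sketch of that workaround: on v1.18, the pre-v1.19 mechanism is the seccomp.security.alpha.kubernetes.io/pod pod annotation. The excerpt below only assumes the rough shape of 50-deployment.yaml; adapt the paths to the real file.

```shell
# Drop the v1.19+ seccompProfile field from a local copy of the manifest
# and set the pre-v1.19 pod annotation instead (simplified excerpt).
f=$(mktemp)
cat > "$f" <<'EOF'
  template:
    metadata:
      annotations: {}
    spec:
      containers:
      - name: webhook
        securityContext:
          seccompProfile:
            type: RuntimeDefault
EOF
sed -i '/seccompProfile:/d; /type: RuntimeDefault/d' "$f"
sed -i 's|annotations: {}|annotations: {seccomp.security.alpha.kubernetes.io/pod: runtime/default}|' "$f"
cat "$f"
```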

Verifying psa security restrictions

First create a new namespace psa-test for testing, and define it to enforce the baseline security level, with warnings and auditing at the restricted level.

apiVersion: v1
kind: Namespace
metadata:
  name: psa-test
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: v1.18

    # We are setting these to our _desired_ `enforce` level.
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: v1.18
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: v1.18

Then create the deployment in that namespace as used in the example above.

$ kubectl create -f hostnetwork-pod.yaml -n psa-test
deployment.apps/nginx-hostnetwork created
$ kubectl get deploy -n psa-test nginx-hostnetwork
NAME                READY   UP-TO-DATE   AVAILABLE   AGE
nginx-hostnetwork   0/1     0            0           17s
$ kubectl -n psa-test get event | grep PodSecurity
104s        Warning   FailedCreate        replicaset/nginx-hostnetwork-644cdd6598   Error creating: admission webhook "pod-security-webhook.kubernetes.io" denied the request: pods "nginx-hostnetwork-644cdd6598-7rb5m" is forbidden: violates PodSecurity "baseline:v1.23": host namespaces (hostNetwork=true)