1. Overview

Namespace is an important concept in Kubernetes: an abstraction over a set of resources and objects, often used to isolate different users. A namespace holds many kinds of resources, such as the familiar Deployment, Pod, Service, Ingress, and ConfigMap.

The focus of this article, of course, is what happens when a namespace is deleted. A typical scenario: after executing kubectl delete ns test in the terminal, we observe that the test namespace immediately enters the Terminating state and is only actually deleted a few seconds later, even though the test namespace contains no resources.

NAME              STATUS   AGE
default           Active   2d2h
docker            Active   2d2h
kube-node-lease   Active   2d2h
kube-public       Active   2d2h
kube-system       Active   2d2h
test              Active   4s
test              Terminating   18s
test              Terminating   23s
test              Terminating   23s

Therefore, we will explore the following points:

  • How the api-server handles namespace deletion requests
  • How the resources in a namespace are cleaned up when it is deleted

2. How the api-server handles namespace deletion requests

Unlike other resources, a namespace must be emptied before it can be removed: while a namespace is terminating, the resources under it have not yet been confirmed deleted. Therefore, when the api-server receives a request to delete a namespace, it does not immediately remove it from etcd. It first checks whether metadata.DeletionTimestamp is empty. If it is empty, the api-server sets metadata.DeletionTimestamp to the current time and sets status.Phase to Terminating. If metadata.DeletionTimestamp is not empty, it checks spec.finalizers instead: only if the finalizer list is empty is the namespace actually deleted. In other words, the namespace is never removed while spec.finalizers is non-empty. So when is the finalizer added, and how does it work?
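
To make this flow concrete, here is a minimal, self-contained sketch of the decision logic. It uses local stand-in types rather than the real apiserver code, so treat it as an illustration of the flow, not the actual implementation:

package main

import (
	"fmt"
	"time"
)

// Local stand-ins for the Kubernetes API types, for illustration only.
type Namespace struct {
	DeletionTimestamp *time.Time // metadata.deletionTimestamp
	Finalizers        []string   // spec.finalizers
	Phase             string     // status.phase
}

// handleDelete sketches the api-server's decision flow described above.
// It returns true only when the object may really be removed from etcd.
func handleDelete(ns *Namespace) bool {
	if ns.DeletionTimestamp == nil {
		// First DELETE request: mark the namespace, don't remove it.
		now := time.Now()
		ns.DeletionTimestamp = &now
		ns.Phase = "Terminating"
		return false
	}
	// Subsequent attempts: remove only once the finalizer list is empty.
	return len(ns.Finalizers) == 0
}

func main() {
	ns := &Namespace{Finalizers: []string{"kubernetes"}}
	fmt.Println(handleDelete(ns)) // false: namespace marked Terminating
	ns.Finalizers = nil
	fmt.Println(handleDelete(ns)) // true: safe to delete from etcd
}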

3. The finalizer mechanism

The namespace finalizer is actually added at creation time. The processing logic can be seen in the following code.

// PrepareForCreate clears fields that are not allowed to be set by end users on creation.
func (namespaceStrategy) PrepareForCreate(ctx context.Context, obj runtime.Object) {
    // on create, status is active
    namespace := obj.(*api.Namespace)
    namespace.Status = api.NamespaceStatus{
        Phase: api.NamespaceActive,
    }
    // on create, we require the kubernetes value
    // we cannot use this in defaults conversion because we let it get removed over life of object
    hasKubeFinalizer := false
    for i := range namespace.Spec.Finalizers {
        if namespace.Spec.Finalizers[i] == api.FinalizerKubernetes {
            hasKubeFinalizer = true
            break
        }
    }
    if !hasKubeFinalizer {
        if len(namespace.Spec.Finalizers) == 0 {
            namespace.Spec.Finalizers = []api.FinalizerName{api.FinalizerKubernetes}
        } else {
            namespace.Spec.Finalizers = append(namespace.Spec.Finalizers, api.FinalizerKubernetes)
        }
    }
}
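
You can observe this on any namespace you create: kubectl get namespace test -o yaml will show spec.finalizers containing kubernetes.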

When a namespace is deleted it thus switches to the Terminating state, and the namespace controller comes into play. The namespace controller is part of the controller-manager and listens for namespace add and update events.

    // configure the namespace informer event handlers
    namespaceInformer.Informer().AddEventHandlerWithResyncPeriod(
        cache.ResourceEventHandlerFuncs{
            AddFunc: func(obj interface{}) {
                namespace := obj.(*v1.Namespace)
                namespaceController.enqueueNamespace(namespace)
            },
            UpdateFunc: func(oldObj, newObj interface{}) {
                namespace := newObj.(*v1.Namespace)
                namespaceController.enqueueNamespace(namespace)
            },
        },
        resyncPeriod,
    )

A workqueue is used to store the change events for each namespace, and each event ultimately triggers nm.namespacedResourcesDeleter.Delete(namespace.Name). Of course, if the namespace no longer exists, or its DeletionTimestamp is empty, the deleter exits early.

    namespace, err := d.nsClient.Get(context.TODO(), nsName, metav1.GetOptions{})
    if err != nil {
        if errors.IsNotFound(err) {
            return nil
        }
        return err
    }
    if namespace.DeletionTimestamp == nil {
        return nil
    }

Otherwise, the deleter first makes sure the namespace's phase is set to Terminating.

This means that once a namespace is terminating, you cannot change its state by simply modifying the phase. I once manually changed the phase of a terminating namespace back to Active, only to watch it immediately flip back to Terminating; this is probably why.

// updateNamespaceStatusFunc will verify that the status of the namespace is correct
func (d *namespacedResourcesDeleter) updateNamespaceStatusFunc(namespace *v1.Namespace) (*v1.Namespace, error) {
    if namespace.DeletionTimestamp.IsZero() || namespace.Status.Phase == v1.NamespaceTerminating {
        return namespace, nil
    }
    newNamespace := v1.Namespace{}
    newNamespace.ObjectMeta = namespace.ObjectMeta
    newNamespace.Status = *namespace.Status.DeepCopy()
    newNamespace.Status.Phase = v1.NamespaceTerminating
    return d.nsClient.UpdateStatus(context.TODO(), &newNamespace, metav1.UpdateOptions{})
}

After that, an attempt is made to clear all the contents of the namespace.

    // there may still be content for us to remove
    estimate, err := d.deleteAllContent(namespace)
    if err != nil {
        return err
    }
    if estimate > 0 {
        return &ResourcesRemainingError{estimate}
    }

4. The working mechanism of DiscoveryInterface

Now we face a problem: how do we clean up all the resources under a namespace? Normally, if we want to delete a pod, we can call the PodInterface provided by client-go, which is really just a wrapper around a RESTful HTTP DELETE. But since we do not know in advance which resources exist under the namespace, there is no single delete interface we can call directly.
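
For instance, deleting a single, known pod with client-go looks roughly like this. This is a minimal sketch: it assumes the default kubeconfig, recent client-go versions that take a context, and a hypothetical pod name my-pod:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a clientset from the default kubeconfig.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	// Under the hood this issues an HTTP DELETE against
	// /api/v1/namespaces/test/pods/my-pod.
	err = clientset.CoreV1().Pods("test").Delete(
		context.TODO(), "my-pod", metav1.DeleteOptions{})
	fmt.Println("delete error:", err)
}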

For this, client-go also provides DiscoveryInterface. As the name implies, DiscoveryInterface can be used to discover the API groups, versions, and resources in the cluster. Once we have the list of all API resources in the cluster, we can query and delete them.

The DiscoveryInterface is defined as follows.

// DiscoveryInterface holds the methods that discover server-supported API groups,
// versions and resources.
type DiscoveryInterface interface {
    RESTClient() restclient.Interface
    ServerGroupsInterface
    ServerResourcesInterface
    ServerVersionInterface
    OpenAPISchemaInterface
}
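
In client-go, this interface is implemented by DiscoveryClient, which can be constructed from a rest.Config with discovery.NewDiscoveryClientForConfig, or obtained from an existing Clientset via its Discovery() method.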

Among these, ServerGroupsInterface provides the ability to list all the API groups in the cluster, with the following function signature.

    // ServerGroups returns the supported groups, with information like supported versions and the
    // preferred version.
    ServerGroups() (*metav1.APIGroupList, error)

ServerVersionInterface can be used to get the version information of the server; the function signature is as follows.

    // ServerVersion retrieves and parses the server's version (git version).
    ServerVersion() (*version.Info, error)

The interface we need to focus on is ServerResourcesInterface.

// ServerResourcesInterface has methods for obtaining supported resources on the API server
type ServerResourcesInterface interface {
    // ServerResourcesForGroupVersion returns the supported resources for a group and version.
    ServerResourcesForGroupVersion(groupVersion string) (*metav1.APIResourceList, error)
    // ServerResources returns the supported resources for all groups and versions.
    //
    // The returned resource list might be non-nil with partial results even in the case of
    // non-nil error.
    //
    // Deprecated: use ServerGroupsAndResources instead.
    ServerResources() ([]*metav1.APIResourceList, error)
    // ServerResources returns the supported groups and resources for all groups and versions.
    //
    // The returned group and resource lists might be non-nil with partial results even in the
    // case of non-nil error.
    ServerGroupsAndResources() ([]*metav1.APIGroup, []*metav1.APIResourceList, error)
    // ServerPreferredResources returns the supported resources with the version preferred by the
    // server.
    //
    // The returned group and resource lists might be non-nil with partial results even in the
    // case of non-nil error.
    ServerPreferredResources() ([]*metav1.APIResourceList, error)
    // ServerPreferredNamespacedResources returns the supported namespaced resources with the
    // version preferred by the server.
    //
    // The returned resource list might be non-nil with partial results even in the case of
    // non-nil error.
    ServerPreferredNamespacedResources() ([]*metav1.APIResourceList, error)
}

Here we can use ServerPreferredNamespacedResources to get the list of all namespaced resources, then filter for the ones that support the delete verb, and finally obtain the GroupVersionResources (GVRs for short) of those resources.

    resources, err := d.discoverResourcesFn()
    if err != nil {
        // discovery errors are not fatal.  We often have some set of resources we can operate against even if we don't have a complete list
        errs = append(errs, err)
        conditionUpdater.ProcessDiscoverResourcesErr(err)
    }
    // TODO(sttts): get rid of opCache and pass the verbs (especially "deletecollection") down into the deleter
    deletableResources := discovery.FilteredBy(discovery.SupportsAllVerbs{Verbs: []string{"delete"}}, resources)
    groupVersionResources, err := discovery.GroupVersionResources(deletableResources)
    if err != nil {
        // discovery errors are not fatal.  We often have some set of resources we can operate against even if we don't have a complete list
        errs = append(errs, err)
        conditionUpdater.ProcessGroupVersionErr(err)
    }

Finally, traverse these GVRs for deletion.

    for gvr := range groupVersionResources {
        gvrDeletionMetadata, err := d.deleteAllContentForGroupVersionResource(gvr, namespace, namespaceDeletedAt)
    }
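
Putting the discovery and deletion steps together, here is a rough, self-contained sketch of the same idea using the discovery client and the dynamic client. It is a simplification, not the namespace controller's actual code; in particular, the real deleter falls back to listing and deleting individual objects when a resource does not support deletecollection:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes a cluster reachable through the default kubeconfig.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	discoveryClient, err := discovery.NewDiscoveryClientForConfig(config)
	if err != nil {
		panic(err)
	}
	dynClient, err := dynamic.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Step 1: discover all namespaced resources (preferred versions only).
	resources, err := discoveryClient.ServerPreferredNamespacedResources()
	if err != nil {
		// Discovery errors are not fatal; partial results are still usable.
		fmt.Println("discovery error:", err)
	}

	// Step 2: keep only the resources that support the delete verb,
	// then turn them into GVRs.
	deletable := discovery.FilteredBy(
		discovery.SupportsAllVerbs{Verbs: []string{"delete"}}, resources)
	gvrs, err := discovery.GroupVersionResources(deletable)
	if err != nil {
		fmt.Println("group version error:", err)
	}

	// Step 3: delete everything of each GVR in the target namespace.
	for gvr := range gvrs {
		err := dynClient.Resource(gvr).Namespace("test").DeleteCollection(
			context.TODO(), metav1.DeleteOptions{}, metav1.ListOptions{})
		if err != nil {
			fmt.Printf("failed to clear %v: %v\n", gvr, err)
		}
	}
}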

5. Why a namespace stays in the terminating state for a long time

To find out why a namespace can stay in the terminating state for a long time, let's look again at this very short piece of code.

    // there may still be content for us to remove
    estimate, err := d.deleteAllContent(namespace)
    if err != nil {
        return err
    }
    if estimate > 0 {
        return &ResourcesRemainingError{estimate}
    }

If an error is returned while deleting the contents of the namespace, or if the estimated time for the remaining resources to finish deleting is greater than 0, the namespace stays in the terminating state. For example, a pod has terminationGracePeriodSeconds, so deleting it may mean waiting out that grace period; the namespace controller simply requeues the namespace and retries later.
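
Here is a minimal, self-contained sketch of that requeue decision. The error type is a stand-in for the real one in the controller's namespace deletion package, and the exact back-off arithmetic may differ between Kubernetes versions:

package main

import (
	"fmt"
	"time"
)

// ResourcesRemainingError mirrors the error the deleter returns while content
// is still being removed; Estimate is in seconds.
type ResourcesRemainingError struct {
	Estimate int64
}

func (e *ResourcesRemainingError) Error() string {
	return fmt.Sprintf("some content remains, estimate %d seconds", e.Estimate)
}

// requeueDelay mimics the controller's worker: wait roughly half the
// estimate before retrying the namespace.
func requeueDelay(err error) (time.Duration, bool) {
	if remaining, ok := err.(*ResourcesRemainingError); ok {
		return time.Duration(remaining.Estimate/2+1) * time.Second, true
	}
	return 0, false
}

func main() {
	// A pod with terminationGracePeriodSeconds=30 yields an estimate of 30s,
	// so the namespace would be retried about 16 seconds later.
	if delay, ok := requeueDelay(&ResourcesRemainingError{Estimate: 30}); ok {
		fmt.Println("requeue namespace after", delay)
	}
}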

Waiting out a grace period is normal and causes no problems; the real headache we often encounter is a namespace that can never finish deleting. Simply put, that means some resource under the namespace cannot be deleted, and there are several possibilities.

One is that some resources are guarded by admission webhooks that block deletion: every deletion request has to pass through the admission webhooks first, so a broken or misconfigured webhook can make certain resources undeletable.

The other common possibility is a broken apiservice, which can be confirmed with kubectl get apiservice. If the AVAILABLE column shows False, we need to find out why that apiservice is unavailable. When an apiservice is down, the resources it serves cannot be queried or operated on over HTTP, so there is no way to confirm whether any of those resources remain, and the namespace can never be fully emptied.

Finally, on fixing a namespace that cannot be deleted: the solution usually given on the Internet is to empty the namespace's spec.finalizers, but that treats the symptom rather than the root cause. If a namespace cannot be deleted, there must be a defect or problem somewhere in your cluster, and the real solution is to find out what it is. You can also use this tool to help diagnose the problem: https://github.com/thyarles/knsk