There is a Kubernetes webhook maintained within the group that intercepts pod creation requests and makes some changes (e.g. adding environment variables, adding init-container, etc.).

The business logic itself is simple; the hard part is handling errors raised along the way. If the webhook rejects the request outright, the pod may never start; if it silently skips the business logic, the failure goes unnoticed and no one knows an error occurred.

The obvious idea is to hook into the alerting system, but that couples this component to a specific alerting backend.

Kubernetes has a built-in Event mechanism for recording exactly this kind of information (warnings, errors, and other notable occurrences), which is a better fit for this scenario.

What is an Event in Kubernetes?

An Event is one of the many resource objects in Kubernetes. It is typically used to record state changes that occur within a cluster, ranging from node exceptions to Pod starts, successful scheduling, and so on.

For example, if we describe a pod, we can see the events associated with it:

kubectl describe pod sc-b-68867c5dcb-sf9hn

(screenshot: the Events section of the `kubectl describe` output)

As you can see, everything from scheduling, to starting, to the eventual failure of this pod to pull the image is recorded by way of an Event.

Let’s look at the structure of an Event.

$ kubectl get events -o json | jq .items[10]

{
  "apiVersion": "v1",
  "count": 1,
  "eventTime": null,
  "firstTimestamp": "2021-12-04T17:02:14Z",
  "involvedObject": {
    "apiVersion": "v1",
    "fieldPath": "spec.containers{sc-b}",
    "kind": "Pod",
    "name": "sc-b-68867c5dcb-sf9hn",
    "namespace": "default",
    "resourceVersion": "322554830",
    "uid": "24df4a07-f41e-42c2-ba26-d90940303b00"
  },
  "kind": "Event",
  "lastTimestamp": "2021-12-04T17:02:14Z",
  "message": "Error: ErrImagePull",
  "metadata": {
    "creationTimestamp": "2021-12-04T17:02:14Z",
    "name": "sc-b-68867c5dcb-sf9hn.16bd9bf933d60437",
    "namespace": "default",
    "resourceVersion": "1197082",
    "selfLink": "/api/v1/namespaces/default/events/sc-b-68867c5dcb-sf9hn.16bd9bf933d60437",
    "uid": "f928ff2d-c618-44a6-bf5a-5b0d3d20e95e"
  },
  "reason": "Failed",
  "reportingComponent": "",
  "reportingInstance": "",
  "source": {
    "component": "kubelet",
    "host": "eci"
  },
  "type": "Warning"
}

As you can see, the more important fields of an Event are:

  • type - the type of the event, either Normal or Warning (new types may be added in the future)
  • reason - a short, machine-readable reason, e.g. Failed, Scheduled, Started, Completed
  • message - a human-readable description of the event
  • involvedObject - the resource object the event is about, e.g. a Pod or Node
  • source - the component that reported the event, e.g. kubelet or kube-apiserver
  • firstTimestamp, lastTimestamp - when the event was first and last observed

Based on this information, we can build cluster-level monitoring and alerting. Alibaba Cloud's ACK, for example, ships Events to SLS (Log Service) and then alerts on them according to configurable rules.

How to report events

The previous section covered what an Event is in Kubernetes. To make the cluster aware that something happened, and to enable monitoring and alerting on it, we have to report the event ourselves.

How to access the Kubernetes API

The first step in reporting events is to access the Kubernetes API. The API is a RESTful HTTP API, and the official client-go library wraps it in an SDK that can be used directly.

There are two ways to connect to the Kubernetes API through the SDK.

The first is via a kubeconfig file (from outside the cluster); the second is via a ServiceAccount (from inside a Pod).

For the sake of simplicity, we use the first way as an example.

package main

import (
	"flag"
	"fmt"
	"path/filepath"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	var kubeconfig *string
	if home := homedir.HomeDir(); home != "" {
		kubeconfig = flag.String("kubeconfig", filepath.Join(home, ".kube", "config"), "(optional) absolute path to the kubeconfig file")
	} else {
		kubeconfig = flag.String("kubeconfig", "", "absolute path to the kubeconfig file")
	}
	flag.Parse()

	config, err := clientcmd.BuildConfigFromFlags("", *kubeconfig)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	versionInfo, err := clientset.ServerVersion()
	if err != nil {
		panic(err)
	}
	fmt.Printf("Version: %#v\n", versionInfo)
}

By running this code, you can connect to the cluster and get the Kubernetes Server version.

Version: &version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.8-aliyun.1", GitCommit:"27f24d2", GitTreeState:"", BuildDate:"2021-08-19T10:00:16Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

How to create and report events

In the example above we built a clientset object; now we can use it to create an Event in the Kubernetes cluster.

now := time.Now()
message := "test message at " + now.Format(time.RFC3339)
// Create the Event in the default namespace. With client-go >= v0.18 the
// Create call also takes a context and CreateOptions.
_, err = clientset.CoreV1().Events("default").Create(context.TODO(), &apiv1.Event{
    ObjectMeta: metav1.ObjectMeta{
        GenerateName: "test-", // let the API server append a unique suffix
    },
    Type:           apiv1.EventTypeWarning,
    Message:        message,
    Reason:         "OnePilotFail",
    FirstTimestamp: metav1.NewTime(now),
    LastTimestamp:  metav1.NewTime(now),
    InvolvedObject: apiv1.ObjectReference{
        Namespace: "default",
        Kind:      "Deployment",
        Name:      "sc-b",
    },
}, metav1.CreateOptions{})
fmt.Printf("create event with err: %v\n", err)

In the example above, we created an Event in the default namespace whose name starts with test- and whose type is Warning.

We can also look at the final Event that is generated.

kubectl get events -o json | jq .items[353]

  "apiVersion": "v1",
  "eventTime": null,
  "firstTimestamp": "2021-12-04T17:27:06Z",
  "involvedObject": {
    "kind": "Deployment",
    "name": "sc-b",
    "namespace": "default"
  },
  "kind": "Event",
  "lastTimestamp": "2021-12-04T17:27:06Z",
  "message": "test message at 2021-12-05T01:27:06+08:00",
  "metadata": {
    "creationTimestamp": "2021-12-04T17:27:06Z",
    "generateName": "test-",
    "name": "test-vvjzp",
    "namespace": "default",
    "resourceVersion": "1198057",
    "selfLink": "/api/v1/namespaces/default/events/test-vvjzp",
    "uid": "f2bcdd1c-442f-4f61-921a-e18637ee5871"
  },
  "reason": "OnePilotFail",
  "reportingComponent": "",
  "reportingInstance": "",
  "source": {},
  "type": "Warning"
}

This way, whoever cares about the corresponding events, such as operations staff, can build monitoring and alerting on top of this information.

Usage Scenarios

Unlike business events, Kubernetes events are resources in the cluster, and the people who care about them are mostly cluster maintainers.

This event reporting mechanism is therefore best suited to infrastructure components, so that cluster maintainers can understand the current state of the cluster.

If you need more flexible alerting and monitoring, use an event and alerting system that sits closer to the business and supports richer rules.