Building your application on top of Kubernetes' capabilities is a bit like opening Pandora's box: you never know exactly what is inside, just as you never know what is happening, or will happen, to the Kubernetes cluster and the applications on it that you depend on.

No matter what architecture you choose and what runtime the underlying layer is based on, observability always deserves a very high priority. There is a saying that if you don't know how to operate and maintain something, you shouldn't deploy it; that is a very pragmatic way of beginning with the end in mind.

That said, if we embrace Kubernetes, what does the "observability" we are looking for actually look like? For microservices architectures, I regard a few areas as the baseline:

  1. Observability of cluster and application state
  2. Cluster and application logging
  3. Observability of inter-application traffic, call relationships, and request status

In short: monitoring, logging, and tracing. For Kubernetes monitoring, Prometheus is a relatively mature solution.

Prometheus

Prometheus is an open source monitoring and alerting system built on a time series database, born at SoundCloud. Prometheus works on a simple principle: it periodically scrapes the state of monitored components over HTTP, so a monitored component only needs to expose a compliant HTTP interface to be picked up. For components that do not provide such an interface out of the box, such as Linux hosts or MySQL, Prometheus supports exporters that collect the information and expose a metrics endpoint on the component's behalf.
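
As a minimal sketch of this pull model, a Prometheus scrape configuration only needs the address of the HTTP endpoint to scrape; the target address below is hypothetical, and 9100 is node_exporter's default port:

# prometheus.yml (fragment)
scrape_configs:
  - job_name: "node"
    metrics_path: /metrics              # the default path
    static_configs:
      - targets: ["192.168.1.10:9100"]  # hypothetical node_exporter instance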


A post on SoundCloud's blog briefly explains the Prometheus architecture and how it works, identifying four characteristics that Prometheus was built to meet:

  1. A multi-dimensional data model
  2. Operational simplicity (easy to deploy and maintain)
  3. Scalable data collection and a decentralized architecture (flexible data collection)
  4. A powerful query language

The first and fourth characteristics are typical of a time series database, yet for ease of deployment Prometheus does not rely on any additional storage by default and instead implements its own. For the fourth, Prometheus provides the PromQL query language, which enables powerful query rules.
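
As a quick illustration of PromQL, assuming a hypothetical counter named http_requests_total (not part of the demo below), queries look like this:

# per-second HTTP request rate, averaged over the last 5 minutes
rate(http_requests_total[5m])

# the same rate aggregated across instances, grouped by path
sum by (path) (rate(http_requests_total[5m]))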

With successive releases, Prometheus' feature set has long grown beyond these four points.

(Figure: Prometheus features)

As you can see from the Prometheus architecture diagram, there are four main components:

  1. Prometheus Server
  2. PushGateway
  3. AlertManager
  4. WebUI

Prometheus Server is the most important component and is responsible for data collection. Prometheus normally pulls data from the monitored targets, but if a target needs to push its state instead, a PushGateway can be introduced: the target actively pushes its state to the PushGateway, and Prometheus Server scrapes the PushGateway periodically.
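
The push side is plain HTTP as well. A minimal sketch in the style of the Pushgateway documentation, with a hypothetical metric and hostname:

echo "some_metric 3.14" | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/some_job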

AlertManager and the WebUI are optional components: the former raises alerts based on the collected data, and the latter displays the monitoring data in real time through a web interface.

Prometheus Operator

Prometheus can be deployed in a variety of ways; thanks to its simple working principle, you only need to deploy Prometheus Server somewhere it can reach the monitored targets.

For Kubernetes, however, the network environment inside the cluster is relatively closed and Pod IPs are ephemeral, so CoreOS open-sourced the Prometheus Operator (https://github.com/coreos/prometheus-operator), which manages and deploys Prometheus through CRDs.

Installing the Operator

Installing the Prometheus Operator is as simple as cloning the Git repository and running kubectl apply on the bundle.yaml in its root directory.

git clone https://github.com/coreos/prometheus-operator.git
kubectl apply -f prometheus-operator/bundle.yaml
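
You can then verify that the Operator is running and that its CRDs were registered (assuming the default namespace; the grep pattern matches the CRD API group):

kubectl get deploy prometheus-operator
kubectl get crd | grep monitoring.coreos.com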

Basic concepts

The Prometheus Operator takes over the deployment, management, and administration of Prometheus. Building on Kubernetes CRDs, it introduces several new custom resources (CRs):

  1. Prometheus: describes the Prometheus Server cluster to be deployed
  2. ServiceMonitor/PodMonitor: describes the list of targets for Prometheus Server
  3. Alertmanager: describes the Alertmanager cluster
  4. PrometheusRule: describes the alerting rules of Prometheus (see the sketch below)

The design concept of Prometheus Operator can be found in the document: https://github.com/coreos/prometheus-operator/blob/master/Documentation/design.md.
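
PrometheusRule is the only one of these resources not demonstrated later in this post, so here is a minimal sketch; the metric and threshold are hypothetical, and a Prometheus resource would pick the rule up via its ruleSelector label selector:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules
  labels:
    team: frontend
spec:
  groups:
  - name: example.rules
    rules:
    - alert: HighErrorRate
      # hypothetical counter; fires when the 5xx rate stays above 1/s for 10m
      expr: rate(http_requests_total{status=~"5.."}[5m]) > 1
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "High 5xx rate on {{ $labels.instance }}"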

How it works

The Prometheus Operator watches for changes to the custom resources above and performs the corresponding management logic, as shown in the diagram below.

(Figure: How the Prometheus Operator works)

When a resource of kind Prometheus is created (Prometheus here referring to the custom resource defined by the Prometheus Operator), it selects the associated ServiceMonitors by label selector; each ServiceMonitor in turn selects the Services to be monitored by their labels, and obtains the list of Pod IPs to scrape from the Endpoints backing those Services.

Monitoring Application Demo

Following the official User Guides, we briefly describe how to use prometheus-operator to monitor an application; more details can be found at: https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/getting-started.md.

Deploying the monitored objects

Deploy an application with 3 replicas via a Deployment.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: example-app
        image: fabxc/instrumented_app
        ports:
        - name: web
          containerPort: 8080

Then create a Service so that the application has a stable access endpoint.

kind: Service
apiVersion: v1
metadata:
  name: example-app
  labels:
    app: example-app
spec:
  selector:
    app: example-app
  ports:
  - name: web
    port: 8080

Note the label app=example-app defined on the Service; it is what the ServiceMonitor will select on.

Deploying the monitoring

Based on the label defined on the Service, we can now define a ServiceMonitor.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web

The ServiceMonitor itself carries the label team=frontend, which is what the Prometheus resource will select on. With that in place, we can create the Prometheus resource.

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: false
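
Note that the spec above references a ServiceAccount named prometheus. The official getting-started guide creates it together with RBAC rules so that Prometheus can discover and scrape targets; a minimal sketch along those lines, assuming the default namespace:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
# allow Prometheus to discover scrape targets
- apiGroups: [""]
  resources: ["nodes", "services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
# allow scraping the non-resource /metrics endpoint
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default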

At this point you can see that the Prometheus instances have been started.

# kubectl get po
NAME                                   READY   STATUS    RESTARTS   AGE
example-app-66db748757-bfqx4           1/1     Running   0          103m
example-app-66db748757-jqsh5           1/1     Running   0          103m
example-app-66db748757-jtbpc           1/1     Running   0          103m
prometheus-operator-7447bf4dcb-lzbf4   1/1     Running   0          18h
prometheus-prometheus-0                3/3     Running   0          100m

Prometheus itself provides a WebUI, so we can create a Service to expose it for access from outside the cluster (best avoided in a public network environment).

apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  type: NodePort
  ports:
  - name: web
    nodePort: 30900
    port: 9090
    protocol: TCP
    targetPort: web
  selector:
    prometheus: prometheus

At this point, you can open http://<node-ip>:30900 and see the monitoring information for the demo application inside the cluster.


Cluster Monitoring

As this demo should make clear, Prometheus pulls its data over HTTP via Services, so cluster monitoring simply means giving Prometheus access to the metrics interfaces of the Kubernetes components themselves. Prometheus also supports deploying node-exporter as a DaemonSet to collect node information directly, as sketched below.
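
A minimal DaemonSet sketch for node-exporter; the image tag is unpinned and, for brevity, it omits the host filesystem mounts (/proc, /sys) that production deployments usually add:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true   # expose host network metrics
      hostPID: true       # expose host process metrics
      containers:
      - name: node-exporter
        image: prom/node-exporter
        ports:
        - name: metrics
          containerPort: 9100   # node_exporter's default port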

How monitoring data is collected for the Kubernetes components depends on how the cluster was deployed. For binary deployments, you can install Prometheus directly on the node to collect the data; for containerized deployments, you can create a Service for each Kubernetes component, after which the procedure is the same as monitoring an in-cluster application. Related documentation can be found at https://coreos.com/operators/prometheus/docs/latest/user-guides/cluster-monitoring.html.
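
For example, a Service for a containerized kube-scheduler might look like the sketch below. The Pod label and the metrics port are assumptions that depend on your deployment method: kubeadm labels the scheduler Pod component=kube-scheduler, and versions before 1.19 expose metrics on port 10251:

apiVersion: v1
kind: Service
metadata:
  name: kube-scheduler
  namespace: kube-system
  labels:
    k8s-app: kube-scheduler
spec:
  selector:
    component: kube-scheduler   # assumption: kubeadm-style Pod label
  ports:
  - name: http-metrics
    port: 10251                 # assumption: pre-1.19 insecure metrics port
    targetPort: 10251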