In a production environment, even milliseconds of downtime can be unacceptable. Proper logging and monitoring of infrastructure and applications helps troubleshoot problems, optimize costs and resources, and catch issues before they escalate. In this article we will use the lightweight Grafana Loki to implement log monitoring and alerting. The good news is that if you are familiar with Prometheus, you will have no trouble with Loki: the two work in much the same way, and targets auto-discovered in a Kubernetes cluster carry the same labels.

## Components

Before using Grafana Loki, let’s briefly introduce the three main components it contains.

### Promtail

Promtail is the log collection agent that ships container logs to Loki. Its main job is to discover collection targets, attach labels to the log streams, and push them to Loki.
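As a rough illustration (not the exact configuration the loki-stack chart generates), Promtail's scrape configuration uses the same Kubernetes service discovery blocks as Prometheus; the relabeling rules below are a sketch:

```yaml
# Hypothetical minimal Promtail scrape config, for illustration only:
# discover pods via the Kubernetes API and turn pod labels into log labels.
scrape_configs:
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Copy the pod's "app" label onto the log stream
  - source_labels: [__meta_kubernetes_pod_label_app]
    target_label: app
  - source_labels: [__meta_kubernetes_namespace]
    target_label: namespace
```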

### Loki

Loki is a horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus. It uses the same service discovery mechanism as Prometheus and attaches labels to log streams instead of building full-text indexes. Because of this, the logs received from Promtail carry the same set of labels as the application’s metrics, which not only gives better context switching between logs and metrics but also avoids the cost of full-text indexing.
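Because streams are selected by labels rather than a full-text index, querying Loki looks much like querying Prometheus. Two illustrative LogQL queries (the `app` label here assumes the Kubernetes auto-discovery labels mentioned above):

```
{app="nginx"}             # all streams carrying the label app="nginx"
{app="nginx"} |= "GET"    # only the lines containing the string "GET"
```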

### Grafana

Grafana is an open source platform for monitoring and observability visualization that supports a very rich set of data sources; in the Loki stack it is used to present time series data from sources such as Prometheus and Loki. It also lets us query, visualize, and alert on that data, and create, explore, and share dashboards that encourage a data-driven culture.

## Deployment

To simplify deployment of the Loki stack, we install it here with the more convenient Helm chart packages; adjust the values to suit your needs.

We will install the loki-stack chart, which includes Loki and Promtail, and the Prometheus Operator, which includes Prometheus, AlertManager, and Grafana.

First, create a file called loki-stack-values.yaml to override the default values of the Loki deployment; its contents are as follows.

```yaml
# Loki Stack Values
promtail:
  serviceMonitor:
    enabled: true
    additionalLabels:
      app: prometheus-operator
      release: prometheus
  pipelineStages:
  - docker: {}
  - match:
      selector: '{app="nginx"}'
      stages:
      - regex:
          expression: '.*(?P<hits>GET /.*)'
      - metrics:
          nginx_hits:
            type: Counter
            description: "Total nginx requests"
            source: hits
            config:
              action: inc
```

Here we enable the ServiceMonitor for Promtail and add two extra labels so the Prometheus Operator will pick it up. The pipeline stages then transform the log line, its labels, and its timestamp format. We add a match stage that selects log streams with app=nginx, followed by a regex stage that matches only the log lines containing GET requests.
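The named capture group in the regex stage can be sanity-checked outside Promtail. Promtail uses Go's RE2 syntax, but the `(?P<hits>...)` named-group form behaves the same in Python, so a quick sketch (the sample log line is made up):

```python
import re

# Same pattern as the regex stage; the named group "hits" becomes the
# source field that the metrics stage reads.
pattern = re.compile(r'.*(?P<hits>GET /.*)')

# A made-up nginx access-log line for illustration
line = '10.244.1.1 - - [01/Jan/2020:00:00:00 +0000] "GET /index.html HTTP/1.1" 200'
m = pattern.match(line)
print(m.group("hits") if m else "no match")
# → GET /index.html HTTP/1.1" 200
```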

In the metrics stage, we define an nginx_hits metric; Promtail exposes such custom metrics through its own /metrics endpoint. Here it is a Counter that is incremented every time the regex stage matches a line. To view this metric in Prometheus, Prometheus has to scrape it from Promtail, which is what the ServiceMonitor above is for.
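To spot-check the counter before Prometheus starts scraping it, you can port-forward one of the Promtail pods and curl its /metrics endpoint. The pod name, port, and sample value below are illustrative; check your chart's values for the actual Promtail HTTP port:

```shell
# Pod name taken from the Pod list later in this article; 3101 is a common
# Promtail HTTP port, but your chart version may differ.
$ kubectl port-forward -n kube-mon loki-promtail-62thc 3101:3101 &
$ curl -s http://localhost:3101/metrics | grep nginx_hits
# Illustrative output: Promtail prefixes custom metrics with "promtail_custom_"
promtail_custom_nginx_hits 17
```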

Install Loki using the command shown below.

```shell
$ helm repo add loki https://grafana.github.io/loki/charts
$ helm repo update
$ helm upgrade --install loki loki/loki-stack --values=loki-stack-values.yaml -n kube-mon
```

Then install the Prometheus Operator, again creating a file called prom-oper-values.yaml to override the default values, as follows.

```yaml
grafana:
  additionalDataSources:
  - name: loki
    access: proxy
    orgId: 1
    type: loki
    url: http://loki:3100
    version: 1
additionalPrometheusRules:
- name: loki-alert
  groups:
  - name: test_nginx_logs
    rules:
    - alert: nginx_hits
      expr: sum(increase(promtail_custom_nginx_hits[1m])) > 2
      for: 2m
      annotations:
        message: 'nginx_hits total insufficient count ({{ $value }}).'
alertmanager:
  config:
    global:
      resolve_timeout: 1m
    route:
      group_by: ['alertname']
      group_wait: 3s
      group_interval: 5s
      repeat_interval: 1m
      receiver: webhook-alert
      routes:
      - match:
          alertname: nginx_hits
        receiver: webhook-alert
    receivers:
    - name: 'webhook-alert'
      webhook_configs:
      - url: 'http://dingtalk-hook:5000'
        send_resolved: true
```

Here we configure Loki as a data source for Grafana and define an alert rule named nginx_hits; rules in the same group are evaluated sequentially at a regular interval. The trigger threshold is set by the expr expression, which checks whether the increase in nginx_hits over the last minute is greater than 2. The alert only actually fires once the expr condition has held for 2 minutes (the for field); until then it stays in the Pending state.
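Before rolling the chart out, the rule group can be linted locally with promtool (assuming it is installed); the standalone file below simply restates the rule from the values file, with an illustrative filename:

```shell
$ cat > nginx-rules.yaml <<'EOF'
groups:
- name: test_nginx_logs
  rules:
  - alert: nginx_hits
    expr: sum(increase(promtail_custom_nginx_hits[1m])) > 2
    for: 2m
    annotations:
      message: 'nginx_hits total insufficient count ({{ $value }}).'
EOF
$ promtool check rules nginx-rules.yaml
```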

The installation command is shown below.

```shell
$ helm upgrade --install prometheus stable/prometheus-operator --values=prom-oper-values.yaml -n kube-mon
```

Next, we can deploy a test Nginx application with the following resource manifest (nginx-deploy.yaml):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
    jobLabel: nginx
spec:
  ports:
  - name: nginx
    port: 80
    protocol: TCP
  selector:
    app: nginx
  type: NodePort
```

For testing purposes, we expose the application with a NodePort-type Service, which can be installed directly with:

```shell
$ kubectl apply -f nginx-deploy.yaml
```

After the installation completes, the Pod list for all the applications looks like this:

```shell
$ kubectl get pods -n kube-mon
NAME                                                     READY   STATUS      RESTARTS   AGE
alertmanager-prometheus-prometheus-oper-alertmanager-0   2/2     Running     0          6m16s
loki-0                                                   1/1     Running     0          39m
loki-promtail-62thc                                      1/1     Running     0          17m
loki-promtail-99bpf                                      1/1     Running     0          17m
loki-promtail-ljw5m                                      1/1     Running     0          17m
loki-promtail-mr85p                                      1/1     Running     0          17m
loki-promtail-pw896                                      1/1     Running     0          17m
loki-promtail-vq8rl                                      1/1     Running     0          17m
prometheus-grafana-76668d6c47-xf8d7                      2/2     Running     0          13m
prometheus-kube-state-metrics-7c64748dd4-5fhns           1/1     Running     0          13m
prometheus-prometheus-oper-admission-patch-pkkp9         0/1     Completed   0          8m7s
prometheus-prometheus-oper-operator-765447bc5-vcdzs      2/2     Running     0          13m
prometheus-prometheus-prometheus-oper-prometheus-0       3/3     Running     1          6m17s
$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
nginx-6f8b869ccf-58rzx   1/1     Running   0          16s
```

For testing purposes, we can change the Grafana Service created by the Prometheus Operator installation to the NodePort type, and then log in to Grafana with the default username admin and password prom-operator.
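One way to switch the Service type is with kubectl patch; the Service name prometheus-grafana matches the Pod list above, but verify it in your own cluster:

```shell
$ kubectl patch svc prometheus-grafana -n kube-mon -p '{"spec": {"type": "NodePort"}}'
$ kubectl get svc prometheus-grafana -n kube-mon   # note the assigned node port
```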

## Testing

Let’s simulate triggering an alert with the shell loop shown below, which accesses the Nginx application every 10 seconds.

```shell
$ while true; do curl --silent --output /dev/null --write-out '%{http_code}' http://k8s.qikqiak.com:31940; sleep 10; echo; done
200
200
```

At this point, we can go to the Grafana page and filter for the Nginx application’s logs.

At the same time, the nginx_hits alert rule we configured is triggered:

If the alert threshold is continuously met for two minutes, the alert actually fires.

Normally, we can also receive the corresponding alerts at our webhook at this point.
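The dingtalk-hook receiver configured above is a separate service. As a hedged sketch of what any such webhook endpoint has to do, here is a minimal stdlib-only receiver that parses Alertmanager's JSON payload; the field names follow Alertmanager's webhook format, but the handler is illustrative, not the actual dingtalk-hook implementation:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def summarize(payload):
    """Extract (status, alertname) pairs from an Alertmanager webhook payload."""
    return [(a.get("status"), a.get("labels", {}).get("alertname"))
            for a in payload.get("alerts", [])]

class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        for status, name in summarize(payload):
            print(status, name)   # forward to DingTalk / Slack / etc. here
        self.send_response(200)
        self.end_headers()

# To actually serve on the port used in webhook_configs above:
# HTTPServer(("", 5000), AlertHandler).serve_forever()
```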

This concludes our log collection, monitoring, and alerting setup for an application using the PLG (Promtail + Loki + Grafana) stack.