1. Layering of Monitoring

layering of monitoring

As shown above, two strategies are used when building a monitoring system:

  1. Separating IaaS, MySQL middleware, and App-tier monitoring keeps the systems highly available and fault-tolerant with respect to each other: if the App-tier monitoring stops working, the IaaS-tier monitoring can still surface the problem immediately.
  2. Separating long-term and short-term metrics. Short-term metrics serve the alerting system's high-frequency queries over recent data, while long-term metrics serve people querying data sets that span a longer period of time.

Both are referred to here collectively as a layering strategy for monitoring; one is layered along the infrastructure dimension and the other along the time dimension.

2. Current Situation and Selection

The current situation is that monitoring is not split into long-term and short-term tiers; a single shared Prometheus setup handles everything. When long-period metrics are queried, the memory and CPU usage of the server hosting Prometheus rises sharply, sometimes making the monitoring and alerting services unavailable.

The reason is twofold:

  • Prometheus loads a lot of data into memory when querying long-period data.
  • Prometheus has no downsampled data to load

The larger the query range, the more memory is needed. In another production scenario, we used a standalone VictoriaMetrics instance as remote storage on a server with as much as 128 GB of memory, and that approach still suffered data loss.

The Prometheus Federation approach, on the other hand, only addresses aggregating multiple Prometheus instances; it provides no downsampling to accelerate long-term metric queries, so it does not fit the current remote-storage scenario.

Finally, since the Thanos Compact component can compact and downsample metric data, I decided to try Thanos as the remote storage for multiple Prometheus instances.

3. Several ways to deploy Thanos

3.1 Basic components

  • Query, which implements the Prometheus API and provides a Prometheus-compatible query interface to the outside world
  • Sidecar, which runs alongside Prometheus, serves its data to the Query component, and can also upload it to object storage
  • Store Gateway, which serves the metric data placed in the object store
  • Compactor, which compacts and downsamples the data in the object store and cleans up expired data
  • Receiver, which receives data sent by Prometheus remote write
  • Ruler, which evaluates and manages alerting rules

3.2 Receive mode

Receive mode

In Receive mode, each Prometheus instance is configured with remote write to push its data to Thanos. The real-time data lives in the Thanos Receiver, so the Sidecar component is not needed to serve queries.
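
As a rough sketch of what that looks like, assuming the Receiver is exposed through a Service named thanos-receive in the thanos namespace (as in the deployment later in this article), each Prometheus instance would get a remote write block along these lines:

remote_write:
  # push samples to the Thanos Receiver's remote-write endpoint (port 19291 in the Service listing below)
  - url: http://thanos-receive.thanos.svc:19291/api/v1/receive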

Advantages:

  • Data centralization
  • Prometheus is stateless
  • Only the Receiver needs to be exposed for Prometheus to access

Disadvantages:

  • The Receiver must absorb heavy remote-write traffic from all Prometheus instances

3.3 Sidecar mode

Sidecar mode

In Sidecar mode, a Thanos Sidecar container is added alongside each Prometheus instance to manage it. It has two main functions:

  • Accepting query requests from the Query component: when Thanos queries short-term data, the request goes to the Sidecar.
  • Uploading Prometheus's short-term metric data: by default, a block is created every two hours and uploaded to the object store.

Advantages:

  • Easy integration, no need to modify the original configuration

Disadvantages:

  • Recent data requires network requests between Query and Sidecar, which adds extra latency.
  • Requires Thanos Query to have network access to each Prometheus instance's Sidecar

4. Deploying Thanos

4.1 Deploying Minio

Please consult the official documentation.

Once the installation is complete, test the configuration according to the documentation to ensure that the Minio service works properly.
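
For a quick test environment, a single-node Minio can also be started with Docker; this is only a sketch, and the ports and minioadmin credentials below are assumptions that should be changed for anything beyond testing.

# Run a standalone Minio; data lives in /data inside the container, console on port 9001
docker run -d --name minio \
  -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=minioadmin \
  -e MINIO_ROOT_PASSWORD=minioadmin \
  quay.io/minio/minio server /data --console-address ":9001"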

4.2 Create a Bucket named thanos on Minio

Create it as follows.

Create a Bucket named thanos on Minio
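
The bucket can also be created from the command line with the mc client; this is a sketch, the alias name local and the endpoint are assumptions, and older mc versions use mc config host add instead of mc alias set.

# Register the Minio endpoint under the alias "local", then create and list the bucket
mc alias set local http://<minio-host>:9000 minioadmin minioadmin
mc mb local/thanos
mc ls local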

4.3 Checking that the Prometheus version meets Thanos requirements

Thanos currently recommends a Prometheus version no lower than v2.13.
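
A quick way to check is to read the image tag of the running workload; this sketch assumes Prometheus is deployed as the prometheus-server Deployment in the monitor namespace, as in the commands later in this article.

# Print the Prometheus container image, whose tag carries the version
kubectl -n monitor get deploy prometheus-server \
  -o jsonpath='{.spec.template.spec.containers[*].image}'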

4.4 Deploying Thanos

  • Ensure that a default StorageClass is available on the Kubernetes cluster
kubectl get sc

NAME                         PROVISIONER        RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
openebs-device               openebs.io/local   Delete          WaitForFirstConsumer   false                  4d5h
openebs-hostpath (default)   openebs.io/local   Delete          WaitForFirstConsumer   false                  4d5h
  • Create a new namespace thanos
kubectl create ns thanos
  • Deploying Thanos
git clone https://github.com/shaowenchen/demo

Modify the Minio access address in demo/objectstorage.yaml, then create the Thanos-related workloads.

kubectl apply -f ./demo/thanos-0.25/
  • View the related workloads
kubectl -n thanos top pod

NAME                           CPU(cores)   MEMORY(bytes)
thanos-compact-0               1m           30Mi
thanos-query-7c745f5d7-svlgn   2m           76Mi
thanos-receive-0               1m           15Mi
thanos-rule-0                  1m           18Mi
thanos-store-0                 1m           55Mi

Deploying Thanos consumes few resources.

4.5 Accessing Thanos Query

  • View the ports of Thanos related services
kubectl -n thanos get svc

NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                           AGE
thanos-compact   ClusterIP   10.233.47.253   <none>        10902/TCP                         11h
thanos-query     NodePort    10.233.45.138   <none>        10901:32180/TCP,9090:32612/TCP    11h
thanos-receive   ClusterIP   None            <none>        10902/TCP,19291/TCP,10901/TCP     11h
thanos-rule      ClusterIP   None            <none>        10901/TCP,10902/TCP               11h
thanos-store     NodePort    10.233.41.159   <none>        10901:30901/TCP,10902:31426/TCP   10h
  • Accessing the Thanos Query page

The thanos-query Service exposes HTTP on port 9090, which is mapped to NodePort 32612, so the page served by the Query component is accessed at host IP:32612.

Accessing the Thanos Query page
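
The Query component can also be checked from the command line; this is a sketch, with <node-ip> standing in for any node of the cluster and 32612 being the NodePort shown above.

# Thanos components expose /-/healthy and /-/ready probes over HTTP
curl http://<node-ip>:32612/-/healthy
curl http://<node-ip>:32612/-/ready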

5. Add Thanos Sidecar to Prometheus

Sidecar mode requires less Thanos-side configuration, while Receiver mode must continuously absorb remote-write traffic from many Prometheus instances, so Sidecar mode is chosen here for cost reasons.

5.1 Add S3 access credentials to the Prometheus namespace

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: thanos-objectstorage
  namespace: monitor
type: Opaque
stringData:
  objectstorage.yaml: |
    type: S3
    config:
        bucket: "thanos"
        endpoint: "0.0.0.0:9000"
        insecure: true
        access_key: "minioadmin"
        secret_key: "minioadmin"
EOF

The Minio administrator account is used directly here. For production, a separate account should be created for Thanos's access to Minio.
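
As a sketch of what that could look like with the mc client: the user name and password below are assumptions, and depending on the mc version the policy subcommand is set or attach.

# Create a dedicated Minio user for Thanos and grant it the built-in readwrite policy
mc admin user add local thanos <strong-password>
mc admin policy set local readwrite user=thanos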

5.2 Adding an extra label to identify each Prometheus instance

By configuring external_labels in Prometheus, you can attach an extra global label to each Prometheus instance so that it can be uniquely identified.

  • Edit the configuration file
kubectl -n monitor edit cm prometheus-server
  • Add the following
  prometheus.yml: |
    global:
      external_labels:
        cluster: dev        

A label cluster: dev is added here. All metrics reported by this Prometheus instance will carry this label, making it easy to query and filter by cluster.

5.3 Modifying Prometheus startup parameters to turn off local compaction

  • Edit the Prometheus deployment file

Some Prometheus installations use a Deployment and others a StatefulSet; in either case the Prometheus startup parameters need to be modified.

kubectl -n monitor edit deploy prometheus-server
  • Set the TSDB maximum and minimum block durations to the same value
        - --storage.tsdb.max-block-duration=2h
        - --storage.tsdb.min-block-duration=2h
        image: quay.io/prometheus/prometheus:v2.31.1

Setting storage.tsdb.min-block-duration equal to storage.tsdb.max-block-duration ensures that Prometheus's local compaction is disabled, which avoids upload failures when Thanos compacts the data itself.

5.4 Adding Thanos Sidecar to Prometheus

  • Edit the Prometheus deployment file
kubectl -n monitor edit deploy prometheus-server
  • Add the following containers
      - args:
        - sidecar
        - --log.level=debug
        - --tsdb.path=/data
        - --prometheus.url=http://127.0.0.1:9090
        - --objstore.config-file=/etc/objectstorage.yaml
        name: thanos-sidecar
        image: thanosio/thanos:v0.25.0
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        ports:
        - name: http-sidecar
          containerPort: 10902
        - name: grpc
          containerPort: 10901
        livenessProbe:
          httpGet:
            port: 10902
            path: /-/healthy
        readinessProbe:
          httpGet:
            port: 10902
            path: /-/ready
        volumeMounts:
        - mountPath: /data
          name: storage-volume
        - name: thanos-objectstorage
          subPath: objectstorage.yaml
          mountPath: /etc/objectstorage.yaml
  • Add a new volume that mounts the Secret
      - name: thanos-objectstorage
        secret:
          secretName: thanos-objectstorage
  • Restart Prometheus

A rolling upgrade will hit the following error, because the previous Prometheus Pod has not yet released the lock on the data directory.

ts=2022-03-21T04:06:39.267Z caller=main.go:932 level=error err="opening storage failed: lock DB directory: resource temporarily unavailable"

So you need to scale the replicas down to 0 first, then back to 1 to restart Prometheus.

kubectl -n monitor scale deploy prometheus-server --replicas=0
kubectl -n monitor scale deploy prometheus-server --replicas=1

5.5 Adding a gRPC Remote Access Port to the Prometheus Sidecar

  • Edit the Prometheus Service configuration
kubectl -n monitor edit svc prometheus-server
  • Add a Service port to expose the Sidecar's gRPC service to Thanos Query
  ports:
  - name: sidecar-grpc
    nodePort: 30901
    port: 10901
    protocol: TCP
    targetPort: 10901
  type: NodePort

5.6 Adding the Store gRPC Address to Thanos Query

Finally, you need to add the gRPC address of the Prometheus Sidecar above to Thanos Query as a store endpoint.

  • Edit Thanos Query
kubectl -n thanos edit deploy thanos-query
  • Add --store=0.0.0.0:30901 to the startup parameters
      - args:
        - query
        - --log.level=debug
        - --query.auto-downsampling
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:9090
        - --query.partial-response
        - --query.replica-label=prometheus_replica
        - --query.replica-label=rule_replica
        - --store=0.0.0.0:30901
        image: thanosio/thanos:v0.25.0

Here 0.0.0.0:30901 needs to be replaced with the gRPC endpoint exposed by the Prometheus Sidecar above (the node IP plus NodePort 30901). This way, when Thanos Query serves queries, short-term data is fetched over gRPC from the Sidecar instead of from the object store.

At this point, the Thanos Query page mentioned above should show a new 0.0.0.0:30901 endpoint record, and its status should be Up.
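
The same information can be pulled from the Query HTTP API; this is a sketch, with <node-ip>:32612 being the NodePort access to thanos-query shown earlier.

# List the store endpoints known to Thanos Query, including their health and label sets
curl http://<node-ip>:32612/api/v1/stores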

5.7 Viewing synchronized data in Minio

Viewing synchronized data in Minio

A total of 6 clusters were added, each with about 40 Pods, using about 2.1 GB of storage and 303 objects in half a day.

6. Grafana Configuration

6.1 Adding a Data Source

Adding a Thanos Query data source to Grafana is done in the same way as adding a Prometheus data source, as shown in the following figure.

Adding a Thanos Query data source to Grafana

6.2 Modifying the Grafana panel to accommodate cluster tag filtering

Here is a slight modification to the Kubernetes cluster-based view panel.

  • Add variables for cluster filtering

Earlier, a global external_labels entry was added to each Prometheus so that clusters can be distinguished by the cluster label.

added a global external_labels to each Prometheus

As shown above, add a Cluster variable to the panel and filter it using the cluster tag in the metrics.

  • Edit the filtering query criteria for each view

Edit the filtering query criteria for each view

As shown above, you need to add an extra filter condition, cluster=~"^$Cluster$", to the expression of each view. Alternatively, you can export the panel, make bulk changes in an editor, and then import it back into Grafana.
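
As an illustration only (the metric name is hypothetical), a panel expression would change roughly like this:

# before
sum(kube_pod_info) by (namespace)
# after: restrict the query to the cluster(s) selected by the Cluster variable
sum(kube_pod_info{cluster=~"^$Cluster$"}) by (namespace)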

6.3 Viewing Thanos and Prometheus Data Sources

  • Using the Thanos data source

Using the Thanos data source

  • Using the Prometheus data source

Using the Prometheus data source

Comparing the two panels, the data shown is the same. A single Thanos data source can therefore replace a setup where multiple Prometheus data sources are managed separately.

Here the data does not yet span long enough to hit the Thanos Compact component's downsampling thresholds, so the effect of downsampling is not visible.

7. Summary

This article is mainly about some ideas for layered management of monitoring data.

First of all, the data should be tiered: short-term data is stored directly in the nearest Prometheus, and long-term data in Thanos's object storage. Short-term data serves the alerting system's high-frequency queries, while long-term data serves people doing analysis.

The main reason for choosing Thanos is its downsampling: the Thanos Compact component provides 5-minute and 1-hour downsampling, and with Prometheus scraping every 15 s this works out to 20x and 240x reductions, which greatly relieves the pressure of long-period queries. In Sidecar mode, short-term data is queried over gRPC via Prometheus's API.
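
For reference, downsampling and retention are controlled by flags on the Compact component; the sketch below shows the relevant flags, and the retention values are assumptions rather than the settings used in this deployment.

      - args:
        - compact
        - --objstore.config-file=/etc/objectstorage.yaml
        # keep raw data for 30d, 5m-resolution data for 90d, 1h-resolution data for 1y
        - --retention.resolution-raw=30d
        - --retention.resolution-5m=90d
        - --retention.resolution-1h=1y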

Finally, all 6 local clusters are now connected to Thanos. Only by trying it out can you really appreciate the details and processing logic; I have read plenty of architecture diagrams, documents, and blogs, but none of that compares with trying it myself.