Recently I was working on optimizing cortex in the Prometheus ecosystem, and I came across a rather interesting go mod problem, which I’ll share here.

Why is the title “How to cheat Go mod”? That’s the interesting part, so I’ll keep you in suspense for now. I’ll only say that the trick does break some of Go mod’s features.

Before we start this topic, we need to briefly introduce the cortex and thanos projects.

Limitations of Prometheus

No business system can do without monitoring. Prometheus is a cloud-native favorite: it is well designed and flexible to use, it graduated from the CNCF with flying colors, and it is the first choice for monitoring at many companies.

However, Prometheus also has its limitations, the most significant being its data high-availability and clustering story. Monitoring is one of the most important parts of a business system: if the monitoring system goes down, alerts cannot be sent in time.

The Prometheus project itself proposed federation to address clustering, but that approach is extremely complex and still leaves many problems unsolved. This led to two other CNCF sandbox projects, cortex and thanos, both of which aim to provide clustering and high availability for Prometheus.

Since the two projects solve the same problem, they can reuse many of each other’s features, and that is where something interesting happens.

cortex

Because of some requirements, I had to change the thanos code, so while debugging locally I replaced cortex’s thanos dependency with my local copy:

replace github.com/thanos-io/thanos => /Users/hhf/goproject/cortex/thanos

Then the build failed:

# github.com/sercand/kuberesolver
../../../go/pkg/mod/github.com/sercand/kuberesolver@v2.1.0+incompatible/builder.go:108:82: undefined: resolver.BuildOption
../../../go/pkg/mod/github.com/sercand/kuberesolver@v2.1.0+incompatible/builder.go:163:32: undefined: resolver.ResolveNowOption

Don’t panic; let’s see which modules depend on kuberesolver.

First, the graph before the replace:

▶ go mod graph| grep kuberesolver
github.com/weaveworks/common@v0.0.0-20210419092856-009d1eebd624 github.com/sercand/kuberesolver@v2.1.0+incompatible
github.com/weaveworks/common@v0.0.0-20210112142934-23c8d7fa6120 github.com/sercand/kuberesolver@v2.1.0+incompatible
github.com/weaveworks/common@v0.0.0-20200206153930-760e36ae819a github.com/sercand/kuberesolver@v2.1.0+incompatible
github.com/weaveworks/common@v0.0.0-20201119133501-0619918236ec github.com/sercand/kuberesolver@v2.1.0+incompatible
github.com/weaveworks/common@v0.0.0-20200914083218-61ffdd448099 github.com/sercand/kuberesolver@v2.1.0+incompatible
github.com/weaveworks/common@v0.0.0-20200625145055-4b1847531bc9 github.com/sercand/kuberesolver@v2.1.0+incompatible
github.com/thanos-io/thanos@v0.13.1-0.20200731083140-69b87607decf github.com/sercand/kuberesolver@v2.4.0+incompatible

You can see that in the original graph, thanos requires kuberesolver@v2.4.0, while the various versions of weaveworks/common require kuberesolver@v2.1.0.

After replace:

▶ go mod graph| grep kuberesolver
github.com/weaveworks/common@v0.0.0-20210419092856-009d1eebd624 github.com/sercand/kuberesolver@v2.1.0+incompatible

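Surprisingly, kuberesolver@v2.4.0 has disappeared. kuberesolver v2.1.0 references resolver.BuildOption and resolver.ResolveNowOption, which are undefined in the gRPC resolver package this build uses, whereas v2.4.0 builds fine; so after the replace, the build fails.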

Gomod replace semantics

It’s not magic; it comes from Go mod’s replace semantics, a feature that is easy to overlook.

From the documentation on replace directives (https://golang.org/ref/mod#go-mod-file-replace):

replace directives only apply in the main module's go.mod file and are ignored in other modules. See Minimal version selection for details.

In fact, it’s very simple: replace only takes effect in the main module (i.e. your current project). In other words:

  • the main module’s replace directives do not carry over when that module is used as a dependency of another module
  • the replace directives in a dependency’s go.mod do not apply to the main module
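To make the rule concrete, here is a minimal model of how a replace is looked up, as I understand the documented behavior: only the main module’s directives are consulted, a replace naming a specific version wins over one without a version, and a replace with no version on the left covers every version of that module. The Module and Replace types and the resolve function are hypothetical illustrations, not the cmd/go implementation.

```go
package main

import "fmt"

// Module is a module path plus version. For a local-directory replacement,
// Version is empty and Path holds the directory.
type Module struct {
	Path, Version string
}

// Replace models one replace directive from the MAIN module's go.mod.
// An empty OldVersion means the left side had no version, as in
//   replace github.com/thanos-io/thanos => /path/to/thanos
type Replace struct {
	OldPath, OldVersion string
	New                 Module
}

// resolve returns the module the build actually uses (and whose go.mod it
// reads) for m. Replace directives found in dependencies' go.mod files are
// deliberately never passed in, because the go command ignores them.
func resolve(m Module, mainReplaces []Replace) Module {
	// A replace naming this exact version takes precedence...
	for _, r := range mainReplaces {
		if r.OldPath == m.Path && r.OldVersion == m.Version {
			return r.New
		}
	}
	// ...otherwise a version-less replace covers every version of the path.
	for _, r := range mainReplaces {
		if r.OldPath == m.Path && r.OldVersion == "" {
			return r.New
		}
	}
	return m
}

func main() {
	mainReplaces := []Replace{{
		OldPath: "github.com/thanos-io/thanos",
		New:     Module{Path: "/Users/hhf/goproject/cortex/thanos"},
	}}
	for _, m := range []Module{
		{"github.com/thanos-io/thanos", "v0.19.1-0.20210729154440-aa148f8fdb28"},
		{"github.com/thanos-io/thanos", "v0.13.1-0.20200731083140-69b87607decf"},
		{"github.com/sercand/kuberesolver", "v2.1.0+incompatible"},
	} {
		fmt.Printf("%s@%s => %+v\n", m.Path, m.Version, resolve(m, mainReplaces))
	}
}
```

Note that the old thanos version is redirected to the local directory too, not just the version cortex requires directly.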

So once cortex replaces thanos with my local copy, the replace directives in thanos’s own go.mod have no effect on the cortex build. Let’s look at the dependency tree:

  • main module cortex => require github.com/weaveworks/common v0.0.0-20210419092856-009d1eebd624
  • weaveworks/common => require github.com/sercand/kuberesolver v2.1.0+incompatible
  • so overall, kuberesolver is now only at v2.1.0

This is fully consistent with gomod’s replace semantics: the replace behaves exactly as documented, and the build failure is the expected result.
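Which kuberesolver version wins can be modeled with a toy version of MVS: visit every module reachable from the main module and keep the highest version seen for each path. This is a simplified sketch under several assumptions: versions are plain vMAJOR.MINOR.PATCH (build metadata such as +incompatible is ignored), pseudo-versions and pre-releases are not handled, and the graph in main mirrors the post-replace tree above with a made-up weaveworks version.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Module is a module path plus a (simplified) version.
type Module struct{ Path, Version string }

// parse turns "v2.4.0+incompatible" into [2 4 0]. Build metadata after '+'
// is ignored, as in semver. Pre-release and pseudo-versions are NOT handled.
func parse(v string) [3]int {
	var out [3]int
	v = strings.TrimPrefix(v, "v")
	if i := strings.IndexByte(v, '+'); i >= 0 {
		v = v[:i]
	}
	for i, part := range strings.SplitN(v, ".", 3) {
		out[i], _ = strconv.Atoi(part) // non-numeric parts count as 0
	}
	return out
}

// less reports whether version a is lower than version b.
func less(a, b string) bool {
	pa, pb := parse(a), parse(b)
	for i := range pa {
		if pa[i] != pb[i] {
			return pa[i] < pb[i]
		}
	}
	return false
}

// buildList is a toy MVS: visit every module reachable from root through
// reqs, then keep the highest version seen for each path. Other versions of
// the root's own path never override the root itself.
func buildList(root Module, reqs map[Module][]Module) map[string]string {
	selected := map[string]string{root.Path: root.Version}
	seen := map[Module]bool{root: true}
	queue := []Module{root}
	for len(queue) > 0 {
		m := queue[0]
		queue = queue[1:]
		for _, r := range reqs[m] {
			if r.Path != root.Path {
				if cur, ok := selected[r.Path]; !ok || less(cur, r.Version) {
					selected[r.Path] = r.Version
				}
			}
			if !seen[r] {
				seen[r] = true
				queue = append(queue, r)
			}
		}
	}
	return selected
}

func main() {
	cortex := Module{"github.com/cortexproject/cortex", ""}
	weave := Module{"github.com/weaveworks/common", "v0.0.1"} // made-up version
	localThanos := Module{"github.com/thanos-io/thanos", ""}  // replaced by a local directory
	reqs := map[Module][]Module{
		cortex: {weave, localThanos},
		weave:  {{"github.com/sercand/kuberesolver", "v2.1.0+incompatible"}},
	}
	// After the replace, only weaveworks/common asks for kuberesolver.
	fmt.Println(buildList(cortex, reqs)["github.com/sercand/kuberesolver"]) // prints v2.1.0+incompatible
}
```

Feeding buildList the pre-replace chain instead (cortex -> thanos -> old cortex -> old thanos -> kuberesolver v2.4.0) makes it select v2.4.0, which is the situation the rest of this post uncovers.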

Spoofing gomod

What is even more surprising is that cortex compiles at all when it simply requires thanos. Thanos pins kuberesolver with a replace directive, and according to the documentation, replace only works in the main module and has no effect outside it, so that pin should never reach cortex.

For those interested, I ran an experiment at https://github.com/georgehao/gomodtestmain which verifies that gomod really does follow the replace semantics and the MVS (Minimal Version Selection) algorithm.

The problem is basically at an impasse, so how do we break it?

So let’s go back to go mod graph and look at the part of cortex’s dependency tree that comes from thanos:

github.com/thanos-io/thanos@v0.19.1-0.20210729154440-aa148f8fdb28 gopkg.in/yaml.v3@v3.0.0-20210107192922-496545a6307
github.com/thanos-io/thanos@v0.13.1-0.20210401085038-d7dff0c84d17 github.com/Azure/azure-pipeline-go@v0.2.2
github.com/thanos-io/thanos@v0.8.1-0.20200109203923-552ffa4c1a0d k8s.io/utils@v0.0.0-20191114200735-6ca3b61696b6
github.com/thanos-io/thanos@v0.13.1-0.20210204123931-82545cdd16fe gopkg.in/yaml.v2@v2.3.0
github.com/thanos-io/thanos@v0.13.1-0.20201030101306-47f9a225cc52 go.uber.org/goleak@v1.1.10
github.com/thanos-io/thanos@v0.13.1-0.20200807203500-9b578afb4763 go.elastic.co/apm/module/apmot@v1.5.0
....
github.com/thanos-io/thanos@v0.13.1-0.20200731083140-69b87607decf github.com/gogo/protobuf@v1.3.1

The full tree is too long to post (700+ lines), but it shows that cortex depends on N different versions of thanos. The go.mod of the last version listed above contains something interesting:

require (
  github.com/sercand/kuberesolver v2.4.0+incompatible // indirect
)

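In other words, a very old version of thanos requires kuberesolver@v2.4.0 in its go.mod, so gomod treats the thanos that cortex depends on as still requiring kuberesolver@v2.4.0. Even though thanos has since switched to pinning kuberesolver with a replace directive (which cortex ignores), cortex compiles without any problem. This also explains why v2.4.0 disappeared after my local replace: a replace with no version on the left applies to every version of the module, so that old thanos is redirected to my local copy as well, and its requirement on kuberesolver@v2.4.0 drops out of the graph.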

Is this a gomod bug?

Why does cortex depend on so many versions of thanos? This goes back to what I mentioned at the start: cortex and thanos reuse each other’s functionality.

Currently, cortex and thanos depend on each other roughly as follows:

cortex 1.9.0 -> thanos v0.19.1-0.20210729154440-aa148f8fdb28
thanos v0.19.1-0.20210729154440-aa148f8fdb28 -> cortex v1.8.1-0.20210422151339-cf1c444e0905
cortex v1.8.1-0.20210422151339-cf1c444e0905 -> thanos v0.13.1-0.20210401085038-d7dff0c84d17
....

The mutual dependency between cortex and thanos nests like Russian dolls, which is a nightmare for gomod. Surprisingly, it is exactly these nested dolls that crack the go mod replace semantics.

How to solve

Once the root cause is known, making cortex work with a replaced thanos is simple. There are two ways:

  1. Thanks to gomod’s MVS algorithm, we can directly require kuberesolver v2.4.1 in the main project cortex; MVS then selects it over the v2.1.0 required by weaveworks/common.
  2. Option 1 only works if the dependency stays backward compatible; if its maintainers do not guarantee that, forcing a newer version can cause problems. The more direct solution is to modify thanos’s go.mod itself, moving kuberesolver from replace to require.