go module

1. The cause of the problem

With the introduction of Go modules in Go 1.11, pulling public go modules with the go command is no longer a “pain point”, as the figure below shows.

[Figure: go module]

Within our company or organization, we only need to point the GOPROXY environment variable at a public GOPROXY service to pull all public go modules easily (public modules are open source modules).

However, as the number of Go users and Go projects in the company grows, the problem of “code duplication” arises. It becomes necessary to extract the shared code and put it into a separate, reusable internal private repository. So we now need to pull private go modules!

Some companies or organizations keep all their code on a public vcs hosting provider (e.g. github.com) and place their private go modules directly in private repositories on that service. If your company does the same, pulling a private go module hosted in a private repository on a public vcs is also easy, as the following figure shows.

[Figure: go module]

Of course, a prerequisite for this solution is that each developer has access to the private go module repository on the public vcs service. The credentials can take any form, such as a basic auth user name and password or a personal access token (as on github), as long as they meet the authentication requirements of the public vcs.

However, what if the private go module is placed on the company’s internal vcs server, as shown in the following figure?

[Figure: private go module]

So how do we get Go commands to automatically pull private go modules from internal servers?

Some gophers will say: “That’s easy! It’s no different from pulling a private go module hosted on a public vcs service.” Most of the gophers who hold this view work at large companies. Large companies have a well-developed IT infrastructure for development, and their internal vcs servers are reachable via domain names (e.g. git.bat.com/user/repo), so their employees can access private go modules on the internal vcs servers just as they access public vcs services, as shown in the following diagram.

[Figure: go module]

In the above scenario, the company has built an internal goproxy service (the in-house goproxy in the figure above). Its purpose is, first, to provide a way to pull external go modules for development machines and CI machines that have no direct access to the external network and, second, to speed up pulls through the in-house goproxy’s cache. For private go modules, the development machine lists them in the GOPRIVATE environment variable, so that the go command bypasses GOPROXY when pulling them and accesses the vcs directly (git.bat.com in the figure above).

Of course, large companies may also use the following scheme, which hands both external go modules and private go modules over to a unified in-house goproxy service.

[Figure: go module]

In this scenario, developers only need to set GOPROXY to the in-house goproxy to pull both external and private go modules. However, by default the go command verifies the checksum of every go module pulled through a goproxy against the public checksum database (sum.golang.org), which has no record of our private go modules. The developer therefore needs to add the private go modules to the GONOSUMDB environment variable so that the go command skips this verification for them. One more thing to note about this solution: the in-house goproxy needs access to every repo that hosts a private module, so that each private go module pull can succeed.
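To make the role of GONOSUMDB (and GOPRIVATE) more concrete, here is a minimal sketch of how a comma-separated list of glob patterns can be matched against a module path prefix. It is a simplified illustration and not the go command's actual code; the function name matchPrefixPatterns and the sample module paths are assumptions chosen for this example.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchPrefixPatterns reports whether a leading path-element prefix of
// target matches one of the comma-separated glob patterns in globs.
// Simplified sketch of GOPRIVATE/GONOSUMDB-style matching (assumption).
func matchPrefixPatterns(globs, target string) bool {
	elems := strings.Split(target, "/")
	for _, glob := range strings.Split(globs, ",") {
		glob = strings.TrimSpace(glob)
		if glob == "" {
			continue
		}
		// A pattern with N path elements is compared against the
		// first N path elements of the module path.
		n := strings.Count(glob, "/") + 1
		if len(elems) < n {
			continue
		}
		prefix := strings.Join(elems[:n], "/")
		if ok, err := path.Match(glob, prefix); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	const gonosumdb = "mycompany.com/go" // illustrative value
	for _, mod := range []string{
		"mycompany.com/go/module1",
		"github.com/pkg/errors",
	} {
		fmt.Printf("%-26s skip checksum DB: %v\n", mod, matchPrefixPatterns(gonosumdb, mod))
	}
}
```

With a single pattern such as mycompany.com/go, every module below that prefix is exempt from checksum database verification, while public modules are still verified.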

Well, here’s the problem: how can a small company that lacks a complete internal IT infrastructure, and wants to keep its private go modules on the company’s internal vcs server, implement a private go module pulling solution?

2. A solution for small companies

Small companies may be small, but their goals need not be. Although small companies have weak or inflexible IT infrastructure, they should not put too much extra “burden” on developers. Therefore, of the two solutions for large companies described above, we prefer the latter: it leaves all the complexity to the in-house goproxy node and keeps things simple for developers. But how do we implement it when a small company has no DNS and cannot use domain names? In this section, we implement this solution.

0. Solution example environment topology

We first prepare a sample environment for the subsequent implementation of the solution, with the following topology.

[Figure: go module]

1. Choosing a goproxy implementation

After the release of the Go module proxy protocol specification, many mature open source goproxy implementations appeared in the Go community, from the original athens to two excellent open source implementations from China: goproxy.cn and goproxy.io. The goproxy.io site documents a method for on-premises deployment, so we base our solution on goproxy.io (the other goproxy implementations should work as well).

We install goproxy by performing the following steps on the in-house goproxy node in the above diagram.

$mkdir -p ~/.bin/goproxy
$cd ~/.bin/goproxy
$git clone https://github.com/goproxyio/goproxy.git
$cd goproxy
$make

After compiling, you will find the executable file named goproxy in the bin directory (~/.bin/goproxy/goproxy/bin).

Create the goproxy cache directory.

$mkdir /root/.bin/goproxy/goproxy/bin/cache

Start goproxy.

$./goproxy -listen=0.0.0.0:8081 -cacheDir=/root/.bin/goproxy/goproxy/bin/cache -proxy https://goproxy.io
goproxy.io: ProxyHost https://goproxy.io

After startup, goproxy listens on port 8081 (the default port for goproxy, even if none is specified) and uses goproxy.io as its upstream goproxy service.

Note: This startup parameter of goproxy is not the final version, here we just want to verify that goproxy works as expected.

Next, let’s verify that goproxy is working as we expect.

We configure the GOPROXY environment variable on the development machine to point to 10.10.20.20:8081.

# .bashrc
export GOPROXY=http://10.10.20.20:8081

Once the environment variables are in effect, execute the following command.

$go get github.com/pkg/errors

The result is as expected: the development machine downloaded the github.com/pkg/errors package without any problems.

On the goproxy side, we see the following logs.

goproxy.io: ------ --- /github.com/pkg/@v/list [proxy]
goproxy.io: ------ --- /github.com/pkg/errors/@v/list [proxy]
goproxy.io: ------ --- /github.com/@v/list [proxy]
goproxy.io: 0.146s 404 /github.com/@v/list
goproxy.io: 0.156s 404 /github.com/pkg/@v/list
goproxy.io: 0.157s 200 /github.com/pkg/errors/@v/list

And we also see the downloaded and cached github.com/pkg/errors package in the cache directory of goproxy.

$cd /root/.bin/goproxy/goproxy/bin/cache
$tree
.
└── pkg
    └── mod
        └── cache
            └── download
                └── github.com
                    └── pkg
                        └── errors
                            └── @v
                                └── list

8 directories, 1 file
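The request paths in the goproxy log above follow the GOPROXY protocol: the version list of a module is served at <module path>/@v/list, and uppercase letters in the module path are encoded as "!" plus the lowercase letter. The sketch below builds these URLs and also lists the candidate module paths for a package path, which appears to be why the log shows 404 responses for github.com/pkg and github.com. The helper names (escapePath, listURL, candidatePaths) are illustrative assumptions, as is the second example module.

```go
package main

import (
	"fmt"
	"strings"
)

// escapePath applies the proxy protocol's case encoding: each uppercase
// ASCII letter becomes '!' followed by its lowercase form.
func escapePath(s string) string {
	var b strings.Builder
	for _, r := range s {
		if r >= 'A' && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + 'a' - 'A')
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// listURL builds the URL of the version-list endpoint for modPath.
func listURL(proxy, modPath string) string {
	return strings.TrimSuffix(proxy, "/") + "/" + escapePath(modPath) + "/@v/list"
}

// candidatePaths returns pkg and each of its parent paths, longest first.
// Any of them could be the module that provides the package.
func candidatePaths(pkg string) []string {
	var out []string
	for p := pkg; p != ""; {
		out = append(out, p)
		i := strings.LastIndex(p, "/")
		if i < 0 {
			break
		}
		p = p[:i]
	}
	return out
}

func main() {
	proxy := "http://10.10.20.20:8081" // in-house goproxy from this article
	for _, p := range candidatePaths("github.com/pkg/errors") {
		fmt.Println(listURL(proxy, p))
	}
	// Illustrative module path with uppercase letters.
	fmt.Println(listURL(proxy, "github.com/BurntSushi/toml"))
}
```

Fetching one of these URLs with any HTTP client is a quick way to check whether the in-house goproxy can resolve a given module.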

2. Customize the package import path and map it to the internal vcs repository

A small company may not have assigned a domain name to its vcs server, and we cannot put an IP address in the import path of a private Go package, so we need to define a custom path for our private go modules, for example mycompany.com/go/module1. We put all private go modules under mycompany.com/go.

The next problem is that when goproxy pulls mycompany.com/go/module1, it must learn the address of the module1 repository on the internal vcs that corresponds to this path, so that it can download the module1 code from the internal vcs server.

[Figure: go module]

There is actually more than one solution. Here we use a tool called govanityurls, which I mentioned in my previous article.

By combining govanityurls and nginx, we can map the import path of a private go module to the real address of its code repository on vcs. The following diagram explains the exact principle.

[Figure: govanityurls]

First, to keep goproxy from forwarding requests for private go modules (mycompany.com/go/module1) to the public proxy, we need to adjust its startup parameters, as in the following modified startup command.

$./goproxy -listen=0.0.0.0:8081 -cacheDir=/root/.bin/goproxy/goproxy/bin/cache -proxy https://goproxy.io -exclude "mycompany.com/go"

With this setting, goproxy does not forward any go module pull request that matches the value after -exclude to goproxy.io, but requests the “source” of the go module directly. What the setup in the above diagram has to do is translate the address of this “source” into a repository address on the internal vcs service. Since the domain name mycompany.com does not exist, we add the following entry to /etc/hosts on the node where goproxy runs, as the diagram shows.

127.0.0.1 mycompany.com

Thus, the requests sent by goproxy to mycompany.com are actually sent to the local machine. In the above diagram, it is nginx that is listening on port 80 of the local machine. nginx has the following configuration for the mycompany.com host.

# /etc/nginx/conf.d/gomodule.conf

server {
        listen 80;
        server_name mycompany.com;

        location /go {
                proxy_pass http://127.0.0.1:8080;
                proxy_redirect off;
                proxy_set_header Host $host;
                proxy_set_header X-Real-IP $remote_addr;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

                proxy_http_version 1.1;
                proxy_set_header Upgrade $http_upgrade;
                proxy_set_header Connection "upgrade";
        }
}

We see that nginx forwards requests for mycompany.com/go/xxx to 127.0.0.1:8080, which is exactly the address the govanityurls service listens on.

govanityurls is a tool open sourced by Jaana B. Dogan, a former member of the Go core development team, which helps gophers quickly set up custom go get import paths for their Go packages.

govanityurls itself is like a “navigation” server. When the go command makes a request to a custom package address, it actually sends the request to the govanityurls service, and then govanityurls returns the real address of the requested package repository (read from the vanity.yaml configuration file) to the go command, which then gets the package data from the real repository address.

Note: Installing govanityurls is simple: just go install/go get github.com/GoogleCloudPlatform/govanityurls directly (newer Go versions require a version suffix such as @latest with go install).

In our example, the configuration of vanity.yaml is as follows.

host: mycompany.com

paths:
  /go/module1:
      repo: ssh://admin@10.10.30.30/module1
      vcs: git

This means that when govanityurls receives a request forwarded by nginx, it matches the request against the module paths configured in vanity.yaml and, if one matches, returns the real repo address of the module in the response format the go command expects. Here the repository address on the real vcs for module1 is ssh://admin@10.10.30.30/module1.
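The “response format expected by the go command” is an HTML page containing a go-import meta tag of the form import-prefix, vcs, repo-root. The following minimal sketch renders that tag for the vanity.yaml entry above. It is not the govanityurls source code; the vanityEntry type and the goImportMeta function are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"html"
)

// vanityEntry mirrors one entry under "paths" in vanity.yaml
// (hypothetical type for this sketch).
type vanityEntry struct {
	Path string // import path suffix, e.g. "/go/module1"
	VCS  string // version control system, e.g. "git"
	Repo string // real repository address
}

// goImportMeta renders the go-import meta tag that the go command looks
// for when it resolves a custom import path.
func goImportMeta(host string, e vanityEntry) string {
	content := fmt.Sprintf("%s%s %s %s", host, e.Path, e.VCS, e.Repo)
	return fmt.Sprintf(`<meta name="go-import" content="%s">`, html.EscapeString(content))
}

func main() {
	entry := vanityEntry{
		Path: "/go/module1",
		VCS:  "git",
		Repo: "ssh://admin@10.10.30.30/module1",
	}
	fmt.Println(goImportMeta("mycompany.com", entry))
}
```

The client requests the import path with the query parameter go-get=1 and reads the repository address from this tag before contacting the vcs.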

So goproxy receives this address, makes another request to this real address, and eventually caches module1 to the local cache and returns it to the client.

Note: Since this scenario is the same as the second scenario for large companies, goproxy needs access to all the real vcs repositories behind the go modules under mycompany.com/go.

3. Development machine (client) setup

In the previous example, we set the GOPROXY environment variable of the development machine to the service address of goproxy. As mentioned earlier, by default the go command verifies the sum values of all go modules pulled through GOPROXY against the public checksum database (sum.golang.org). Since we are pulling a private go module, the checksum database has no sum data for it. This causes the go build command to report an error and stops the build.

Therefore, the development machine also needs to add mycompany.com/go to the GONOSUMDB environment variable, which tells the go command to skip checksum database verification for any go module that matches mycompany.com/go.

4. The “shortcomings” of the solution

Of course, the above solution is not perfect. It has the following shortcomings.

  • Developers still need to configure additional GONOSUMDB variables

Since the go command verifies the checksums of go modules pulled from GOPROXY by default, we need to add the private go modules to the GONOSUMDB environment variable, which puts a small “burden” on the developer.

Mitigation: small companies can put all private go projects under one specific domain, so GONOSUMDB does not have to be extended for each private go project separately and only needs to be configured once.

  • Each new private go module requires a manual update to vanity.yaml

This is the least flexible part of the solution. Because govanityurls has limited functionality, each private go module may need its own entry giving its vcs repository address and access method (git, svn or hg).

Mitigation: manage multiple private go modules in a single vcs repository, as etcd does. Compared with the original official recommendation of one module per repo, newer versions of Go have come a long way in supporting multiple go modules in one repo.

But for small companies, this extra work should be nothing compared to the benefits gained! ^_^

  • Unable to divide permissions

As mentioned above, the node where goproxy runs needs access to all private go modules in the vcs repos, but it cannot grant different permissions to different go developers: as long as goproxy can pull a private go module, every go developer can pull it too.

However, in most small companies all internal source code is in principle open within the enterprise, so this does not seem to be a big problem. If it is a problem for you, the only choice is the first option for large companies described above.

3. Summary

In large and small companies alike, as Go is used more deeply, adopted by more people, and applied to more and more complex projects, issues like pulling private go modules will certainly come up.

For gophers in large companies, this may not be a problem, or may even be transparent to them. But small companies and other organizations with incomplete internal IT infrastructure do need a do-it-yourself solution.

This article provides an idea and a reference implementation for small companies to build Go private libraries and pull private go modules from private libraries.

If you think the above installation and configuration steps are a bit tedious, those who are interested in going deeper can package the above programs (goproxy, nginx, govanityurls) into a container image to achieve one-click installation and setup.