In reviewing the code my colleagues have written to make outbound HTTP requests, I have rarely seen the request URL constructed in a standard (that is, correct and safe) way. What counts as standard practice? I probably can’t give you an exact definition. However, I have a few simple criteria of my own.

  • Protocol: Does the request still work when the configured address lacks http://?
  • Path: Is the path joined correctly when the prefix ends with /?
  • Query: Are the query parameters encoded correctly?

Preliminary Knowledge

What is a URL? The following description comes from the official Go documentation for the net/url package.

// A URL represents a parsed URL (technically, a URI reference).
//
// The general form represented is:
//
//  [scheme:][//[userinfo@]host][/]path[?query][#fragment]
//
// URLs that do not start with a slash after the scheme are interpreted as:
//
//  scheme:opaque[?query][#fragment]
//
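To make the general form concrete, here is a minimal sketch that parses two URLs and prints the components Go extracts from each. The sample URLs (the port 8080, the fragment top, the mailto address) and the helper name describe are made up for illustration.

```go
package main

import (
    "fmt"
    "log"
    "net/url"
)

// describe parses raw and reports the components named in the general form.
func describe(raw string) (string, error) {
    u, err := url.Parse(raw)
    if err != nil {
        return "", err
    }
    return fmt.Sprintf("scheme=%q host=%q path=%q query=%q fragment=%q opaque=%q",
        u.Scheme, u.Host, u.Path, u.RawQuery, u.Fragment, u.Opaque), nil
}

func main() {
    for _, raw := range []string{
        "http://example.com:8080/v1/posts?page_no=1#top", // hierarchical form
        "mailto:user@example.com",                        // opaque form
    } {
        s, err := describe(raw)
        if err != nil {
            log.Fatalln(err)
        }
        fmt.Println(s)
    }
}
```

The first URL fills in scheme, host, path, query, and fragment. The second has no slash after the scheme, so everything after mailto: ends up in the Opaque field.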

Handling protocols correctly

Go’s http client does not support URLs without a protocol (scheme). It parses the address with the url package, which reads example.com as a relative path rather than a host, so the following request fails.

package main

import (
    "log"
    "net/http"
)

func main() {
    u := `example.com`
    rsp, err := http.Get(u)
    if err != nil {
        log.Fatalln(err)
    }
    defer rsp.Body.Close()
}

The error is reported as follows.

$ go run main.go
2022/04/05 18:28:20 Get "example.com": unsupported protocol scheme ""
exit status 1

I don’t know whether this counts as a bug in the url package, so I won’t speculate here. I make a habit of normalizing the address first, as follows.

u := `example.com`
if !strings.Contains(u, `://`) {
    u = `http://` + u
}

Note that I check for :// rather than http:// or https://. There are several reasons for this.

  • Simplicity. No need to check for both http:// and https://.

  • No need to worry about case. The protocol (scheme) part of a URL is case-insensitive: HTTP://example.com and http://example.com are equivalent. If you insist on checking the prefix, you would have to write it as follows.

    if !strings.HasPrefix(strings.ToLower(u), `http://`) {
        u = `http://` + u
    }
    

    That is too cumbersome to write! And it still covers only http, not https.
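The :// check can be wrapped in a small helper. The sketch below is one way to do it; the name ensureScheme is hypothetical, and http is assumed as the default scheme. It also shows what url.Parse makes of a bare example.com, which is why the normalization is needed.

```go
package main

import (
    "fmt"
    "log"
    "net/url"
    "strings"
)

// ensureScheme is a hypothetical helper: it prepends http:// when raw
// contains no "://" at all, and otherwise returns raw unchanged.
func ensureScheme(raw string) string {
    if !strings.Contains(raw, "://") {
        return "http://" + raw
    }
    return raw
}

func main() {
    // Without a scheme, url.Parse succeeds but reads the text as a path.
    u, err := url.Parse("example.com")
    if err != nil {
        log.Fatalln(err)
    }
    fmt.Printf("scheme=%q host=%q path=%q\n", u.Scheme, u.Host, u.Path)

    // After normalization the host is recognized; Parse lowercases the scheme.
    for _, raw := range []string{"example.com", "HTTPS://example.com"} {
        fixed, err := url.Parse(ensureScheme(raw))
        if err != nil {
            log.Fatalln(err)
        }
        fmt.Printf("scheme=%q host=%q\n", fixed.Scheme, fixed.Host)
    }
}
```

One caveat of this sketch: a scheme-less address whose query happens to contain :// (for example a redirect parameter) would be left unprefixed, so it suits configuration values more than arbitrary user input.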

Proper handling of paths

The /path/to/file.txt part of http://example.com/path/to/file.txt is called the path. Understanding and constructing paths correctly is where a large share of errors occur.

Naming issues

The first and foremost problem is naming. Many people assume that API requests are always sent relative to the root path /. Take http://example.com/v1/posts: the /v1/posts part is fixed in the code, and the preceding http://example.com comes from the configuration file. So they name that configuration item host (or, even worse, host_port). Seeing such a name, I would assume it can only hold the example.com part (because it’s called host, hostname or host_port).

Now suppose I want to test the API behind a proxy someday and the prefix changes, for example to http://example.com/proxy/v1/posts. The configuration item then has to hold http://example.com/proxy. Is that still a host?

Why am I bothering with this, you ask? I didn’t want to. I didn’t even think this kind of thing could be a point of dispute. I thought everyone was following the specs.

Where does this scenario occur, you ask? All over the place, for example when Grafana requests data sources in Server mode (as opposed to Direct).

So what’s a good name for it? I’ve seen: endpoint, prefix, url, address, etc.

The trailing / question: to have or not to have

Their code takes the configured prefix and then appends (yes, with +) the API path, so:

  • If the configuration is http://example.com, the result is http://example.com/v1/posts.
  • If the configuration is http://example.com/, the result is http://example.com//v1/posts.

See? The code simply can’t cope with a prefix that may or may not end with /, and at one point they even asked people verbally not to include the trailing /.

The root cause is that their URLs are concatenated by hand.

prefix := `http://example.com/`
api := prefix + `/v1/posts`

Not all servers automatically collapse // into /, so errors are inevitable.

So how should it be done? Use the path package. (Its name collides with the variable name path that we commonly use, which is a bit unpleasant.)

There is a companion package called filepath. The main difference is that path works on forward-slash-separated paths, while filepath works on operating system paths, where the separator is / on Linux and \ on Windows. The package documentation states this clearly.

path

Package path implements utility routines for manipulating slash-separated paths.

The path package should only be used for paths separated by forward slashes, such as the paths in URLs. This package does not deal with Windows paths with drive letters or backslashes; to manipulate operating system paths, use the path/filepath package.

filepath

Package filepath implements utility routines for manipulating filename paths in a way compatible with the target operating system-defined file paths.

The filepath package uses either forward slashes or backslashes, depending on the operating system. To process paths such as URLs that always use forward slashes regardless of the operating system, see the path package.

How do I use the path package? Just one function: path.Join.

func Join(elem ...string) string

Join joins any number of path elements into a single path, separating them with slashes. Empty elements are ignored. The result is Cleaned. However, if the argument list is empty or all its elements are empty, Join returns an empty string.
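The documented behavior can be checked with a few calls. This is a small sketch; the function name joinExamples and the sample inputs are made up for illustration. The last call shows why path.Join should be applied to the path component only: Clean also collapses the // that follows the scheme.

```go
package main

import (
    "fmt"
    "path"
)

// joinExamples returns the results of a few representative path.Join calls.
func joinExamples() []string {
    return []string{
        path.Join("/proxy/", "/v1/posts"),            // duplicate slashes collapse
        path.Join("", "/v1/posts"),                   // empty elements are ignored
        path.Join("/v1", "../v2", "posts/"),          // the result is Cleaned
        path.Join("http://example.com", "/v1/posts"), // a whole URL gets damaged
    }
}

func main() {
    for _, s := range joinExamples() {
        fmt.Println(s)
    }
    // Prints:
    // /proxy/v1/posts
    // /v1/posts
    // /v2/posts
    // http:/example.com/v1/posts
}
```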

Test code.

package main

import (
    "fmt"
    "log"
    "net/url"
    "path"
    "strings"
)

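func api(prefix string) string {
    // Default to http:// when the configured prefix has no scheme.
    if !strings.Contains(prefix, `://`) {
        prefix = "http://" + prefix
    }
    u, err := url.Parse(prefix)
    if err != nil {
        log.Fatalln(err)
    }
    // Join only the path component, never the whole URL string.
    u.Path = path.Join(u.Path, `/v1/posts`)
    return u.String()
}

func main() {
    // Prefixes with and without a scheme, a trailing /, and a proxy path.
    for _, prefix := range []string{
        `example.com`,
        `http://example.com`,
        `http://example.com/`,
        `http://example.com/proxy`,
        `http://example.com/proxy/`,
    } {
        fmt.Println(api(prefix))
    }
}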

Output results.

$ go run main.go
http://example.com/v1/posts
http://example.com/v1/posts
http://example.com/v1/posts
http://example.com/proxy/v1/posts
http://example.com/proxy/v1/posts

If you are using Go 1.19 or later, the net/url package already provides this capability; see: net/url: add JoinPath, URL.JoinPath.
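As a rough sketch of that built-in route, the function below does the same join with url.JoinPath, which the Go 1.19 release notes list as new. The helper name apiURL is hypothetical, and unlike the api function above it does not add a missing scheme.

```go
package main

import (
    "fmt"
    "log"
    "net/url"
)

// apiURL is a hypothetical helper that appends the API path to prefix
// using url.JoinPath, which parses prefix and joins only the path part.
func apiURL(prefix string) (string, error) {
    return url.JoinPath(prefix, "/v1/posts")
}

func main() {
    for _, prefix := range []string{
        "http://example.com/",
        "http://example.com/proxy",
        "http://example.com/proxy/",
    } {
        s, err := apiURL(prefix)
        if err != nil {
            log.Fatalln(err)
        }
        fmt.Println(s)
    }
}
```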

Path encoding problem

The approach above (path.Join on u.Path, then u.String()) also takes care of encoding automatically:

package main

import (
    "fmt"
    "log"
    "net/url"
    "path"
    "strings"
)

func api(prefix string) string {
    if !strings.Contains(prefix, `://`) {
        prefix = "http://" + prefix
    }
    u, err := url.Parse(prefix)
    if err != nil {
        log.Fatalln(err)
    }
    u.Path = path.Join(u.Path, `/v1/posts/新文章`)
    return u.String()
}

func main() {
    prefix := `http://example.com`
    fmt.Println(prefix + `/v1/posts/新文章`)
    fmt.Println(api(prefix))
}

Output results.

$ go run main.go
http://example.com/v1/posts/新文章
http://example.com/v1/posts/%E6%96%B0%E6%96%87%E7%AB%A0

Many servers and more modern backends can now handle unencoded characters correctly, and it is uncommon for API paths to contain anything other than letters and digits, so the encoding problem is not particularly serious.

Do you think I wrote /v1/posts/新文章 wrong because I didn’t encode it myself? Sorry, no mistake. path.Join joins the path segments before any encoding happens, and the URL’s String() method produces the final, encoded URL for transmission.
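The split between the decoded path and its encoded form can be seen directly on the url.URL value. A minimal sketch, with the helper name escapedForms made up for illustration:

```go
package main

import (
    "fmt"
    "net/url"
)

// escapedForms builds a URL on example.com with the given decoded path and
// returns the stored path, its escaped form, and the full URL string.
func escapedForms(p string) (decoded, escaped, full string) {
    u := url.URL{Scheme: "http", Host: "example.com", Path: p}
    return u.Path, u.EscapedPath(), u.String()
}

func main() {
    decoded, escaped, full := escapedForms("/v1/posts/新文章")
    fmt.Println(decoded) // the Path field keeps the decoded text
    fmt.Println(escaped) // EscapedPath percent-encodes it
    fmt.Println(full)    // String uses the escaped form
}
```

Because the Path field holds decoded text, values should be assigned to it unencoded; encoding them first would encode them twice.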

Handling query strings correctly

The query is simply the part of the URL that comes after the question mark. For example, in http://example.com/v1/posts?page_no=1&a=b, the page_no=1&a=b part is called the query (or query string).

I’m sure everyone has seen someone else build this query by hand with string concatenation, or has done it themselves (I’m no exception).

package main

import "fmt"

func main() {
    api := `http://example.com/v1/posts`
    api += `?`
    api += `page_no=1`
    api += `&`
    api += `a=b`
    fmt.Println(api)

    api += `&`
    api += fmt.Sprintf("text=%s", `text with spaces`)
    fmt.Println(api)

    api += `&`
    api += fmt.Sprintf(`chinese=%s`, `桃子`)
    fmt.Println(api)
}

The following are the results.

$ go run main.go
http://example.com/v1/posts?page_no=1&a=b
http://example.com/v1/posts?page_no=1&a=b&text=text with spaces
http://example.com/v1/posts?page_no=1&a=b&text=text with spaces&chinese=桃子

So much hardcoding is hard to read. And have you seen many URLs containing raw spaces? The ones with raw Chinese characters are probably not very standard either, right? It pains me.
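Readability aside, hand-built queries can silently change meaning. The sketch below uses a made-up value, rock&roll, and a hypothetical helper textParam to show what the receiving side would parse with and without escaping.

```go
package main

import (
    "fmt"
    "log"
    "net/url"
)

// textParam parses rawURL and returns the value of its "text" parameter
// together with the number of distinct keys found in the query.
func textParam(rawURL string) (string, int) {
    u, err := url.Parse(rawURL)
    if err != nil {
        log.Fatalln(err)
    }
    q := u.Query()
    return q.Get("text"), len(q)
}

func main() {
    base := "http://example.com/v1/posts?text="
    value := "rock&roll" // hypothetical input containing a separator

    // Concatenated as-is: the value is cut at & and a stray key "roll" appears.
    fmt.Println(textParam(base + value)) // rock 2

    // Escaped first: the value arrives intact as a single parameter.
    fmt.Println(textParam(base + url.QueryEscape(value))) // rock&roll 1
}
```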

The following is what I think is a more standard and safer way to write it.

package main

import (
    "fmt"
    "log"
    "net/url"
)

func main() {
    u, err := url.Parse(`http://example.com/v1/posts?existed=query`)
    if err != nil {
        log.Fatalln(err)
    }
    q := u.Query()
    q.Set(`page_no`, `1`)
    q.Set(`a`, `b`)
    q.Set(`text`, `text with spaces`)
    q.Set(`chinese`, `桃子`)
    u.RawQuery = q.Encode()
    fmt.Println(u.String())
}

Output results.

$ go run main.go
http://example.com/v1/posts?a=b&chinese=%E6%A1%83%E5%AD%90&existed=query&page_no=1&text=text+with+spaces
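Putting the three criteria together, a helper along the following lines is one possible shape. It is only a sketch: the name buildURL, its signature, and the sample parameters are assumptions for illustration, and http is assumed as the default scheme.

```go
package main

import (
    "fmt"
    "log"
    "net/url"
    "path"
    "strings"
)

// buildURL is a hypothetical helper combining the three steps:
// default the scheme, join the path, and encode the query.
func buildURL(prefix, apiPath string, params map[string]string) (string, error) {
    if !strings.Contains(prefix, "://") {
        prefix = "http://" + prefix
    }
    u, err := url.Parse(prefix)
    if err != nil {
        return "", err
    }
    u.Path = path.Join(u.Path, apiPath)

    q := u.Query() // keeps any query already present in prefix
    for k, v := range params {
        q.Set(k, v)
    }
    u.RawQuery = q.Encode() // Encode sorts by key, so the result is stable
    return u.String(), nil
}

func main() {
    s, err := buildURL("example.com/proxy/", "/v1/posts", map[string]string{
        "page_no": "1",
        "text":    "text with spaces",
    })
    if err != nil {
        log.Fatalln(err)
    }
    fmt.Println(s)
}
```

The parameter is called apiPath rather than path so that it does not shadow the path package inside the function.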

Finally

I don’t know if you have made similar mistakes, but I have made them many times, which is why I wrote this summary.

I can’t speak for every language, but the ones I have written (C, C++, Lua, PHP, JavaScript, and so on) all have similar problems.