The User-Agent header is a string that identifies a web client, and websites often use it for platform detection, crawler detection, and similar purposes. This article introduces some general methods for determining whether a User-Agent has been forged.

1. Format of User-Agent

User-Agent is a standard header field sent when a web client initiates an HTTP request; its value is a string that identifies the client. Because websites rely on it for platform detection, crawler detection, and so on, it matters whether the value can be trusted.

The vast majority of web browsers use the following format for the User-Agent value:

Mozilla/[version] ([system and browser information]) [platform] ([platform details]) [extensions]

The following is the User-Agent value of a typical Firefox browser:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/113.0

The parts of this string are:

  • Mozilla/5.0

    Mozilla/5.0 is a generic prefix indicating compatibility with Mozilla. For historical reasons, almost every modern web browser sends it regardless of vendor.

  • Macintosh; Intel Mac OS X 10.15; rv:109.0

    Identifies the operating system the browser runs on. The trailing rv:109.0 token is the Gecko revision rather than operating-system information.

  • Gecko/20100101

    Information about the rendering engine used by the browser.

  • Firefox/113.0

    Browser-specific information (name and version).
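
The breakdown above can be sketched as a small parser. This is only an illustration, not a full implementation of the header grammar: the function name parseUserAgent and the shape of the returned objects are assumptions made for this example, and nested parentheses are not handled.

```javascript
// Split a User-Agent string into product tokens ("Name/version") and
// parenthesised comments, in the order they appear.
// Hypothetical helper for illustration; nested parentheses are not handled.
function parseUserAgent(ua) {
  const parts = [];
  // Either "(...)" or "Name" optionally followed by "/version".
  const re = /\(([^)]*)\)|([^\s()\/]+)(?:\/([^\s()]+))?/g;
  let m;
  while ((m = re.exec(ua)) !== null) {
    if (m[1] !== undefined) {
      // A comment: split its fields on ";" and trim the whitespace.
      parts.push({ comment: m[1].split(";").map((s) => s.trim()) });
    } else {
      parts.push({ product: m[2], version: m[3] === undefined ? null : m[3] });
    }
  }
  return parts;
}

const sample =
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/113.0";
console.log(parseUserAgent(sample));
```

For the Firefox value above this yields four parts: the Mozilla prefix, one comment with three fields, the Gecko token, and the Firefox token.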

For historical reasons, the User-Agent of each mainstream browser is a variant of the above format. Here are the User-Agent values I observed when testing different browsers:

  • macOS Chrome

    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36
    
  • macOS Safari

    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Safari/605.1.15
    
  • macOS Microsoft Edge

    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.50
    
  • iPhone Safari

    Mozilla/5.0 (iPhone; CPU iPhone OS 15_7_5 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.4 Mobile/15E148 Safari/604.1
    
  • Android Chrome

    Mozilla/5.0 (Linux; Android 7.1.2; Pixel) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.71 Mobile Safari/537.36
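
Because these values overlap (the Edge string contains Chrome/ and Safari/, and the Chrome string contains Safari/), the order of the checks matters when classifying a User-Agent. A minimal sketch that covers only the samples listed here, with a hypothetical function name:

```javascript
// Classify the browser family a User-Agent string claims to be.
// Hypothetical helper covering only the sample values in this article;
// the most specific token has to be tested first.
function browserFamilyFromUA(ua) {
  if (/\bEdg\//.test(ua)) return "Edge"; // Edge also contains Chrome/ and Safari/
  if (/\bFirefox\//.test(ua)) return "Firefox";
  if (/\bChrome\//.test(ua)) return "Chrome"; // Chrome also contains Safari/
  if (/\bVersion\/.*\bSafari\//.test(ua)) return "Safari";
  return "unknown";
}

console.log(
  browserFamilyFromUA(
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.50"
  )
);
```

Other browsers built on the same engines send their own tokens and would need additional rules.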
    

2. Determining whether the client User-Agent is forged

The User-Agent is important, but because it is a plain-text string supplied by the client, it is easy to tamper with. A well-designed web crawler can forge the User-Agent of any system and any browser version to imitate normal user requests. A large volume of crawler requests puts heavy load on a service and can harvest a large amount of sensitive data. Besides limiting access frequency and detecting abnormal behavior, you can also check whether the User-Agent has been forged in order to separate malicious requests from legitimate ones.

The conventional approach is to collect additional information about the client and compare it with the User-Agent value; a mismatch is treated as evidence of forgery.

TCP/IP fingerprinting

HTTP uses TCP/IP as its underlying transport. Because TCP implementations differ across operating systems and their versions, details such as TCP header fields, options, and flags can be used to infer the operating system that sent a packet. If the TCP fingerprint indicates a Linux system, a request whose User-Agent claims Windows is unlikely to be genuine.

p0f is a passive fingerprinting tool based on analyzing TCP/IP traffic. It identifies the operating system of a host by capturing and analyzing the packets that host sends. A typical example of its output is shown below.

.-[ 1.2.3.4/1524 -> 4.3.2.1/80 (syn) ]-
|
| client   = 1.2.3.4
| os       = Windows XP
| dist     = 8
| params   = none
| raw_sig  = 4:120+8:0:1452:65535,0:mss,nop,nop,sok:df,id+:0
|
`----

.-[ 1.2.3.4/1524 -> 4.3.2.1/80 (mtu) ]-
|
| client   = 1.2.3.4
| link     = DSL
| raw_mtu  = 1492
|
`----

.-[ 1.2.3.4/1524 -> 4.3.2.1/80 (uptime) ]-
|
| client   = 1.2.3.4
| uptime   = 0 days 11 hrs 16 min (modulo 198 days)
| raw_freq = 250.00 Hz
|
|
`----

.-[ 1.2.3.4/1524 -> 4.3.2.1/80 (http request) ]-
|
| client   = 1.2.3.4/1524
| app      = Firefox 5.x or newer
| lang     = English
| params   = none
| raw_sig  = 1:Host,User-Agent,Accept=[text/html,application/xhtml+xml...
|
`----
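
Once a fingerprinting tool has produced an operating-system label such as the os = line above, comparing it with the User-Agent is straightforward. The following is a rough sketch, not p0f's API: the function names are hypothetical, the label spellings other than "Windows XP" are assumptions, and grouping macOS with iOS (and Android with Linux) into one TCP-stack family is a simplification made for this example.

```javascript
// Coarse TCP-stack family implied by the OS a User-Agent claims.
// Assumption: Android is grouped with Linux and iOS with macOS.
function stackFamilyFromUA(ua) {
  if (/iPhone|iPad|Mac OS X/.test(ua)) return "Apple";
  if (/Android|Linux/.test(ua)) return "Linux";
  if (/Windows/.test(ua)) return "Windows";
  return "unknown";
}

// Coarse family for an OS label reported by a fingerprinting tool.
// The accepted spellings are assumptions; adapt them to the tool in use.
function stackFamilyFromFingerprint(label) {
  if (/^Windows/i.test(label)) return "Windows";
  if (/^Linux/i.test(label)) return "Linux";
  if (/^(Mac OS X|macOS|iOS)/i.test(label)) return "Apple";
  return "unknown";
}

// true = consistent, false = mismatch, null = not enough information.
function osMatchesFingerprint(ua, label) {
  const claimed = stackFamilyFromUA(ua);
  const observed = stackFamilyFromFingerprint(label);
  if (claimed === "unknown" || observed === "unknown") return null;
  return claimed === observed;
}

const macFirefox =
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/113.0";
console.log(osMatchesFingerprint(macFirefox, "Windows XP")); // false: the UA claims macOS
```

Returning null for unknown inputs keeps the check from flagging clients the fingerprint database simply does not recognize.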

HTTP information

Besides User-Agent, an HTTP request carries many other header fields. Below are the complete request headers sent when requesting the home page of this blog with Firefox.

GET / HTTP/2
Host: www.sobyte.net
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/113.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
Accept-Language: zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2
Accept-Encoding: gzip, deflate, br
Referer: https://www.sobyte.net/
Connection: keep-alive
Cookie: ...
Upgrade-Insecure-Requests: 1
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: same-origin
Sec-Fetch-User: ?1
TE: trailers

If you issue the same request in Chrome, you will see several additional headers.

Sec-Ch-Ua: "Google Chrome";v="113", "Chromium";v="113", "Not-A.Brand";v="24"
Sec-Ch-Ua-Mobile: ?0
Sec-Ch-Ua-Platform: "macOS"

In addition, the content of its Accept and Accept-Language fields differs from Firefox's. For example, the Accept value sent by Chrome is:

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7

The order of the headers also differs between the two. In Firefox the User-Agent appears near the top of the request, while in Chrome it appears further down.
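
The header differences described in this section can be turned into a consistency check. The sketch below is only an illustration under stated assumptions: the function name and the [name, value] input shape are hypothetical, it assumes the request arrived over HTTPS as in the captures above, and it relies on the behaviour observed here (Chrome 113 sends Sec-Ch-Ua, Firefox 113 does not), which may not hold for other versions.

```javascript
// Substrings expected in the User-Agent for a given Sec-Ch-Ua-Platform value.
const PLATFORM_PATTERNS = new Map([
  ["macOS", /Mac OS X/],
  ["Windows", /Windows/],
  ["Android", /Android/],
  ["Linux", /Linux/],
]);

// headers: array of [name, value] pairs as received.
// Returns a list of inconsistencies; an empty list means none were found.
function checkClientHints(headers) {
  const map = new Map(headers.map(([name, value]) => [name.toLowerCase(), value]));
  const ua = map.get("user-agent") || "";
  const hasHints = map.has("sec-ch-ua");
  const problems = [];
  if (/\bFirefox\//.test(ua) && hasHints) {
    problems.push("Firefox User-Agent sent with Sec-Ch-Ua");
  }
  if (/\bChrome\//.test(ua) && !hasHints) {
    problems.push("Chrome User-Agent sent without Sec-Ch-Ua");
  }
  // The platform hint is a quoted string, e.g. "macOS".
  const platform = (map.get("sec-ch-ua-platform") || "").replace(/"/g, "");
  const pattern = PLATFORM_PATTERNS.get(platform);
  if (pattern && !pattern.test(ua)) {
    problems.push("Sec-Ch-Ua-Platform does not match the User-Agent");
  }
  return problems;
}

console.log(
  checkClientHints([
    ["User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"],
    ["Accept", "text/html"],
  ])
);
```

The same idea extends to comparing Accept values or header order against what each browser is known to send, at the cost of maintaining those expectations per browser version.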

JavaScript features

window.navigator property

Besides userAgent, the window.navigator object exposes properties such as vendor, platform, and oscpu, whose values differ between browsers.

  • Chrome

    window.navigator.vendor -> Google Inc.
    window.navigator.platform -> MacIntel
    window.navigator.oscpu -> undefined
    
  • Firefox

    window.navigator.vendor -> "" (empty string)
    window.navigator.platform -> MacIntel
    window.navigator.oscpu -> Intel Mac OS X 10.15
    
  • Safari

    window.navigator.vendor -> Apple Computer, Inc.
    window.navigator.platform -> MacIntel
    window.navigator.oscpu -> undefined
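
The values listed above can be turned into a simple classifier. This is a minimal sketch based only on those three observations; the function name is hypothetical, and it takes the navigator object as a parameter so that it can also be exercised outside a browser. Other Chromium-based browsers may report the same vendor as Chrome, which is why the result is labelled "Chrome-like".

```javascript
// Guess the browser from navigator properties other than userAgent.
// Hypothetical helper based on the values observed in this article.
function browserFromNavigator(nav) {
  // Only Firefox exposed oscpu in the observations above, with an empty vendor.
  if (typeof nav.oscpu === "string" && nav.vendor === "") return "Firefox";
  if (nav.vendor === "Google Inc.") return "Chrome-like";
  if (nav.vendor === "Apple Computer, Inc.") return "Safari";
  return "unknown";
}

// In a page this would be called as browserFromNavigator(window.navigator).
console.log(browserFromNavigator({ vendor: "Google Inc.", platform: "MacIntel" }));
```

The result can then be compared with the browser family claimed by navigator.userAgent or by the request header.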
    

CSS Features

Browsers usually ship their own non-standard, vendor-prefixed CSS properties. The JavaScript method CSS.supports reports whether a given property and value are supported. For example:

  • Chrome

    CSS.supports("-webkit-border-vertical-spacing", 0) -> true
    CSS.supports("-moz-user-focus", "normal") -> false
    CSS.supports("-moz-box-sizing", "content-box") -> false
    
  • Firefox

    CSS.supports("-webkit-border-vertical-spacing", 0) -> false
    CSS.supports("-moz-user-focus", "normal") -> true
    CSS.supports("-moz-box-sizing", "content-box") -> true
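
These results can be combined into a small probe. The sketch below assumes only the two result sets shown above; the function name is hypothetical, and the supports function is passed in as a parameter so the logic can be tested without a browser.

```javascript
// Guess the browser from vendor-prefixed CSS support.
// `supports` is a function with the same signature as CSS.supports.
function browserFromCss(supports) {
  const webkit = supports("-webkit-border-vertical-spacing", "0");
  const moz =
    supports("-moz-user-focus", "normal") &&
    supports("-moz-box-sizing", "content-box");
  if (moz && !webkit) return "Firefox-like";
  if (webkit && !moz) return "Chrome-like";
  return "unknown";
}

// In a page: browserFromCss((prop, value) => CSS.supports(prop, value)).
// Outside a browser, a stub that mimics the Firefox results above:
const firefoxStub = (prop) => prop.startsWith("-moz-");
console.log(browserFromCss(firefoxStub));
```

Prefixed properties are added and removed between releases, so the probed property list needs to be re-verified against current browser versions.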
    

Special window properties

Browsers also expose different non-standard properties on the window object.

  • Chrome

    window.webkitCancelAnimationFrame !== undefined -> true
    window.mozInnerScreenX !== undefined -> false
    window.chrome !== undefined -> true
    
  • Firefox

    window.webkitCancelAnimationFrame !== undefined -> false
    window.mozInnerScreenX !== undefined -> true
    window.chrome !== undefined -> false
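
Putting the comparison together, a check might infer the browser from these window properties and compare it with what the User-Agent claims. A rough sketch with hypothetical function names, relying only on the observations listed above:

```javascript
// Infer the browser from non-standard window properties.
function browserFromWindow(win) {
  const hasChrome = win.chrome !== undefined;
  const hasMoz = win.mozInnerScreenX !== undefined;
  const hasWebkitCancel = win.webkitCancelAnimationFrame !== undefined;
  if (hasMoz && !hasChrome && !hasWebkitCancel) return "Firefox-like";
  if (hasChrome && hasWebkitCancel && !hasMoz) return "Chrome-like";
  return "unknown";
}

// true = consistent, false = mismatch, null = the window gave no clear signal.
function uaMatchesWindow(ua, win) {
  const observed = browserFromWindow(win);
  if (observed === "unknown") return null;
  const claimsFirefox = /\bFirefox\//.test(ua);
  const claimsChrome = /\bChrome\//.test(ua);
  if (observed === "Firefox-like") return claimsFirefox;
  return claimsChrome && !claimsFirefox;
}

// In a page: uaMatchesWindow(navigator.userAgent, window).
// Here a stub object stands in for a Firefox window.
const firefoxWindow = { mozInnerScreenX: 0 };
const chromeUA =
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36";
console.log(uaMatchesWindow(chromeUA, firefoxWindow)); // false: Chrome UA, Firefox window
```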
    

Limitations of traditional inspection methods

Each of the checks above yields information that can be compared with the User-Agent to decide whether it has been forged. The problem is that this information is not fundamentally different from the User-Agent itself: it is plaintext supplied by the client, and an attacker can modify it just as freely to produce a request identical to that of a normal user. In other words, even if we find enough distinguishing information, an attacker can alter it as easily as the User-Agent.