1. Installation

GoReplay is written in Go and ships as a single executable, which can be downloaded from the official Releases page and placed in a directory on your PATH.

wget https://github.com/buger/goreplay/releases/download/v1.2.0/gor_v1.2.0_x64.tar.gz
tar -zxvf gor_v1.2.0_x64.tar.gz
mv gor /usr/local/bin

2. Basic Use

The basic usage pattern of the GoReplay command line is to specify an input side and an output side. GoReplay then copies traffic from the input side to the output side.

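The input/output model can be sketched in a few lines of Go. This is a minimal illustration under assumed names: Input, Output, sliceInput, collectOutput and copyTraffic are hypothetical and are not GoReplay's plugin API. Each payload read from the input is written to every output.

```go
package main

import (
	"fmt"
	"strings"
)

// Input is a hypothetical source of captured payloads (a port, a file, a TCP listener...).
type Input interface {
	Read() (payload string, ok bool)
}

// Output is a hypothetical sink (stdout, a file, an HTTP target...).
type Output interface {
	Write(payload string) error
}

// sliceInput replays a fixed list of payloads; it stands in for a real capture source.
type sliceInput struct {
	items []string
}

func (s *sliceInput) Read() (string, bool) {
	if len(s.items) == 0 {
		return "", false
	}
	p := s.items[0]
	s.items = s.items[1:]
	return p, true
}

// collectOutput records every payload it receives.
type collectOutput struct {
	got []string
}

func (c *collectOutput) Write(p string) error {
	c.got = append(c.got, p)
	return nil
}

// copyTraffic forwards every payload from in to all outputs until in is exhausted.
func copyTraffic(in Input, outs ...Output) error {
	for {
		p, ok := in.Read()
		if !ok {
			return nil
		}
		for _, o := range outs {
			if err := o.Write(p); err != nil {
				return err
			}
		}
	}
}

func main() {
	in := &sliceInput{items: []string{"GET / HTTP/1.1", "GET /api HTTP/1.1"}}
	out := &collectOutput{}
	if err := copyTraffic(in, out); err != nil {
		panic(err)
	}
	fmt.Println(strings.Join(out.got, "\n"))
}
```
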
2.1. Real-time traffic replication

The GoReplay input can be a network port. With --input-raw, GoReplay captures the traffic arriving on that port and copies it to the output. The following example copies traffic sent to 127.0.0.1:8000 and prints it to the console.

First, start an HTTP server. Here we simply use Python's built-in HTTP server.

Then have gor listen on the same port with --input-raw :8000, and use --output-stdout to print the captured traffic to the console.

Now, when you access the Python HTTP server with curl, gor copies each HTTP request and prints it to the console.

Likewise, if you set the output side to another HTTP server with the --output-http option, gor mirrors each request to that server in real time.

2.2. Traffic capture and replay

2.2.1. Basic use

GoReplay can save traffic to a file when a file is used as the output. GoReplay can later read that saved traffic file and replay it against a specified HTTP server.

First, save the requests to a file with the --output-file option.

Then read the traffic back with the --input-file option and replay it against the target server with the --output-http option.

2.2.2. Extended Options

When saving traffic to a file, GoReplay writes in chunks by default, and each chunk gets its own file name (e.g. test_0.gor). To write all chunks to a single file, set --output-file-append to true.

GoReplay output file names also support date placeholders. For example, --output-file %Y%m%d.gor produces a file name such as 20210801.gor. The available date placeholders are:

  • %Y : year including the century (at least 4 digits)
  • %m : month of the year (01..12)
  • %d : day of the month (01..31)
  • %H : hour of the day, 24-hour clock (00..23)
  • %M : minute of the hour (00..59)
  • %S : second of the minute (00..60)

With many requests, the saved file can grow large. If you give the file name a .gz suffix, GoReplay compresses it with gzip automatically.

gor --input-raw :8000 --output-file test.gor.gz

To replay several files together, specify all of them, for example with a glob pattern quoted so that gor rather than the shell expands it. GoReplay keeps the requests in their original order during replay:

gor --input-file "*.gor" --output-http http://127.0.0.1:8080

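Keeping the global order across several files is essentially a k-way merge by capture timestamp. The sketch below assumes each file is already sorted by time. record, cursor and mergeByTime are hypothetical names used only for illustration, and GoReplay's actual file reader may work differently.

```go
package main

import (
	"container/heap"
	"fmt"
)

// record is a hypothetical stand-in for one saved request and its capture time.
type record struct {
	ts   int64 // capture timestamp, e.g. nanoseconds
	line string
}

// cursor points at the next unread record of one file.
type cursor struct {
	recs []record
	pos  int
}

// cursorHeap orders cursors by the timestamp of their next record.
type cursorHeap []*cursor

func (h cursorHeap) Len() int           { return len(h) }
func (h cursorHeap) Less(i, j int) bool { return h[i].recs[h[i].pos].ts < h[j].recs[h[j].pos].ts }
func (h cursorHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *cursorHeap) Push(x any)        { *h = append(*h, x.(*cursor)) }
func (h *cursorHeap) Pop() any {
	old := *h
	c := old[len(old)-1]
	*h = old[:len(old)-1]
	return c
}

// mergeByTime performs a k-way merge of per-file record lists, each already
// sorted by timestamp, so the combined stream is in global capture order.
func mergeByTime(files ...[]record) []record {
	h := &cursorHeap{}
	for _, f := range files {
		if len(f) > 0 {
			*h = append(*h, &cursor{recs: f})
		}
	}
	heap.Init(h)
	var out []record
	for h.Len() > 0 {
		c := (*h)[0] // cursor with the earliest pending record
		out = append(out, c.recs[c.pos])
		c.pos++
		if c.pos == len(c.recs) {
			heap.Pop(h) // this file is exhausted
		} else {
			heap.Fix(h, 0) // re-position by its next timestamp
		}
	}
	return out
}

func main() {
	a := []record{{1, "a1"}, {4, "a2"}}
	b := []record{{2, "b1"}, {3, "b2"}, {5, "b3"}}
	for _, r := range mergeByTime(a, b) {
		fmt.Println(r.ts, r.line)
	}
}
```
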
File input can also be used for stress testing by changing the replay speed. When the file name is given as requests.gor|200%, GoReplay replays the requests at twice the original rate:

# Replay from file at 2x speed
gor --input-file "requests.gor|200%" --output-http "staging.com"

2.3. Data Loss and Buffers

GoReplay captures traffic at a fairly low level: it intercepts TCP packets as they arrive. Packets can arrive out of order, so the TCP stream has to be reassembled before the complete HTTP request can be read in the correct order. In the meantime, the packets are held in a buffer. By default this buffer is 2 MB on Linux and 1 MB on Windows. If a single HTTP request is larger than the buffer, GoReplay cannot capture it properly, because it needs the complete HTTP request to save it to a file or replay it. This can lead to lost or corrupted requests.

To solve this problem, GoReplay provides the --input-raw-buffer-size option to adjust the buffer size. For example, --input-raw-buffer-size 10485760 sets the buffer to 10 MB.

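The failure mode can be modeled with a toy reassembler. Segments are sorted by sequence number and joined, and any message larger than the buffer is rejected. This is a deliberately simplified sketch: segment, reassemble and errBufferFull are hypothetical, and real TCP reassembly also handles retransmissions, gaps and sequence wraparound. For reference, the 10485760 used above is 10 × 1024 × 1024 bytes, or `10 << 20` in Go.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// segment is a simplified TCP segment: a sequence number and its payload.
type segment struct {
	seq  uint32
	data []byte
}

var errBufferFull = errors.New("reassembly buffer exceeded: request would be lost")

// reassemble orders out-of-order segments by sequence number and joins them,
// failing if the message is larger than bufLimit bytes. Simplified model only;
// it ignores retransmissions, gaps and sequence-number wraparound.
func reassemble(segs []segment, bufLimit int) ([]byte, error) {
	total := 0
	for _, s := range segs {
		total += len(s.data)
	}
	if total > bufLimit {
		return nil, errBufferFull
	}
	sorted := append([]segment(nil), segs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].seq < sorted[j].seq })
	out := make([]byte, 0, total)
	for _, s := range sorted {
		out = append(out, s.data...)
	}
	return out, nil
}

func main() {
	// The second half of the request arrives first.
	segs := []segment{
		{seq: 16, data: []byte("Host: x\r\n\r\n")},
		{seq: 0, data: []byte("GET / HTTP/1.1\r\n")},
	}
	msg, err := reassemble(segs, 2<<20) // 2 MB, the Linux default mentioned above
	fmt.Printf("%q %v\n", msg, err)
	_, err = reassemble(segs, 10) // far too small for this request
	fmt.Println(err)
}
```
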
2.4. Rate limiting

In some cases, for debugging, we capture traffic in the production environment and mirror it to a test environment for replay. Production traffic is often far heavier than the test environment needs, so GoReplay can limit the request rate for us.

Absolute limit: with an argument of the form --output-http "ADDRESS|N", GoReplay guarantees that the mirrored traffic does not exceed N requests per second.

# staging.server will not get more than ten requests per second
gor --input-tcp :28020 --output-http "http://staging.com|10"

Percentage limit: with an argument of the form --output-http "ADDRESS|N%", GoReplay keeps the mirrored traffic at roughly N% of the total traffic.

2.5. Request Filtering

Sometimes we only want to forward specific production traffic to the test environment, or to keep certain traffic from being forwarded. GoReplay's filtering options handle both cases:

  • --http-allow-header : HTTP header a request must match to be replayed (regular expressions supported)
  • --http-allow-method : HTTP methods that are allowed to be replayed
  • --http-allow-url : URL a request must match to be replayed (regular expressions supported)
  • --http-disallow-header : HTTP header that prevents a request from being replayed (regular expressions supported)
  • --http-disallow-url : URL that prevents a request from being replayed (regular expressions supported)

Here are some sample commands from the official documentation:

# only forward requests being sent to the /api endpoint
gor --input-raw :8080 --output-http staging.com --http-allow-url /api

# only forward requests NOT being sent to the /api... endpoint
gor --input-raw :8080 --output-http staging.com --http-disallow-url /api

# only forward requests with an api version of 1.0x
gor --input-raw :8080 --output-http staging.com --http-allow-header 'api-version:^1\.0\d'

# only forward requests NOT containing User-Agent header value "Replayed by Gor"
gor --input-raw :8080 --output-http staging.com --http-disallow-header "User-Agent: Replayed by Gor"

# only forward GET and OPTIONS requests
gor --input-raw :80 --output-http "http://staging.server" \
    --http-allow-method GET \
    --http-allow-method OPTIONS

2.6. Request Rewrite

Sometimes the URL paths in the test environment are completely different from those in production. Replaying production traffic directly against the test environment would then hit the wrong paths. For this reason, GoReplay can rewrite URLs, set query parameters, set request headers, and more.

Rewrite URLs with the --http-rewrite-url option:

# Rewrites all `/v1/user/<user_id>/ping` requests to `/v2/user/<user_id>/ping`
gor --input-raw :8080 --output-http staging.com --http-rewrite-url '/v1/user/([^/]+)/ping:/v2/user/$1/ping'

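The rewrite value is a regular expression and a replacement separated by a colon, where $1 refers to the first capture group. Here is a Go sketch of that behaviour using regexp.ReplaceAllString. rewriteRule is hypothetical and, as an assumption, splits at the first colon, so it would mishandle patterns that themselves contain a colon.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rewriteRule parses a hypothetical "pattern:replacement" value and returns a
// function that applies it; $1 in the replacement expands to capture group 1.
func rewriteRule(rule string) (func(string) string, error) {
	pattern, replacement, ok := strings.Cut(rule, ":")
	if !ok {
		return nil, fmt.Errorf("rule %q has no ':' separator", rule)
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	return func(path string) string { return re.ReplaceAllString(path, replacement) }, nil
}

func main() {
	rewrite, err := rewriteRule(`/v1/user/([^/]+)/ping:/v2/user/$1/ping`)
	if err != nil {
		panic(err)
	}
	fmt.Println(rewrite("/v1/user/42/ping")) // /v2/user/42/ping
	fmt.Println(rewrite("/v1/user/42/info")) // unchanged: /v1/user/42/info
}
```
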
Set URL query parameters with --http-set-param:

gor --input-raw :8080 --output-http staging.com --http-set-param api_key=1
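
Setting a parameter means replacing or adding one key in the query string. Here is a Go sketch using net/url, with setParam as a hypothetical helper. In this sketch, url.Values.Encode sorts keys, so the parameter order may change; GoReplay itself may not behave that way.

```go
package main

import (
	"fmt"
	"net/url"
)

// setParam overrides (or adds) one query parameter. Sketch using net/url;
// not GoReplay's implementation.
func setParam(rawURL, key, value string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set(key, value)
	u.RawQuery = q.Encode() // Encode sorts the keys
	return u.String(), nil
}

func main() {
	out, err := setParam("/orders?page=2&api_key=secret", "api_key", "1")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // /orders?api_key=1&page=2
}
```
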

Set request headers with --http-header:

gor --input-raw :80 --output-http "http://staging.server" \
    --http-header "User-Agent: Replayed by Gor" \
    --http-header "Enable-Feature-X: true"

The Host header is special. By default, GoReplay sets it to the host of the replay target. To turn off this behavior, use the --http-original-host option.

3. Other advanced configurations

3.1. Relay server

GoReplay can chain instances through a relay server. To do this, set the output of the capturing instance to TCP mode with --output-tcp, and set the input of the relay server to TCP mode with --input-tcp:

# Run on servers where you want to catch traffic. You can run it on each `web` machine.
gor --input-raw :80 --output-tcp replay.local:28020

# Replay server (replay.local).
gor --input-tcp replay.local:28020 --output-http http://staging.com

If there are several relay servers, the --split-output option makes each capturing GoReplay instance distribute its traffic across them in round-robin order, instead of sending every request to all of them:

gor --input-raw :80 --split-output --output-tcp replay1.local:28020 --output-tcp replay2.local:28020
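
Round-robin distribution simply cycles through the configured outputs. Here is a minimal Go sketch using a hypothetical roundRobin type and the relay addresses from the example above.

```go
package main

import "fmt"

// roundRobin cycles through targets in order. Hypothetical type, sketch only.
type roundRobin struct {
	targets []string
	next    int
}

// pick returns the next target and advances the position, wrapping around.
func (r *roundRobin) pick() string {
	t := r.targets[r.next]
	r.next = (r.next + 1) % len(r.targets)
	return t
}

func main() {
	rr := &roundRobin{targets: []string{"replay1.local:28020", "replay2.local:28020"}}
	for i := 0; i < 4; i++ {
		fmt.Println(rr.pick())
	}
}
```
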

3.2. Output to ElasticSearch

GoReplay can also send request and response data for an HTTP output to Elasticsearch:

gor --input-raw :8000 --output-http http://staging.com --output-http-elasticsearch localhost:9200/gor

There is no need to create the index before exporting to Elasticsearch, because GoReplay creates it automatically. Each exported document has the following structure:

import "time"

// ESRequestResponse is the document structure GoReplay writes to Elasticsearch.
type ESRequestResponse struct {
	ReqURL               string `json:"Req_URL"`
	ReqMethod            string `json:"Req_Method"`
	ReqUserAgent         string `json:"Req_User-Agent"`
	ReqAcceptLanguage    string `json:"Req_Accept-Language,omitempty"`
	ReqAccept            string `json:"Req_Accept,omitempty"`
	ReqAcceptEncoding    string `json:"Req_Accept-Encoding,omitempty"`
	ReqIfModifiedSince   string `json:"Req_If-Modified-Since,omitempty"`
	ReqConnection        string `json:"Req_Connection,omitempty"`
	ReqCookies           string `json:"Req_Cookies,omitempty"`
	RespStatus           string `json:"Resp_Status"`
	RespStatusCode       string `json:"Resp_Status-Code"`
	RespProto            string `json:"Resp_Proto,omitempty"`
	RespContentLength    string `json:"Resp_Content-Length,omitempty"`
	RespContentType      string `json:"Resp_Content-Type,omitempty"`
	RespTransferEncoding string `json:"Resp_Transfer-Encoding,omitempty"`
	RespContentEncoding  string `json:"Resp_Content-Encoding,omitempty"`
	RespExpires          string `json:"Resp_Expires,omitempty"`
	RespCacheControl     string `json:"Resp_Cache-Control,omitempty"`
	RespVary             string `json:"Resp_Vary,omitempty"`
	RespSetCookie        string `json:"Resp_Set-Cookie,omitempty"`
	Rtt                  int64  `json:"RTT"`
	Timestamp            time.Time
}

3.3. Kafka Integration

Besides Elasticsearch, GoReplay can also write traffic to Kafka and read traffic from Kafka:

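# capture traffic on port 8080 and publish it to the kafka-log topic
gor --input-raw :8080 --output-kafka-host '192.168.0.1:9092,192.168.0.2:9092' --output-kafka-topic 'kafka-log'

# read the traffic back from the kafka-log topic and print it to the console
gor --input-kafka-host '192.168.0.1:9092,192.168.0.2:9092' --input-kafka-topic 'kafka-log' --output-stdout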

Reference: https://mritd.com/2021/08/03/use-goreplay-to-record-your-live-traffic/