How to implement hot reloading of configuration

For some processes, restarting is unavoidably expensive. A process that holds long-lived connections drops them on restart, forcing every client to reconnect. A process that has cached a lot of content in memory has to warm that cache up again.

Even so, we still want to be able to change the configuration of such a process without restarting it.

We’ve talked before about building a configuration center that pushes a notification to the application whenever the configuration changes, so that the application applies the change in a callback.

This is a good solution, but it relies on other components: a centralized configuration center, and an SDK dependency inside the application to receive and parse configuration updates.

This article discusses ways to do it without relying on other components, i.e. updating a file and having the new configuration take effect without restarting the process.

File Watch

This is one of the more natural implementations, and the worst.

The principle is that, after the process starts, it spawns a thread that watches the configuration file for changes. When it detects a change, it runs a callback to update the configuration.

The downsides of this implementation are:

The file-change API differs between systems: inotify on Linux, kqueue on Mac/BSD, and ReadDirectoryChangesW on Windows. To stay compatible with all of them, you usually need a library that wraps them, such as watchdog.
It occupies a system thread, although this could also be made asynchronous.
It occupies a file descriptor. This may not seem serious, but if the system has run out of fds and we want to use hot reload to change a parameter that controls the program's behavior, it becomes a problem.

The upside is that once you pull in such a library, you don’t need to write much code yourself.
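As a rough illustration of the watch-and-callback pattern, here is a minimal Python sketch. It assumes a JSON configuration file and, to avoid any dependency, polls the file's modification time from a background thread instead of using inotify, kqueue or ReadDirectoryChangesW; a library such as watchdog would take the place of the polling loop. The names ConfigWatcher, check_once and on_change are made up for this example.

```python
import json
import os
import threading


class ConfigWatcher:
    """Calls on_change(new_config) whenever the file's mtime changes."""

    def __init__(self, path, on_change, interval=1.0):
        self.path = path
        self.on_change = on_change
        self.interval = interval
        self._last_mtime = None  # None means "never loaded yet"
        self._stop = threading.Event()

    def check_once(self):
        """Reload and fire the callback if the file changed; return True if it did."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime == self._last_mtime:
            return False
        with open(self.path) as f:
            new_config = json.load(f)
        # Remember the mtime only after a successful parse.
        self._last_mtime = mtime
        self.on_change(new_config)
        return True

    def _run(self):
        # Event.wait doubles as an interruptible sleep.
        while not self._stop.wait(self.interval):
            try:
                self.check_once()
            except (OSError, ValueError):
                pass  # file missing or half-written; try again on the next tick

    def start(self):
        threading.Thread(target=self._run, daemon=True).start()

    def stop(self):
        self._stop.set()
```

The first call to check_once always fires the callback, so it can double as the initial load. Note that the sketch still costs a thread, which is one of the downsides listed above.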

Reload every time

This approach parses the configuration file every time a configuration item is used, instead of keeping the values in global variables. Hot reload then stops being a problem: every use is a reload, so every place that reads the configuration sees the latest values.

Many people find this implementation very silly at first glance, but if you think about it, many of the configurations that require hot reload are neither read nor written very often, so the cost of re-parsing the file on each read is small.

This saves a great deal of work:

No need to write a callback to update the configuration; just reuse the loading code.
No need to introduce additional dependencies.
Very simple and understandable for everyone.
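A minimal Python sketch of this idea, assuming the configuration is a JSON object stored in a file; the function name get_config and the path config.json are hypothetical:

```python
import json

CONFIG_PATH = "config.json"  # hypothetical path, for illustration only


def get_config(key, default=None, path=CONFIG_PATH):
    """Parse the file on every call, so the caller always sees the latest value."""
    with open(path) as f:
        return json.load(f).get(key, default)
```

A caller would write something like get_config("rate_limit", 100) at the point of use, rather than reading a value cached at startup.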

One problem shared by these two approaches is that a change takes effect as soon as the file is modified, which leaves you no time to verify that the file is correct.

SIGHUP

This is the best solution. The program registers a signal handler and reloads the config when it receives a SIGHUP signal.

The advantages are:

Reloading is an explicit action by the operator rather than a side effect of the file being modified, so the semantics are unambiguous.
The implementation cost is small.
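To show how small the implementation can be, here is a minimal Python sketch. It assumes a Unix-like system and a JSON configuration file whose top-level value is an object; the names config, load_config and install_reload_handler are hypothetical.

```python
import json
import signal

config = {}  # current settings, read by the rest of the program


def load_config(path):
    """Parse and check the file fully before touching the live config."""
    with open(path) as f:
        new = json.load(f)
    if not isinstance(new, dict):
        raise ValueError("top-level value must be an object")
    config.clear()
    config.update(new)


def install_reload_handler(path):
    def handle_sighup(signum, frame):
        try:
            load_config(path)
        except (OSError, ValueError) as exc:
            # Keep the previous config if the new file cannot be loaded.
            print(f"reload failed, keeping old config: {exc}")

    # SIGHUP does not exist on Windows; this sketch assumes a Unix-like system.
    signal.signal(signal.SIGHUP, handle_sighup)
```

After calling load_config(path) once at startup and install_reload_handler(path) from the main thread, an operator can trigger a reload with kill -HUP followed by the process ID.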

Moreover, we can generally control what a reload does. For example, when using systemctl reload service, we can customize the reload command so that it will:

Check that the configuration file is valid first, and abort the reload if it is not.
Send a reload signal to the program.
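The two steps can be sketched in Python as follows. This is only an illustration under assumptions: the configuration is taken to be a JSON object, and check_config and checked_reload are hypothetical helpers, not part of systemd or any other tool.

```python
import json
import os
import signal
import sys


def check_config(path):
    """Return an error message, or None if the file looks valid."""
    try:
        with open(path) as f:
            data = json.load(f)
    except (OSError, ValueError) as exc:
        return f"{path}: {exc}"
    if not isinstance(data, dict):
        return f"{path}: top-level value must be an object"
    return None


def checked_reload(path, pid):
    """Validate first; signal the process only if the file is valid.

    Returns 0 on success and 1 if the check failed, like an exit code.
    """
    error = check_config(path)
    if error:
        print(error, file=sys.stderr)
        return 1  # abort: the running process never sees the bad file
    os.kill(pid, signal.SIGHUP)
    return 0
```

The important property is the ordering: a file that fails the check never results in a signal, so the running process keeps its current configuration.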

Here is an example. In the systemd unit file you can define ExecReload, i.e. what is done when you execute systemctl reload service.

`ExecReload=/usr/local/bin/promtool check config /etc/prom-conf.yaml ; /bin/kill -HUP $MAINPID`

Here are two tips.

systemd provides $MAINPID, the PID of the main process, so you can send signals directly to it.
Use ; to chain multiple commands; each command runs only if the previous one succeeded. Note that this is not the shell’s ;, and it must be written as a separate word with whitespace around it.

This way, if the check fails, the return code of systemctl reload is not 0, and we know that the reload failed.


During operations, the error then shows up directly in Ansible’s results, so we don’t have to ssh into the machine to check whether the reload succeeded.
