If you follow distributed databases, you have probably heard of Spanner, Google’s distributed database, and of how it uses atomic clocks to build a facility called TrueTime that supports distributed transactions across data centers.

Many people come away with the impression that Google is simply rich enough to afford exotic, high-end hardware like atomic clocks. That is not entirely wrong, but it is not quite accurate either.

Network Clock Synchronization

The three ways of obtaining timestamps mentioned above rest on very different logic. TiDB’s TSO is centralized timekeeping: every timestamp has to be fetched from a central server. CockroachDB’s HLC is essentially a logical clock that relies on message exchange to advance its counter. Spanner’s TrueTime is clock synchronization: the local clock is kept in step with a source clock by exchanging messages periodically.

The clock synchronization model resembles the way we use a wristwatch. Every so often we set the watch against the time announced on the news, and in between we simply read the time off the watch.

The most common clock synchronization mechanism in computing is NTP. The problem with synchronizing clocks over a network is the error introduced by latency.

NTP

For example, suppose the client sends a query when its local clock reads 12:00:00 and receives the server’s reply 2 seconds later, and the time in the reply is also 12:00:00. This does not mean the local clock is accurate: the messages took time to travel, so the local clock is actually somewhat ahead. Exactly how far ahead is impossible to know, because we only know that the round trip took 2 seconds, not how long each direction took. All we can do is take the midpoint: set the clock back by 1 second and accept an error range of ±1 second.
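The midpoint estimate in this example can be written out as a small Python sketch. The function and field names are hypothetical, and the model is the simplified one from the text (one send time, one receive time, one server timestamp); it ignores server processing time, which real NTP accounts for with additional timestamps.

```python
from dataclasses import dataclass


@dataclass
class SyncEstimate:
    offset: float  # seconds the local clock is ahead of the server (negative: behind)
    error: float   # the true offset lies within offset ± error


def estimate_offset(sent_local: float, received_local: float,
                    server_time: float) -> SyncEstimate:
    """Estimate local clock offset, assuming the server stamped its reply
    halfway through the round trip (hypothetical helper, simplified model)."""
    round_trip = received_local - sent_local
    if round_trip < 0:
        raise ValueError("reply cannot arrive before the request was sent")
    # Local clock reading at the assumed moment the server took its timestamp.
    midpoint = sent_local + round_trip / 2
    # The server may have stamped at any point in the round trip, so the
    # estimate can be wrong by up to half the round trip in either direction.
    return SyncEstimate(offset=midpoint - server_time, error=round_trip / 2)


if __name__ == "__main__":
    # Times in seconds after 12:00:00: sent at 0, reply received at 2,
    # and the server's reply says 12:00:00 (0).
    print(estimate_offset(0.0, 2.0, 0.0))
```

With the numbers from the example, the estimate is an offset of 1 second with an error of 1 second, i.e. the local clock should be set back by 1 second, give or take 1 second.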

Marzullo algorithm

Synchronizing against a single time source is not reliable enough. Besides the possibility of a failure or network outage, the worse case is that the time source itself is wrong. The Marzullo algorithm estimates the correct time from multiple time sources.

Marzullo algorithm

As shown in the figure, we query each of the four time sources A, B, C, and D and obtain a clock offset and an error range from each. The general idea of the algorithm is to pick the interval covered by as many time sources as possible (which narrows the error range) and to exclude the faulty intervals (e.g., A).

However, in scenarios with strict timing requirements (e.g., distributed transactions), the Marzullo algorithm needs some refinement. One fairly obvious drawback is that when a faulty time source’s offset interval overlaps a correct one, the error range may be estimated as too small. The details are left to the related literature and are not expanded here.
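The basic interval-selection idea can be sketched in Python as follows. This is a minimal version of the classic sweep over interval endpoints, without the refinements mentioned above; the function name and the sample intervals are illustrative assumptions, not values from the figure.

```python
def marzullo(intervals):
    """Return (low, high, count): the first interval found that is covered
    by the largest number of sources, and how many sources cover it.

    `intervals` is a list of (low, high) pairs, one per time source,
    e.g. (offset - error, offset + error).
    """
    if not intervals:
        raise ValueError("need at least one interval")
    edges = []
    for low, high in intervals:
        if low > high:
            raise ValueError("interval low must not exceed high")
        edges.append((low, 0))   # 0 = interval opens; sorts before a close at the same point
        edges.append((high, 1))  # 1 = interval closes
    edges.sort()

    best = count = 0
    best_low = best_high = None
    for i, (point, kind) in enumerate(edges):
        if kind == 0:
            count += 1
            if count > best:
                best = count
                best_low = point
                # The overlap lasts until the next endpoint in sorted order.
                best_high = edges[i + 1][0]
        else:
            count -= 1
    return best_low, best_high, best


if __name__ == "__main__":
    # Three sources that roughly agree, plus one outlier far from the rest.
    sources = [(8, 12), (11, 13), (10, 12), (20, 21)]
    print(marzullo(sources))
```

Here the three agreeing sources overlap on the interval from 11 to 12, and the outlier is ignored because no other source supports it. Note that the sketch shows the drawback described above: an outlier that happens to overlap the good intervals would be counted like any other source.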

Clock drift

Setting the time against the server is not the end of the story; synchronization is usually triggered periodically. Just as a watch becomes inaccurate over time, the period of a computer’s crystal oscillator is not perfectly accurate. It is affected by temperature and voltage, so the clock “drifts” over time.

Spanner assumes that a server’s clock drifts by no more than 200μs per second. At that maximum rate, going 30 seconds without synchronization accumulates up to 6ms of error, and going a full day without synchronization accumulates about 17s. Note that this bound is very, very conservative; real hardware is nowhere near that bad. For comparison, China’s industry standard for quartz electronic watches allows a monthly deviation of 10-15 seconds for class 1 and 20-30 seconds for class 2.
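The arithmetic behind these figures is simple enough to check in a few lines of Python. The helper names are hypothetical, and the watch comparison assumes a 30-day month.

```python
MAX_DRIFT_US_PER_S = 200  # worst-case drift assumed in the text, in microseconds per second


def max_drift_seconds(elapsed_s, drift_us_per_s=MAX_DRIFT_US_PER_S):
    """Worst-case accumulated error, in seconds, after `elapsed_s` seconds without sync."""
    return elapsed_s * drift_us_per_s / 1_000_000


def drift_rate_us_per_s(error_s, period_s):
    """Convert 'error_s seconds of deviation per period_s seconds' to microseconds per second."""
    return error_s / period_s * 1_000_000


if __name__ == "__main__":
    print(max_drift_seconds(30))          # 30 seconds without synchronization
    print(max_drift_seconds(24 * 3600))   # one day without synchronization
    # A class 1 quartz watch at 15 seconds per month (30-day month assumed).
    print(drift_rate_us_per_s(15, 30 * 24 * 3600))
```

The first two calls reproduce the 6ms and roughly 17s figures. The third shows that 15 seconds per month works out to roughly 5.8μs per second, far below the 200μs per second bound, which is why the bound counts as conservative.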

Atomic clock

An atomic clock is a device that uses the difference between the energy levels of atoms or molecules as a reference signal to calibrate the frequency of a crystal oscillator or laser, so that it outputs a standard frequency signal. It keeps time using the electromagnetic waves that atoms emit when they absorb or release energy. Because this electromagnetic wave is very stable, and is controlled by a series of precision instruments, atomic clocks can be extremely accurate, with an error of one second or less in 10 million years.

It does look very high-end, so how much would one cost? In reality, atomic clocks are far more affordable than they sound, and you can find them by searching e-commerce sites directly.

Atomic clock

Prices range from tens of thousands to over a hundred thousand, which is not prohibitively expensive: about the same as a higher-end server, and cheaper still if you relax the accuracy requirements.

To put it bluntly, an atomic clock is the same kind of thing as the crystal oscillator found in every computer, only several orders of magnitude more precise.

Timing accuracy with different hardware

Note that some people mistakenly believe TrueTime requires every machine to be equipped with an atomic clock. It does not; a few per data center are entirely sufficient. The reasons are discussed later.

GPS Timing

GPS provides timing as well as positioning. Each GPS satellite carries several high-precision atomic clocks and continuously broadcasts its ephemeris (orbit) and the time. Once a ground receiver picks up signals from at least 4 satellites, it solves a system of quadratic equations in three spatial variables and one time variable, obtaining position and time together.

GPS timing is so accurate that the error can be kept within a few nanoseconds. This is because the electromagnetic signal travels in an essentially straight line with very little interference along the path, so the transmission delay can be calculated very precisely from the distance. Network messages, by contrast, pass through relays and multiple network layers, and even inside optical fiber the signal does not travel in a straight line.

TrueTime

With the background knowledge out of the way, let’s take a look at what TrueTime actually does.

TrueTime component deployment in the server room

The TrueTime component is divided by role into the time master and the time daemon. The time master can be thought of as TrueTime’s server side, deployed on a number of dedicated machines. The time daemon is the client side, deployed as a process on every host that actually runs services.

Time masters come in two kinds. One kind is fitted with GPS modules; these are spread across different locations in the server room, and each GPS node uses its own antenna so that signal interference does not take them all down together. The other kind is fitted with atomic clocks, again with multiple units so that a single failure does not cause unavailability.

The time masters periodically check their time against one another using the Marzullo algorithm, and each time daemon synchronizes with multiple time masters every 30 seconds (also using the Marzullo algorithm).

As described earlier, GPS is accurate to the nanosecond level. That error is negligible compared with network latency inside the server room, so it is simply counted as 0ms. The error right after a time daemon synchronizes its clock therefore depends only on network latency, which is usually no more than 1ms within the server room.

We also need to account for the time daemon’s clock drift between one synchronization and the next: as calculated earlier, up to 6ms of error may accumulate within 30 seconds. The time daemon’s clock error range therefore rises and falls continually between 1ms and 7ms, which plots as a sawtooth:

Time daemon error range variation diagram
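The sawtooth can be reproduced with a few lines of Python. This is a sketch under simplifying assumptions that go beyond the text: every synchronization succeeds, happens at exact multiples of 30 seconds, and always leaves exactly 1ms of network error. The constant and function names are hypothetical.

```python
SYNC_INTERVAL_S = 30     # time daemon synchronizes every 30 seconds
NETWORK_ERROR_MS = 1.0   # assumed error right after synchronization
DRIFT_MS_PER_S = 0.2     # 200 microseconds of worst-case drift per second


def error_bound_ms(seconds_since_start):
    """Worst-case clock error, in milliseconds, at a given moment."""
    since_last_sync = seconds_since_start % SYNC_INTERVAL_S
    return NETWORK_ERROR_MS + DRIFT_MS_PER_S * since_last_sync


if __name__ == "__main__":
    # Sample the bound every 5 seconds across two synchronization periods.
    for t in range(0, 61, 5):
        print(f"t={t:2d}s  error <= {error_bound_ms(t):.1f}ms")
```

The bound starts at 1ms, climbs toward 7ms just before the next synchronization, and drops back to 1ms once it completes.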

So the question arises: what is the atomic clock for?

In short, the atomic clock is a backup for GPS. GPS is susceptible to weather and electromagnetic interference, and in extreme cases the entire GPS system may fail or be shut down because of war (GPS is a military facility). That is when the atomic clock comes in handy.

The paper does not spell it out, but presumably the atomic clock nodes synchronize with the GPS nodes in much the same way as the time daemon does, so that their error also falls back to around 1ms periodically. The difference is that when GPS fails, an atomic clock’s error grows far more slowly than an ordinary machine’s, so it can take over and serve, offline, as the data center’s time source for clock synchronization.

Summary

Back to the original question: why is Spanner the only one that uses the TrueTime design? It is not because atomic clocks are out of reach. TrueTime is an elaborate and complex set of facilities, and the atomic clock is only one of its more critical components.

In TiDB’s case, we had neither the resources nor the opportunity to design and build an entire server room from scratch to support such a system, so we had to choose an alternative. But TrueTime’s design concepts and ideas are well worth learning from!

The biggest contribution of TrueTime is that it exposes the time error range to the upper layers as an API, and pairs it with careful design at the transaction level to achieve demanding transactional consistency. We can neither eliminate the error entirely nor compute it exactly, but TrueTime cleverly takes a guaranteed upper bound on the error as the point of attack and solves the problem elegantly!
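To give a feel for what exposing the error range as an API means, here is a minimal Python sketch. The method names now, after, and before follow the interface described in the Spanner paper; everything else is an assumption for illustration, including the class names, the fixed error bound (the real bound varies over time), and the use of the local system clock as the time source.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TTInterval:
    earliest: float  # the true time is no earlier than this
    latest: float    # the true time is no later than this


class TrueTimeSketch:
    """Toy stand-in for a TrueTime client (hypothetical, not Google's implementation)."""

    def __init__(self, epsilon_s: float, clock: Callable[[], float] = time.time):
        if epsilon_s < 0:
            raise ValueError("error bound must be non-negative")
        self._epsilon = epsilon_s  # assumed fixed upper bound on clock error
        self._clock = clock        # injectable so the sketch can be tested

    def now(self) -> TTInterval:
        """Return an interval assumed to contain the true current time."""
        t = self._clock()
        return TTInterval(t - self._epsilon, t + self._epsilon)

    def after(self, t: float) -> bool:
        """True only if time t has definitely passed."""
        return self.now().earliest > t

    def before(self, t: float) -> bool:
        """True only if time t has definitely not arrived yet."""
        return self.now().latest < t


if __name__ == "__main__":
    tt = TrueTimeSketch(epsilon_s=0.007)  # 7ms, the worst case discussed earlier
    interval = tt.now()
    print(interval.latest - interval.earliest)  # width of the uncertainty window
```

The point of the design is that callers never see a single timestamp that might be wrong. They see a window, and they can ask questions whose answers stay correct as long as the error bound holds.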

Herein lies the beauty of distributed systems: we face near-desperate uncertainty, yet with faith, reason, and intelligence we can build highly reliable, trustworthy systems on such a fragile foundation.