In complex distributed systems, it is often necessary to uniquely identify a large number of data and messages. A unique ID is needed to identify a piece of data or a message after the data is divided into libraries and tables, and the self-incrementing ID of the database obviously does not meet the demand at this time a system that can generate a globally unique ID is very necessary. To summarize, what are the requirements of the business system for ID number?

  • Global uniqueness: No duplicate ID numbers can appear.
  • Incremental trend: In MySQL InnoDB engine uses aggregated indexes, as most RDBMS use B-tree data structure to store indexed data, we should try to use ordered primary keys to ensure the write performance on top of the primary key selection.
  • Monotonic incremental: ensure that the next ID must be greater than the previous ID, such as transaction version number, IM incremental messages, sorting and other special needs.
  • Information security: If the IDs are consecutive, it is very easy to do the crawling work of malicious users, just download the specified URLs directly in order; if the order number is more dangerous, competitors can directly know the single volume of a day. So in some application scenarios, there will be a need for ID irregularity and irregularity.

Take the business-side requirements for order numbers as an example.

  • Order number cannot be duplicated
  • Order number has no rules, i.e., the coding rules cannot include any data related to the company’s operation, and external personnel cannot guess the order quantity by the order ID. It cannot be traversed.
  • Order number length is fixed and not too long
  • Easy to read, easy to communicate, no messy number-letter exchange
  • Can be generated quickly

Common global unique ID schemes.

Database self-growing ID

Advantages.

  • No coding required

Disadvantages.

  • Large tables cannot be divided horizontally, otherwise problems will occur when inserting and deleting
  • Need to add transaction mechanism for inserting data under high concurrency
  • When inserting parent and child tables (related tables) in business operations, the parent table must be inserted first, and then the child table

Redis Self-Growing IDs

Redis to generate distributed IDs is actually similar to using Mysql to self-increment IDs, which can be done atomically by using the incr command in Redis to self-increment and return, e.g.

1
2
3
4
5
6
127.0.0.1:6379> set seq_id 1     // 初始化自增ID为1
OK
127.0.0.1:6379> incr seq_id      // 增加1,并返回
(integer) 2
127.0.0.1:6379> incr seq_id      // 增加1,并返回
(integer) 3

Redis supports both RDB and AOF persistence.

  • RDB persistence is equivalent to taking a snapshot for persistence at regular intervals. If after taking a snapshot, it self-increments several times in a row before it has time to do the next snapshot persistence, Redis hangs at this time and the ID will be duplicated after restarting Redis.
  • AOF persistence is equivalent to persisting every write command. If Redis hangs, there will be no ID duplication, but it will take too long to restart and recover the data due to the incr command passing.

Advantages.

  • Ordered incremental, readable.

Disadvantages.

  • Bandwidth hungry, each time a request is made to Redis.

Timestamp + Random Number

Advantages.

  • Simple coding

Disadvantages.

  • There is a duplication problem with random numbers, even with the same timestamp. Need to check if the same value already exists before each insert into the database.

Timestamp + Member ID

Advantages.

  • No two orders exist for one user at the same time

Disadvantages.

  • Member ID will also reveal operational data, the chicken lays the egg and the egg lays the chicken

UUID/GUID

The standard type of UUID (Universally Unique Identifier) contains 32 hexadecimal digits, divided into five segments by hyphen, in the form of 36 characters of 8-4-4-4-12, example: 550e8400-e29b-41d4-a716-446655440000, so far the industry has a total of 5 There are five ways to generate UUIDs so far, see the IETF UUID specification A Universally Unique IDentifier (UUID) URN Namespace for details.

Advantages.

  • Simplicity
  • Very high performance: locally generated, no network consumption.

Disadvantages.

  • Not user friendly
  • Not easy to store: The UUID is too long, 16 bytes and 128 bits, usually represented as a 36-length string, which is not applicable in many scenarios.
  • Insecure information: The algorithm of generating UUID based on MAC address may cause MAC address leakage, a vulnerability that was used to find the location of the creator of Melissa virus.
  • There are some problems in specific environments when ID is used as primary key, for example, the scenario of being DB primary key, UUID is very inapplicable:.
    • MySQL has a clear official recommendation to keep the primary key as short as possible, and a UUID of 36 characters in length does not meet the requirement.
    • Not good for MySQL index: If used as database primary key, the unordered nature of UUID may cause frequent data location changes under InnoDB engine, which seriously affects performance.

UUID variants

The more popular UUID variant is based on the MySQL UUID variant: timestamp + machine number + random, as described in [GUID/UUID Performance Breakthrough](http://mysql.rjweb.org/doc.php/ uuid)

Advantages.

  • Lower development cost

Disadvantages.

  • Poor performance based on MySQL stored procedures

Also, with the advent of the UUID_TO_BIN(str, swap_flag) method, the the above implementations are not so applicable anymore.

Snowflake

Snowflake is a global unique id generation scheme open sourced by Twitter. The background of the Twitter-Snowflake algorithm is quite simple, in order to meet the tens of thousands of messages per second requested by Twitter, each message must be The core of the Snowflake algorithm combines a timestamp, a working machine id, and a sequence number (41 bits of millisecond time + 10 bits of machine ID + 12 bits of sequence in milliseconds).

In the above string, the first bit is unused (and can actually be used as a sign bit for long), the next 41 bits are the millisecond time, then 5 bits datacenter identification bits, 5 bits machine ID (not really an identifier, but actually for the thread identification), and then 12 bits the current count in milliseconds within that millisecond, adding up to exactly 64 bits for a Long type.

Except for the highest bit which is marked as unavailable, the remaining three bit occupancy groups can be floated, depending on the specific business requirements. By default, the 41-bit timestamp can support the algorithm until 2082, the 10-bit work machine id can support 1023 machines, and the sequence number can support 4095 self-incrementing sequence ids generated in 1 millisecond.

Strictly speaking the use of the bit segment of the working machine id can be process level, at the machine level you can use MAC address to uniquely mark the working machine, and at the working process level you can use IP+Path to distinguish the working process. If there are fewer working machines, it is a good choice to use a configuration file to set this id, if there are too many machines the maintenance of the configuration file is a disaster.

The solution here is to need a process for job id assignment, which can use a simple process written by yourself to record the assignment id.

The work process interacts with the work id allocator just once when the work process is started, and then the work process can drop the allocated id data to a file by itself and read the id in the file directly for use in the next start. The bit segment of this work machine id can also be further split, such as using the first 5 bits to mark the process id and the last 5 bits to mark the thread id or something like that.

The sequence number is a series of self-incrementing ids (atomic is recommended for multi-threads), in order to handle the need to assign ids to multiple messages in the same millisecond, and if the sequence number is used up in the same millisecond, then “wait until the next millisecond”.

Snowflake is an algorithm that generates IDs by dividing namespaces (UUIDs are also counted, and are analyzed separately because they are more common), and this scheme divides the 64-bit into multiple segments: timestamp + work number + seq number, separately to indicate the machine, time, etc.

Advantages.

  • The milliseconds are at the high level, the self-incrementing sequence is at the low level, and the entire ID is trending incrementally.
  • It does not depend on third-party systems such as databases, and is deployed as a service, which is more stable and generates IDs with very high performance.
  • It is very flexible as you can assign bit bits according to your business characteristics.

Disadvantages.

  • Strong dependence on machine clock, if the clock on the machine dials back, it will lead to duplicate issuing numbers or the service will be in unavailable state.
  • Need to introduce zookeeper and independent snowflake dedicated server

The specific implementation has the following options in addition to the official Scala version.

Snowflake variants

snowflake’s algorithm is still good, in fact, itself is not complicated, the complexity is how your client uses. The problems encountered are as follows.

  • The code is deployed on different servers, how to set the machine ID in the middle, is there a more convenient way to get the machine ID?
  • The whole algorithm relies on time continuity, but the display environment is that the online servers are all enabled with ntp, and the ntp case will have the problem of time regression.

Snowflake generated unique ID process.

  • 10 bits of machine number, obtained from a Zookeeper cluster when the ID-assigning worker starts (to ensure that all workers do not have duplicate machine numbers)
  • 41 bits of Timestamp: Each time a new ID is generated, the current Timestamp is fetched, and the sequence number is generated in two cases:
  • If the current Timestamp is the same as the Timestamp of the previous generated ID (in the same millisecond), the sequence number of the previous ID + 1 is used as the new sequence number (12 bits); if all the IDs in this millisecond are used up, wait until the next millisecond to continue (during this waiting process, no new IDs can be (during this wait, no new IDs can be assigned)
  • If the current Timestamp is larger than the previous ID’s Timestamp, a random initial sequence number (12 bits) is generated as the first sequence number in this millisecond

The whole process is decentralized, except for the external dependency when the worker is started (it needs to get the worker number from Zookeeper), and then it can work independently.

Exceptions discussion:

  • When fetching the current timestamp, what if the timestamp fetched is smaller than the timestamp of the previous generated ID? Snowflake’s approach is to continue fetching the current machine’s time until a larger Timestamp is fetched (while waiting, no new IDs can be assigned).

As you can see from this exception, if there is a large deviation in the clocks of the machines Snowflake is running on, the whole Snowflake system will not work properly (the more deviations, the longer the wait time to assign a new ID). As you can see from Snowflake’s official documentation (https://github.com/twitter/snowflake/#system-clock-dependency) it explicitly requires that “You should use NTP to keep your system clock accurate”. And it is better to configure NTP in a mode that does not adjust backwards. In other words, when NTP corrects the time, it will not dial back the machine clock.

Solving time synchronization problems

Options that can be taken to solve the above mentioned timing problems.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
import java.security.SecureRandom;
 
/**
 * 自定义 ID 生成器
 * ID 生成规则: ID长达 64 bits
 * 
 * | 41 bits: Timestamp (毫秒) | 3 bits: 区域(机房) | 10 bits: 机器编号 | 10 bits: 序列号 |
 */
public class CustomUUID {
    // 基准时间
    private long twepoch = 1288834974657L; //Thu, 04 Nov 2010 01:42:54 GMT
    // 区域标志位数
    private final static long regionIdBits = 3L;
    // 机器标识位数
    private final static long workerIdBits = 10L;
    // 序列号识位数
    private final static long sequenceBits = 10L;
 
    // 区域标志ID最大值
    private final static long maxRegionId = -1L ^ (-1L << regionIdBits);
    // 机器ID最大值
    private final static long maxWorkerId = -1L ^ (-1L << workerIdBits);
    // 序列号ID最大值
    private final static long sequenceMask = -1L ^ (-1L << sequenceBits);
 
    // 机器ID偏左移10位
    private final static long workerIdShift = sequenceBits;
    // 业务ID偏左移20位
    private final static long regionIdShift = sequenceBits + workerIdBits;
    // 时间毫秒左移23位
    private final static long timestampLeftShift = sequenceBits + workerIdBits + regionIdBits;
 
    private static long lastTimestamp = -1L;
 
    private long sequence = 0L;
    private final long workerId;
    private final long regionId;
 
    public CustomUUID(long workerId, long regionId) {
 
        // 如果超出范围就抛出异常
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException("worker Id can't be greater than %d or less than 0");
        }
        if (regionId > maxRegionId || regionId < 0) {
            throw new IllegalArgumentException("datacenter Id can't be greater than %d or less than 0");
        }
 
        this.workerId = workerId;
        this.regionId = regionId;
    }
 
    public CustomUUID(long workerId) {
        // 如果超出范围就抛出异常
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException("worker Id can't be greater than %d or less than 0");
        }
        this.workerId = workerId;
        this.regionId = 0;
    }
 
    public long generate() {
        return this.nextId(false, 0);
    }
 
    /**
     * 实际产生代码的
     *
     * @param isPadding
     * @param busId
     * @return
     */
    private synchronized long nextId(boolean isPadding, long busId) {
 
        long timestamp = timeGen();
        long paddingnum = regionId;
 
        if (isPadding) {
            paddingnum = busId;
        }
 
        if (timestamp < lastTimestamp) {
            try {
                throw new Exception("Clock moved backwards.  Refusing to generate id for " + (lastTimestamp - timestamp) + " milliseconds");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
 
        //如果上次生成时间和当前时间相同,在同一毫秒内
        if (lastTimestamp == timestamp) {
            //sequence自增,因为sequence只有10bit,所以和sequenceMask相与一下,去掉高位
            sequence = (sequence + 1) & sequenceMask;
            //判断是否溢出,也就是每毫秒内超过1024,当为1024时,与sequenceMask相与,sequence就等于0
            if (sequence == 0) {
                //自旋等待到下一毫秒
                timestamp = tailNextMillis(lastTimestamp);
            }
        } else {
            // 如果和上次生成时间不同,重置sequence,就是下一毫秒开始,sequence计数重新从0开始累加,
            // 为了保证尾数随机性更大一些,最后一位设置一个随机数
            sequence = new SecureRandom().nextInt(10);
        }
 
        lastTimestamp = timestamp;
 
        return ((timestamp - twepoch) << timestampLeftShift) | (paddingnum << regionIdShift) | (workerId << workerIdShift) | sequence;
    }
 
    // 防止产生的时间比之前的时间还要小(由于NTP回拨等问题),保持增量的趋势.
    private long tailNextMillis(final long lastTimestamp) {
        long timestamp = this.timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = this.timeGen();
        }
        return timestamp;
    }
 
    // 获取当前的时间戳
    protected long timeGen() {
        return System.currentTimeMillis();
    }
}

Using this method requires attention to.

  • In order to keep the growing trend, avoid the time of some servers early and some servers late, you need to control the time of all servers, and avoid the NTP time server to call back the time of the server.
  • The sequence number is always returned to 0 when spanning milliseconds, which will make more IDs with sequence number 0, resulting in uneven generated IDs after modulo, so the sequence number is not returned to 0 every time, but to a random number from 0 to 9.
  • It is better to use this CustomUUID class in a system that can keep the single instance mode running.

Solving distributed deployments

There are some variants of Snowflake, and individual applications have made some changes to Snowflake to suit their actual scenarios.

Boundary flake

Changes:

  • ID length extended to 128 bits
  • Up to 64 bits timestamp
  • Then a 48 bits Worker number (as long as the Mac address)
  • Finally, a 16 bits Seq Number
  • Since it uses 48 bits as the Worker ID, the same length as the Mac address, there is no need to communicate with Zookeeper to get the Worker ID at startup.
  • It is based on Erlang.

The purpose of this is to achieve a smaller conflict probability with more bits, so that more workers can work at the same time. It can also allocate more IDs per millisecond.

Simpleflake

The idea of Simpleflake is to eliminate the worker number, keep the 41 bits timestamp, and extend the sequence number to 22 bits.

Features of Simpleflake.

  • sequence number is generated completely at random (this also leads to duplicate IDs)
  • There is no worker number, so there is no need to communicate with Zookeeper, which makes it completely decentralized.
  • Timestamp remains the same as Snowflake, so you can seamlessly upgrade to Snowflake in the future

The problem with Simpleflake is that the sequence number is generated completely randomly, leading to the possibility of duplicate generated IDs. The probability of duplicate generated IDs grows with the number of IDs generated per second.

So the limitation of Simpleflake is that it cannot generate too many IDs per second (preferably less than 100 times/sec, if the scenario is greater than 100 times/sec, Simpleflake is not applicable and it is recommended to switch back to Snowflake).

MongoDB ObjectID

MongoDB official documentation ObjectID can be regarded as similar to snowflake method, through “time + machine code + pid + inc” a total of 12 bytes, through the way 4 + 3 + 2 + 3 finally identified into a 24-length hexadecimal characters. Compared to snowflake length and readability is less.

Baidu uid-generator

uid-generator uses snowflake, but differs in the production of the machine id, also called workId. uid-generator’s workId is automatically generated by uid-generator and takes into account that the application is deployed on docker, in uid-generator generator, the default policy is: the workId is assigned by the database when the application is started. To put it simply, the application will insert a piece of data into the database table (uid-generator needs to add a new WORKER_NODE table) at startup, and the unique id corresponding to the data returned after successful insertion is the workId of the machine, and the data consists of host and port. For the workId in the uid-generator, it occupies 22 bits, the time occupies 28 bits, and the serialization occupies 13 bits. It should be noted that, unlike the original snowflake, the unit of time is seconds, not milliseconds, and the workId is different, as the same application consumes one workId per restart. workId every time the same application restarts.

Meituan Leaf

Leaf by Meituan is also a distributed ID generation framework. It is very comprehensive, i.e., it supports number segment mode as well as snowflake mode.

Number mode can be understood as a bulk acquisition, such as the DistributIdService from the database to get IDs, if you can get multiple IDs in bulk and cached locally, then it will greatly provide the efficiency of the business application to get IDs. For example, each time the DistributionIdService obtains IDs from the database, it obtains a number range, such as (1,1000], which represents 1000 IDs, and when the business application requests the DistributionIdService to provide IDs, the DistributionIdService only needs to increment itself locally from 1 and return Instead of requesting the database every time, the business application will only need to go to the database to get the next ID when the local self-increment reaches 1000, i.e., when the current ID has been used up.

The difference between Leaf’s snowflake pattern and the original snowflake algorithm is mainly in the generation of workId, which is based on ZooKeeper’s sequential Id.

Flickr’s database is self-increasing

Flickr is using something called Ticket Servers. It is implemented using pure MySQL.

1
2
3
4
5
6
CREATE TABLE `Tickets64` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `stub` char(1) NOT NULL default '',
  PRIMARY KEY  (`id`),
  UNIQUE KEY `stub` (`stub`)
) ENGINE=MyISAM

Insert a record first, then use replace to get the id

1
2
REPLACE INTO Tickets64 (stub) VALUES ('a');
SELECT LAST_INSERT_ID();

Advantages.

  • Very simple, using the existing database system functions to achieve, small cost, with DBA professional maintenance.
  • ID number is monotonically self-increasing, which can realize some business with special requirements for ID.

Disadvantages.

  • Strong dependency on DB, when DB is abnormal the whole system is unavailable, which is a fatal problem. Configuring master-slave replication can increase the availability as much as possible, but data consistency is difficult to guarantee in special cases. Inconsistency in master-slave switchover may lead to duplicate issuance.
  • The ID issuance performance bottleneck is limited to the read and write performance of a single MySQL.

For MySQL performance problem, the following solution can be used: In a distributed system we can deploy several machines, and each machine sets different initial values with equal step size and number of machines. For example, there are two machines. The initial value of TicketServer1 is 1 (1, 3, 5, 7, 9, 11…) and the initial value of TicketServer2 is 2 (2, 4, 6, 8, 10…).

Instagram Stored Procedures

The Instagram ID generation is simply described as 41b ts + 13b shard id + 10b increment seq.

  • 41 bits: Timestamp (milliseconds)
  • 13 bits: code number of each logic shard (maximum 8 * 1024 logic shards supported)
  • 10 bits: sequence number, each Shard can generate up to 1024 IDs per millisecond

The implementation is as follows.

  • first divide each Table into logical shards, the number of logical shards can be very large, for example, 2000 logical shards
  • Then make a rule that specifies which database instance each logical shard is stored in; there need not be many database instances, for example, for a system with 2 PostgreSQL instances (instagram using PostgreSQL) you can use the rule that the odd number of logical shards are stored in the first database instance and the even number of logical shards are stored in the second database instance
  • Specify one field per Table as the slice field (e.g. for user tables you can specify uid as the slice field)
  • When inserting a new data, first decide which logical shard the data is assigned to based on the value of the shard field.
  • Then, based on the correspondence between the logic shard and the PostgreSQL instance, determine which PostgreSQL instance the data should be stored on

The 41 bits Timestamp is similar to Snowflake when generating unique IDs, and this cry mainly describes how to generate the 13 bits logic Shard code and the 10 bits sequence number.

logic Shard code.

  • Suppose a new user record is inserted, and when inserting, the uid is used to determine which logic shard the record should be inserted into.
  • Assume that the current record to be inserted will be inserted into logic shard number 1341 (assuming that there are 2000 logic shards in this Table)
  • The 13 bits of the newly generated ID will be filled with the number 1341

The sequence number is generated using the auto-increment sequence on each PostgreSQL Table.

  • If there are already 5000 rows on the current table, then the next auto-increment sequence for this table is 5001 (you can get it directly by calling the method provided by PL/PGSQL)
  • Then this 5001 is modulo 1024 to get a sequence number of 10 bits

The advantage of the Instagram solution is that

  • Using the logic Shard number to replace the Worker number used by Snowflake, there is no need to go to the central node to get the Worker number, so it is completely decentralized.
  • You can directly know which logic shard the record is stored on by its ID
  • When you do data migration in the future, you can also do data migration by logic shard, which will not affect the future data migration.

For more details, see: Sharding & IDs at Instagram

Advantages.

  • Low development costs

Disadvantages.

  • PostgreSQL-based stored procedures, poor generality