Recently, the project needs to implement version control of single data, so there is a table (foo) dedicated to storing key data and another table (bar) dedicated to storing Data data, how to get the latest version of all the key data in the bar table? Let’s take a look at the schema example.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
-- foo table
DROP TABLE IF EXISTS "foo";
CREATE TABLE `foo` (
  `id` INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  `name` TEXT NULL,
  `key` TEXT NULL,
  `created_at` DATETIME NULL,
  `updated_at` DATETIME NULL
);

-- bar table
CREATE TABLE `bar` (
  `id` INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, 
  `foo_id` INTEGER NULL, 
  `is_deleted` INTEGER NULL, 
  `timestamp` INTEGER NULL, 
  `created_at` DATETIME NULL, 
  `data` TEXT NULL, 
  `memo` TEXT NULL
)

The name + key in the foo table is a unique value, so it will be a one-to-many state, and a key will correspond to multiple sets of data in the bar. The timestamp in the bar is used to handle version control, and each modification will create a new set of timestamp data. The following section describes how to get the first few data for each key.

Using the UNION method

First of all, if you don’t have much data, you can use UNION to solve the problem as follows:

1
2
3
4
5
6
7
8
9
select * from bar where foo_id=1 order by timestamp desc limit 3
UNION
select * from bar where foo_id=2 order by timestamp desc limit 3
UNION
select * from bar where foo_id=3 order by timestamp desc limit 3
.
.
.
select * from bar where foo_id=n order by timestamp desc limit 3

This approach is not expected to solve the version control problem, assuming that the same foo_id data in each timestamp version of the number of strokes is not the same, so it will be sprayed wrong.

foo_id timestamp data
1 100 test_01
1 100 test_02
1 100 test_03
1 101 test_01
1 101 test_02
1 101 test_03
1 101 test_04

If we only use limit method, we can’t get the data with timestamp 101 (because there are four strokes and we can only get 3 strokes by limit). So this solution is not suitable at all.

Using the rank() method

The rank() method can be supported by MySQL, SQLite or Postgres, and since my development mode is to use SQLite natively and Postgres for the production environment, I will take care of whether all three can work together while writing SQL (XD for open source projects), so let’s experiment with rank to The timeestamp is marked by rank.

1
2
3
SELECT bar.*, 
  rank() OVER (PARTITION BY foo_id ORDER BY "timestamp" DESC) as rank
  FROM bar

You will get the following data.

foo_id timestamp data rank
1 101 test_01 1
1 101 test_02 1
1 101 test_03 1
1 101 test_04 1
1 100 test_01 2
1 100 test_02 2
1 100 test_03 2

This time we need to get the data when foo_id is 1, so we can solve the limit problem by using rank = 1. Next, we need to handle how to get the latest version (timestamp) of each foo_id. Assume the following data:

foo_id timestamp data
1 100 1_test_01
1 101 1_test_01
1 101 1_test_02
2 100 2_test_01
2 101 2_test_02
2 102 2_test_03
3 100 3_test_01
3 103 3_test_02
3 104 3_test_03
3 105 3_test_04

We need to get the latest version

  • version 101 if foo_id is 1
  • version 102 if foo_id is 2
  • version 105 when foo_id is 3
1
2
3
4
5
select bar.* from 
(SELECT bar.*, 
  rank() OVER (PARTITION BY foo_id ORDER BY "timestamp" DESC) as rank
  FROM bar) bar
  where rank = 1

The data are as follows.

foo_id timestamp data rank
1 101 1_test_01 1
1 101 1_test_02 1
2 102 2_test_03 1
3 105 3_test_04 1

By using rank = 1, we can get the latest version of each foo. Next, suppose we want to get the version with timestamp 102, then we need to find the version of each foo that is closest to 102.

  • version 101 when foo_id is 1
  • version 102 when foo_id is 2
  • version 100 when foo_id is 3 (100 is closest to 102)
1
2
3
4
5
select bar.* from 
(SELECT bar.*, 
  rank() OVER (PARTITION BY foo_id ORDER BY "timestamp" DESC) as rank
  FROM bar where "timestamp" <= 102) bar
  where rank = 1

The data are as follows.

foo_id timestamp data rank
1 101 1_test_01 1
1 101 1_test_02 1
2 102 2_test_03 1
3 100 3_test_01 1

This is how to solve the data version control problem by rank().