Caching is one of the most effective ways to speed up CI. The main concern is keeping the cache hit rate high, which comes down to deciding when the cache should be refreshed. For example, if you cache the entire target directory, when should that cache be updated? The best moment is when the dependencies change, which shows up in the lock file: Cargo.lock for Rust and package-lock.json for Node.

Let’s see how to achieve this with the actions/cache action, which takes three main parameters.

  • key: the cache ID; the whole cache space can be seen as a key-value store, and this is the key of one entry
  • path: the path (or paths) to cache
  • restore-keys: an ordered list of fallback key prefixes that select a cache when key does not hit

The following is an example of caching a Rust project.

jobs:
  test:
    timeout-minutes: 20
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Cache Crates
        uses: actions/cache@v3
        with:
          path: |
            ./target
            ~/.cargo
          key: debug-${{ runner.os }}-${{ hashFiles('rust-toolchain.toml') }}-${{ hashFiles('Cargo.lock') }}
          restore-keys: |
            debug-${{ runner.os }}-${{ hashFiles('rust-toolchain.toml') }}-
            debug-${{ runner.os }}-
      - run: cargo test

First, look at the definition of key, which has four parts:

  1. a fixed value, debug, to distinguish it from release builds
  2. a variable, the operating system
  3. the hash of the toolchain file
  4. the hash of the Cargo.lock file

When the key matches an existing cache exactly (a cache hit), the cache does not need to be saved again at the end of the job. Two things therefore need to be kept in mind when designing the key:

  1. the key should change whenever the cached contents would change; the four parts above together identify one complete, valid cache
  2. when several fields can signal a change, the ones that change most often go last; the four parts above follow this order
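
The exact-hit case can be observed directly in a workflow. The fragment below is a minimal sketch that relies on the cache-hit output exposed by actions/cache; the step id cache-crates is a name chosen here only for illustration, and restore-keys is left out for brevity. It would sit under steps: in a job like the one above.

```yaml
      - name: Cache Crates
        id: cache-crates              # hypothetical id, used to read the step's outputs
        uses: actions/cache@v3
        with:
          path: |
            ./target
            ~/.cargo
          key: debug-${{ runner.os }}-${{ hashFiles('rust-toolchain.toml') }}-${{ hashFiles('Cargo.lock') }}
      # cache-hit is 'true' only when key matched an existing cache exactly
      - name: Report exact hit
        if: steps.cache-crates.outputs.cache-hit == 'true'
        run: echo "Exact key matched, the cache will not be saved again"
```

A restore that falls back to a restore-keys entry does not count as an exact hit, so the reporting step is skipped in that case.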

The second design point makes sense once restore-keys is taken into account. If the key does not match when the workflow runs, the contents to be cached have changed, either in the lock file or in the toolchain. In a typical cache design, a cache whose key does not match cannot be used.

However, the old cache can still be useful in this case. For example, if the project has 10 dependencies and only one of them has been updated, the cache is still valid for the other 9. Selecting such a cache is the job of restore-keys.

restore-keys specifies a list of candidate key prefixes that are used to pick a fallback cache when key has no exact hit.

Because the candidates are tried from top to bottom, they are generally listed from longest to shortest. For the example above, the cache is selected as follows.

  • If only Cargo.lock has changed, the cache matched by the first restore-keys entry is used, since it is the closest match and saves the most work
  • If the toolchain file has changed as well, the second entry is used, for the same reason
  • If you are compiling a release build, none of the debug caches are used, regardless of changes to Cargo.lock and the toolchain file; keeping the two apart avoids mixing release artifacts into the debug cache, which would make it oversized
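
To make the last point concrete, here is a sketch of what the matching steps of a release job might look like. It is an assumption about how such a job would be written, not something taken from the project above: only the fixed prefix changes from debug to release, so no key or restore-keys entry can ever match a debug cache.

```yaml
      - name: Cache Crates (release)
        uses: actions/cache@v3
        with:
          path: |
            ./target
            ~/.cargo
          # Same layout as the debug key, with a different fixed prefix
          key: release-${{ runner.os }}-${{ hashFiles('rust-toolchain.toml') }}-${{ hashFiles('Cargo.lock') }}
          restore-keys: |
            release-${{ runner.os }}-${{ hashFiles('rust-toolchain.toml') }}-
            release-${{ runner.os }}-
      - run: cargo test --release
```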

Whenever the key is not hit exactly, a new cache is saved under that key after the job finishes, so the next run gets a direct hit. As you can see, this design of restore-keys ensures that the most useful cache is always the “hot” one.
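
The same key layout carries over to other ecosystems. As a rough sketch for a Node project, the steps below key the cache on the hash of package-lock.json. The choice of ~/.npm as the cached path and the npm- prefix are assumptions made for illustration; adjust them to the package manager actually in use.

```yaml
      - name: Cache npm downloads
        uses: actions/cache@v3
        with:
          path: ~/.npm                # assumed location of npm's download cache
          key: npm-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
          restore-keys: |
            npm-${{ runner.os }}-
      - run: npm ci
```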

Caveats

For security and cost reasons, GitHub places the following restrictions on caching.

  • If a cache is created on the main branch, all branches derived from main can use it; however, caches created on branches B1 and B2, both derived from main, cannot be shared between them
  • Caches that have not been accessed for 7 days will be deleted automatically.
  • The cache space is limited to 10 GB; beyond that, entries are evicted in least-recently-accessed (LRU) order
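
One practical consequence of the branch rule is that the caching workflow should also run on the main branch, so that derived branches always have a base cache to fall back on. A minimal sketch of the trigger section, assuming the default branch is named main:

```yaml
on:
  push:
    branches: [main]    # caches saved here can be restored by derived branches
  pull_request:         # pull request runs can then fall back to the main cache
```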