Usage of Guava Cache

Preface

“Caching” has always been one of the most talked-about topics among programmers, with technologies such as Redis, Ehcache, and Guava Cache. Redis distributed caching is undeniably the most popular caching technology today, both in interviews and in frequency of use, but in my own project experience local caching is also very common.

There are many articles analyzing Redis caching, covering topics such as cache avalanche and the Redis expiration mechanism. In my experience, however, very few articles analyze local caching.

In a recent project, a new colleague used Guava Cache to cache the responses of an RPC interface, and when I reviewed his code, I happened to find a not-so-sensible way of writing it, hence this article.

This article introduces some common operations of Guava Cache: basic API usage, expiration policy, and refresh policy. As is my habit, I will also include some lessons from actual development. Note that I have not read the source code of Guava Cache, so I will not go into much depth; this article only covers usage experience and best practices.

First, a brief introduction: Guava Cache is the in-memory caching module of Guava, the core Java library published by Google. It provides the following capabilities.

Encapsulates the flow of cache-data source interaction, making development more focused on business operations
Provides thread-safe access operations (analogous to ConcurrentHashMap)
Provides common cache expiration policies and cache refresh policies
Provides cache hit rate monitoring

Basic Usage

This section introduces the basic usage of Guava Cache with an example: caching the result of an upper-case conversion.

private String fetchValueFromServer(String key) {
    return key.toUpperCase();
}

@Test
public void whenCacheMiss_thenFetchValueFromServer() throws ExecutionException {
    LoadingCache<String, String> cache =
        CacheBuilder.newBuilder().build(new CacheLoader<String, String>() {
            @Override
            public String load(String key) {
                return fetchValueFromServer(key);
            }
        });

    assertEquals(0, cache.size());
    assertEquals("HELLO", cache.getUnchecked("hello"));
    assertEquals("HELLO", cache.get("hello"));
    assertEquals(1, cache.size());
}

The benefit of Guava Cache is already visible: it decouples cache access from business logic. The load method of the CacheLoader is the entry point for loading raw data from the data source. When get or getUnchecked is called on the LoadingCache, Guava Cache behaves as follows.

On a cache miss, load is called synchronously; the result is stored in the cache and returned.
On a cache hit, the cached value is returned directly.
When several threads miss on the same key at the same time, only one thread (A) runs load; the other threads (B) block until the value has been loaded and then receive it.
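
The third point can be checked with a small experiment. The following is only a sketch: the class name, the thread count, and the 200 ms sleep that stands in for a slow data source are assumptions made for illustration. It counts how often load runs when several threads request the same missing key at once.

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class SingleLoadExample {

    /** Lets several readers request the same missing key at once and returns how often load() ran. */
    static int countLoads(int threads) {
        AtomicInteger loads = new AtomicInteger();
        LoadingCache<String, String> cache =
            CacheBuilder.newBuilder().build(new CacheLoader<String, String>() {
                @Override
                public String load(String key) throws InterruptedException {
                    loads.incrementAndGet();
                    Thread.sleep(200); // simulate a slow data source
                    return key.toUpperCase();
                }
            });

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await();      // line all readers up behind the same gate
                    cache.get("hello"); // concurrent misses on one key share a single load
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return loads.get();
    }

    public static void main(String[] args) {
        System.out.println("load() calls: " + countLoads(8));
    }
}
```

The counter stays at one no matter how many readers are started, which is the behavior a hand-written check-then-put on a plain map does not give you.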

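Note that Guava provides two loading methods, get and getUnchecked. They behave the same on a hit or a successful load and differ only in how a load failure is reported: get throws the checked ExecutionException when load throws a checked exception, while getUnchecked wraps any failure in an unchecked UncheckedExecutionException, so it is best reserved for loaders that declare no checked exceptions. Whichever you use, remember that the data source is usually an RPC interface or a database, so consider access timeouts and failures and handle the exceptions properly.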
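
To make the exception handling concrete, here is a minimal sketch. The loader and its key prefixes ("io:" and "bug:") are invented for illustration and simulate a timeout and a programming error. It shows that get reports a checked failure as ExecutionException and an unchecked one as UncheckedExecutionException, and that a failed load leaves nothing in the cache.

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.util.concurrent.UncheckedExecutionException;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

final class LoadFailureExample {

    // Hypothetical loader: the key prefixes only exist to simulate failures
    static LoadingCache<String, String> newCache() {
        return CacheBuilder.newBuilder().build(new CacheLoader<String, String>() {
            @Override
            public String load(String key) throws IOException {
                if (key.startsWith("io:")) {
                    throw new IOException("simulated timeout");       // checked
                }
                if (key.startsWith("bug:")) {
                    throw new IllegalStateException("simulated bug"); // unchecked
                }
                return key.toUpperCase();
            }
        });
    }

    /** Describes the outcome of a lookup instead of letting the failure escape. */
    static String describe(LoadingCache<String, String> cache, String key) {
        try {
            return "value=" + cache.get(key);
        } catch (ExecutionException e) {
            // load() threw a checked exception; getCause() is the original one
            return "checked:" + e.getCause().getClass().getSimpleName();
        } catch (UncheckedExecutionException e) {
            // load() threw a RuntimeException
            return "unchecked:" + e.getCause().getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        LoadingCache<String, String> cache = newCache();
        System.out.println(describe(cache, "hello"));
        System.out.println(describe(cache, "io:hello"));
        System.out.println(describe(cache, "bug:hello"));
        System.out.println("cached entries: " + cache.size());
    }
}
```

In real code the catch blocks would typically log the cause and return a fallback value or rethrow a business exception; returning a description string is just a way to keep the sketch observable.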

Preload Cache

Common scenarios for preloading the cache:

A “product promotion” scenario: warm up the cache by adding hot products to it in advance.
After a system restart: preload the cache so that real requests do not go straight through to the data source.

Guava Cache provides put and putAll methods.

@Test
public void whenPreloadCache_thenPut() {
    LoadingCache<String, String> cache =
        CacheBuilder.newBuilder().build(new CacheLoader<String, String>() {
            @Override
            public String load(String key) {
                return fetchValueFromServer(key);
            }
        });

    String key = "kirito";
    cache.put(key, fetchValueFromServer(key));

    assertEquals(1, cache.size());
}

These methods work just like put and putAll on a HashMap.

Here is the pitfall my new colleague happened to step into, and the reason I wrote this article in the first place. Use put only for preloading the cache; in every other scenario, let load populate the cache. See the following counterexample.
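
The test above only covers put. As a sketch of the putAll variant, the following warms a cache from a list of hot keys; the class name and the list of keys are assumptions, and fetchValueFromServer is the same stand-in used throughout this article.

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WarmUpExample {

    // Stand-in for the real data source
    static String fetchValueFromServer(String key) {
        return key.toUpperCase();
    }

    /** Builds a cache and preloads it with the given hot keys in one putAll call. */
    static LoadingCache<String, String> newWarmCache(List<String> hotKeys) {
        LoadingCache<String, String> cache =
            CacheBuilder.newBuilder().build(new CacheLoader<String, String>() {
                @Override
                public String load(String key) {
                    return fetchValueFromServer(key);
                }
            });

        Map<String, String> snapshot = new HashMap<>();
        for (String key : hotKeys) {
            snapshot.put(key, fetchValueFromServer(key));
        }
        cache.putAll(snapshot); // preload; later misses still go through load()
        return cache;
    }

    public static void main(String[] args) {
        LoadingCache<String, String> cache = newWarmCache(Arrays.asList("kirito", "asuna"));
        System.out.println(cache.asMap());
    }
}
```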

// Note: this is a counterexample, do not copy it
@Test
public void wrong_usage_whenCacheMiss_thenPut() throws ExecutionException {
    LoadingCache<String, String> cache =
        CacheBuilder.newBuilder().build(new CacheLoader<String, String>() {
            @Override
            public String load(String key) {
                return "";
            }
        });

    String key = "kirito";
    String cacheValue = cache.get(key);
    if ("".equals(cacheValue)) {
        cacheValue = fetchValueFromServer(key);
        cache.put(key, cacheValue);
    }
    cache.put(key, cacheValue);

    assertEquals(1, cache.size());
}

In this version, load returns an empty placeholder value, and the cache is then maintained by hand with get + put. That is how you would operate a HashMap, and it is not recommended for a Cache. As described earlier, Guava Cache makes get and load thread-safe: when multiple threads miss on the same key, the first request loads the value while the others wait for it. The manual check-then-put sequence gives up that guarantee, so under concurrency several threads can see the placeholder and all hit the data source at once, which in extreme cases leads to cache penetration and thread-safety problems.

Be sure to use the put method only for preloading the cache.
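
One reason people end up with the placeholder pattern is that the data source may have no value for a key, and load is not allowed to return null. A possible alternative, sketched here under the assumption that the data source returns null for unknown keys (the "missing" prefix is invented for the demo), is to cache an Optional so that the "not found" result is also loaded and cached through load.

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

final class NegativeCachingExample {

    final AtomicInteger backendCalls = new AtomicInteger();

    // Hypothetical data source: returns null when the key is unknown
    String fetchValueFromServer(String key) {
        backendCalls.incrementAndGet();
        return key.startsWith("missing") ? null : key.toUpperCase();
    }

    final LoadingCache<String, Optional<String>> cache =
        CacheBuilder.newBuilder().build(new CacheLoader<String, Optional<String>>() {
            @Override
            public Optional<String> load(String key) {
                // Wrap the result so that "no value" is a cacheable, non-null answer
                return Optional.ofNullable(fetchValueFromServer(key));
            }
        });

    String lookup(String key, String fallback) {
        return cache.getUnchecked(key).orElse(fallback);
    }

    public static void main(String[] args) {
        NegativeCachingExample example = new NegativeCachingExample();
        System.out.println(example.lookup("kirito", "n/a"));
        System.out.println(example.lookup("missing-1", "n/a"));
        System.out.println(example.lookup("missing-1", "n/a")); // served from the cache
        System.out.println("backend calls: " + example.backendCalls.get());
    }
}
```

Whether caching "not found" answers is acceptable depends on the business; if unknown keys can appear later, combine this with an expiration time.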

Cache expiration

The first difference between Cache and ConcurrentHashMap can be seen in the “cache expiration” scenario. This section introduces some common cache expiration behaviors and policies in Guava.

Cache a fixed number of values

@Test
public void whenReachMaxSize_thenEviction() throws ExecutionException {
    LoadingCache<String, String> cache =
        CacheBuilder.newBuilder().maximumSize(3).build(new CacheLoader<String, String>() {
            @Override
            public String load(String key) {
                return fetchValueFromServer(key);
            }
        });

    cache.get("one");
    cache.get("two");
    cache.get("three");
    cache.get("four");
    assertEquals(3, cache.size());
    assertNull(cache.getIfPresent("one"));
    assertEquals("FOUR", cache.getIfPresent("four"));
}

One of the biggest problems with using ConcurrentHashMap as a cache is that there is no easy and effective way to stop it from growing indefinitely. With Guava Cache, you can set maximumSize when building the LoadingCache to ensure that cached content does not cause an OOM in your system.

It is worth noting that this test case uses a third way to read the cache besides get and getUnchecked. As its name suggests, getIfPresent does not trigger the load method when the key is not in the cache; it simply returns null.

LRU expiration policy

With the same example as above, setting the capacity to 3 only tells us that the LoadingCache can hold 3 values; it does not tell us which old value is evicted to make room when the 4th value is stored. Guava Cache evicts in (approximately) LRU order by default. LRU stands for Least Recently Used, an algorithm you may not have implemented but will certainly have heard of. In Guava Cache, "used" means any access to the entry, such as put or get. Consider the following example.

@Test
public void whenAccessed_thenLeastRecentlyUsedEvicted() throws ExecutionException {
    LoadingCache<String, String> cache =
        CacheBuilder.newBuilder().maximumSize(3).build(new CacheLoader<String, String>() {
            @Override
            public String load(String key) {
                return fetchValueFromServer(key);
            }
        });

    cache.get("one");
    cache.get("two");
    cache.get("three");
    // access one
    cache.get("one");
    cache.get("four");
    assertEquals(3, cache.size());
    assertNull(cache.getIfPresent("two"));
    assertEquals("ONE", cache.getIfPresent("one"));
}

Note the difference from the previous example: because one is read again before four is loaded, two becomes the least recently used value, so when the fourth value, four, is stored, the evicted entry is two rather than one.
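
If you have never implemented LRU yourself, the idea fits in a few lines of plain JDK code. The sketch below is not how Guava implements eviction; it only illustrates the policy with a LinkedHashMap in access order, and the class name is made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** A tiny LRU map built on LinkedHashMap; an illustration of the policy, not Guava's implementation. */
final class LruSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruSketch(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the "most recent" end
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // drop the least recently used entry once over capacity
    }

    /** Replays the access sequence of the test above and returns the resulting map. */
    static LruSketch<String, String> replay() {
        LruSketch<String, String> lru = new LruSketch<>(3);
        lru.put("one", "ONE");
        lru.put("two", "TWO");
        lru.put("three", "THREE");
        lru.get("one");          // "two" is now the least recently used key
        lru.put("four", "FOUR"); // evicts "two"
        return lru;
    }

    public static void main(String[] args) {
        System.out.println(replay().keySet()); // prints [three, one, four]
    }
}
```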

Cache fixed time

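Setting an expiration time for cached values is another important feature that distinguishes a Cache from a HashMap. Guava Cache provides two schemes for the values in a LoadingCache: expireAfterAccess, which expires an entry a fixed time after it was last read or written, and expireAfterWrite, which expires it a fixed time after it was created or last replaced.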

@Test
public void whenEntryIdle_thenEviction()
    throws InterruptedException, ExecutionException {

    LoadingCache<String, String> cache =
        CacheBuilder.newBuilder().expireAfterAccess(1, TimeUnit.SECONDS).build(new CacheLoader<String, String>() {
            @Override
            public String load(String key) {
                return fetchValueFromServer(key);
            }
        });

    cache.get("kirito");
    assertEquals(1, cache.size());

    cache.get("kirito");
    Thread.sleep(2000);

    assertNull(cache.getIfPresent("kirito"));
}
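
The test above relies on Thread.sleep, which makes it slow. As a sketch of an alternative, CacheBuilder accepts a Ticker as its time source, so a test can move time forward by hand. The FakeTicker class and the 600 ms read interval below are assumptions made for this example; the sketch also contrasts the two expiration schemes for an entry that is read frequently.

```java
import com.google.common.base.Ticker;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class ExpirationExample {

    /** Hand-written fake clock for tests (a helper for this sketch, not a Guava class). */
    static final class FakeTicker extends Ticker {
        private final AtomicLong nanos = new AtomicLong();

        void advance(long duration, TimeUnit unit) {
            nanos.addAndGet(unit.toNanos(duration));
        }

        @Override
        public long read() {
            return nanos.get();
        }
    }

    /** Reads one entry every 600 ms of fake time and reports whether it is still cached afterwards. */
    static boolean survivesFrequentReads(boolean useExpireAfterAccess) {
        FakeTicker ticker = new FakeTicker();
        CacheBuilder<Object, Object> builder = CacheBuilder.newBuilder().ticker(ticker);
        if (useExpireAfterAccess) {
            builder.expireAfterAccess(1, TimeUnit.SECONDS);
        } else {
            builder.expireAfterWrite(1, TimeUnit.SECONDS);
        }
        Cache<String, String> cache = builder.build();

        cache.put("kirito", "KIRITO");
        for (int i = 0; i < 3; i++) {
            ticker.advance(600, TimeUnit.MILLISECONDS);
            cache.getIfPresent("kirito"); // a read resets the access timer, not the write timer
        }
        return cache.getIfPresent("kirito") != null;
    }

    public static void main(String[] args) {
        System.out.println("expireAfterAccess keeps the entry: " + survivesFrequentReads(true));
        System.out.println("expireAfterWrite keeps the entry:  " + survivesFrequentReads(false));
    }
}
```

With expireAfterAccess a hot entry can live indefinitely, so expireAfterWrite is usually the safer choice when the cached data must not become too stale.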

Cache Invalidation

@Test
public void whenInvalidate_thenGetNull() throws ExecutionException {
    LoadingCache<String, String> cache =
        CacheBuilder.newBuilder()
            .build(new CacheLoader<String, String>() {
                @Override
                public String load(String key) {
                    return fetchValueFromServer(key);
                }
            });

    String name = cache.get("kirito");
    assertEquals("KIRITO", name);

    cache.invalidate("kirito");
    assertNull(cache.getIfPresent("kirito"));
}

Use void invalidate(Object key) to remove a single entry and void invalidateAll() to remove all entries.
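
Besides these two, there is also an overload that removes a given set of keys. A small sketch of the three variants follows; the keys and the class name are arbitrary.

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.Arrays;

final class InvalidateExample {

    /** Returns the cache size after filling it and after each kind of invalidation. */
    static long[] sizesAfterEachStep() {
        Cache<String, String> cache = CacheBuilder.newBuilder().build();
        for (String key : Arrays.asList("a", "b", "c", "d")) {
            cache.put(key, key.toUpperCase());
        }
        long filled = cache.size();

        cache.invalidate("a");                        // one key
        long afterOne = cache.size();

        cache.invalidateAll(Arrays.asList("b", "c")); // a batch of keys
        long afterBatch = cache.size();

        cache.invalidateAll();                        // everything
        long afterAll = cache.size();

        return new long[] {filled, afterOne, afterBatch, afterAll};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sizesAfterEachStep())); // prints [4, 3, 1, 0]
    }
}
```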

Cache Refresh

Cache refresh is commonly used to overwrite old cache values with new values from the data source. Guava Cache provides two refresh mechanisms: manual refresh and automatic refresh.

Manual refresh

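cache.refresh("kirito");

The refresh method triggers the load logic (through CacheLoader.reload, which calls load by default) to fetch a new value from the data source.

Note that a refresh does not block get calls from other threads: the old cached value stays accessible until the load completes, as the example below shows. With the default CacheLoader, the thread that calls refresh runs the load itself, which is why the example calls refresh from a separate thread.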

@Test
public void whenCacheRefresh_thenLoad()
    throws InterruptedException, ExecutionException {

    LoadingCache<String, String> cache =
        CacheBuilder.newBuilder().expireAfterWrite(1, TimeUnit.SECONDS).build(new CacheLoader<String, String>() {
            @Override
            public String load(String key) throws InterruptedException {
                Thread.sleep(2000);
                return key + ThreadLocalRandom.current().nextInt(100);
            }
        });

    String oldValue = cache.get("kirito");

    new Thread(() -> {
        cache.refresh("kirito");
    }).start();

    // give the refresh thread time to start loading
    Thread.sleep(500);

    String val1 = cache.get("kirito");

    assertEquals(oldValue, val1);

    // wait until the refresh has finished
    Thread.sleep(2000);

    String val2 = cache.get("kirito");
    assertNotEquals(oldValue, val2);

}

In any case, a cached value may be inconsistent with the data source, so the business logic needs to tolerate reading an old value.

Automatic refresh

@Test
public void whenTTL_thenRefresh() throws ExecutionException, InterruptedException {
    LoadingCache<String, String> cache =
        CacheBuilder.newBuilder().refreshAfterWrite(1, TimeUnit.SECONDS).build(new CacheLoader<String, String>() {
            @Override
            public String load(String key) {
                return key + ThreadLocalRandom.current().nextInt(100);
            }
        });

    String first = cache.get("kirito");
    Thread.sleep(1000);
    String second = cache.get("kirito");

    assertNotEquals(first, second);
}

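As with the manual refresh in the previous section, refreshAfterWrite does not block other threads that read the key while a reload is in progress; they keep getting the old value. Two details are worth knowing. First, no background timer is involved: an entry only becomes eligible for refresh after the configured time, and the reload actually starts on the next get for that key. Second, with the default CacheLoader the reload runs synchronously in the thread that triggered it, which is why second already holds the new value in the test above.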
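
If the thread that triggers the refresh should not wait for the reload either, one option is to wrap the loader with CacheLoader.asyncReloading so that reloads run on an executor. The following is a sketch under several assumptions: the version counter, the latch that keeps the reload artificially slow, and the fake clock exist only to make the behavior observable and deterministic.

```java
import com.google.common.base.Ticker;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

final class AsyncRefreshExample {

    /** Returns {first value, value read while the reload is running, value after the reload}. */
    static String[] demo() {
        AtomicLong fakeNanos = new AtomicLong();
        Ticker ticker = new Ticker() { // fake clock so the demo does not need to sleep
            @Override
            public long read() {
                return fakeNanos.get();
            }
        };
        AtomicInteger version = new AtomicInteger();
        CountDownLatch gate = new CountDownLatch(1);
        ExecutorService refreshPool = Executors.newSingleThreadExecutor();

        CacheLoader<String, String> loader = new CacheLoader<String, String>() {
            @Override
            public String load(String key) throws InterruptedException {
                int v = version.incrementAndGet();
                if (v > 1) {
                    gate.await(); // keep reloads "slow" until the demo opens the gate
                }
                return key + "-v" + v;
            }
        };
        LoadingCache<String, String> cache = CacheBuilder.newBuilder()
            .ticker(ticker)
            .refreshAfterWrite(1, TimeUnit.SECONDS)
            .build(CacheLoader.asyncReloading(loader, refreshPool));

        try {
            String first = cache.get("kirito");               // initial load runs in this thread
            fakeNanos.addAndGet(TimeUnit.SECONDS.toNanos(2)); // entry becomes eligible for refresh
            String during = cache.get("kirito");              // starts the reload on refreshPool
            gate.countDown();                                 // let the reload finish
            refreshPool.shutdown();
            refreshPool.awaitTermination(5, TimeUnit.SECONDS);
            String after = cache.get("kirito");
            return new String[] {first, during, after};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            refreshPool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", demo()));
    }
}
```

In a real service the executor would be a shared, long-lived pool rather than one created and shut down per call.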

Cache hit statistics

Guava Cache does not record hit statistics by default; you need to call recordStats explicitly on the CacheBuilder when building the cache.

@Test
public void whenRecordStats_thenPrint() throws ExecutionException {
    LoadingCache<String, String> cache =
        CacheBuilder.newBuilder().maximumSize(100).recordStats().build(new CacheLoader<String, String>() {
            @Override
            public String load(String key) {
                return fetchValueFromServer(key);
            }
        });

    cache.get("one");
    cache.get("two");
    cache.get("three");
    cache.get("four");

    cache.get("one");
    cache.get("four");

    CacheStats stats = cache.stats();
    System.out.println(stats);
}
---
CacheStats{hitCount=2, missCount=4, loadSuccessCount=4, loadExceptionCount=0, totalLoadTime=1184001, evictionCount=0}
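
Printing the whole CacheStats object is fine for debugging, but for monitoring you usually want individual numbers. As a sketch, the same access sequence can be replayed and the derived metrics read through the CacheStats accessors; the class and method names here are made up.

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.CacheStats;
import com.google.common.cache.LoadingCache;

final class HitRateExample {

    /** Replays the access sequence of the test above and returns a snapshot of the statistics. */
    static CacheStats statsAfterSampleTraffic() {
        LoadingCache<String, String> cache =
            CacheBuilder.newBuilder().maximumSize(100).recordStats()
                .build(new CacheLoader<String, String>() {
                    @Override
                    public String load(String key) {
                        return key.toUpperCase();
                    }
                });

        for (String key : new String[] {"one", "two", "three", "four", "one", "four"}) {
            cache.getUnchecked(key);
        }
        return cache.stats();
    }

    public static void main(String[] args) {
        CacheStats stats = statsAfterSampleTraffic();
        System.out.println("requests  = " + stats.requestCount());
        System.out.println("hit rate  = " + stats.hitRate());
        System.out.println("miss rate = " + stats.missRate());
        System.out.println("avg load time (ns) = " + stats.averageLoadPenalty());
    }
}
```

For the six requests above (four misses, two hits) the hit rate is 2/6, which is the kind of number worth exporting to a metrics system.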

Notification mechanism for cache removal

In some business scenarios we want to monitor cache removals or run a callback for removed entries. The removal notification mechanism (RemovalListener and RemovalNotification) covers this.

@Test
public void whenRemoval_thenNotify() throws ExecutionException {
    LoadingCache<String, String> cache =
        CacheBuilder.newBuilder().maximumSize(3)
            .removalListener(
                cacheItem -> System.out.println(cacheItem + " is removed, cause by " + cacheItem.getCause()))
            .build(new CacheLoader<String, String>() {
                @Override
                public String load(String key) {
                    return fetchValueFromServer(key);
                }
            });

    cache.get("one");
    cache.get("two");
    cache.get("three");
    cache.get("four");
}
---
one=ONE is removed, cause by SIZE

The removalListener can add a callback handler to the LoadingCache, and the RemovalNotification instance contains the cached key-value pairs and the reason for their removal.
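
Size-based eviction is only one possible cause. As a sketch, the listener below records the cause for three different operations: an eviction by size, an explicit invalidate, and a put that replaces an existing value. The keys and the class name are arbitrary.

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.RemovalListener;
import java.util.ArrayList;
import java.util.List;

final class RemovalCauseExample {

    /** Runs a few cache operations and returns the recorded "key:cause" notifications. */
    static List<String> run() {
        List<String> events = new ArrayList<>();
        RemovalListener<String, String> listener =
            notification -> events.add(notification.getKey() + ":" + notification.getCause());

        Cache<String, String> cache = CacheBuilder.newBuilder()
            .maximumSize(2)
            .removalListener(listener)
            .build();

        cache.put("a", "A");
        cache.put("b", "B");
        cache.put("c", "C");  // over capacity: the least recently used key "a" is evicted
        cache.invalidate("b"); // explicit removal
        cache.put("c", "C2");  // overwriting an existing value also notifies the listener
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [a:SIZE, b:EXPLICIT, c:REPLACED]
    }
}
```

Keep in mind that the listener runs as part of the cache operations on the calling thread by default, so a slow callback slows down cache access.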

Weak Keys & Soft Values

I’m sure you have all come across weak and soft references when studying Java basics, so here is a quick refresher.

Soft references: if an object is only softly reachable, the garbage collector will not reclaim it while memory is sufficient, but will reclaim it when memory runs low. Until it is reclaimed, the object can still be used by the program.
Weak references: an object that is only weakly reachable has a shorter lifecycle. When the garbage collector scans the memory it manages and finds such an object, it reclaims it regardless of whether memory is sufficient.

In Guava Cache, CacheBuilder provides three methods, weakKeys, weakValues, and softValues, to associate cached key-value pairs with the JVM garbage collection mechanism.
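
One side effect of weakKeys is easy to miss: a cache built with it compares keys by identity (==) instead of equals. The sketch below shows the consequence with two equal but distinct String instances; the class name is made up, and the local variable key is kept alive on purpose so that the garbage collector cannot reclaim the entry during the demo.

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

final class WeakKeysExample {

    /** Returns {found with the same key instance, found with an equal but different instance}. */
    static boolean[] lookups() {
        Cache<String, String> cache = CacheBuilder.newBuilder().weakKeys().build();

        String key = new String("kirito"); // a distinct instance, strongly referenced by this variable
        cache.put(key, "KIRITO");

        boolean sameInstance = cache.getIfPresent(key) != null;
        boolean equalInstance = cache.getIfPresent(new String("kirito")) != null;
        return new boolean[] {sameInstance, equalInstance};
    }

    public static void main(String[] args) {
        boolean[] found = lookups();
        System.out.println("same instance found:  " + found[0]);
        System.out.println("equal instance found: " + found[1]);
    }
}
```

So weakKeys only makes sense when callers look entries up with the very same key objects they stored.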

This feature may have its own scenarios, such as using as much JVM memory as possible for caching, but it relies on GC for cleanup, which presumably costs performance. Personally, I do not rely on the JVM to clean up the cache, so I do not dare to use this feature in production; stability comes first.

If you need a cleanup strategy, refer to the two schemes introduced in the cache expiration section, fixed number and fixed time. Combining them lets the cache deliver high performance without exhausting memory.
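
A combined configuration could look like the following sketch. The limit of 1,000 entries and the 10 minute lifetime are placeholder values chosen for this example, not recommendations; pick them from your own memory budget and freshness requirements.

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

final class BoundedCacheExample {

    // Placeholder limits for this sketch
    static final long MAX_ENTRIES = 1_000;
    static final long TTL_MINUTES = 10;

    static LoadingCache<String, String> newCache() {
        return CacheBuilder.newBuilder()
            .maximumSize(MAX_ENTRIES)                        // fixed number
            .expireAfterWrite(TTL_MINUTES, TimeUnit.MINUTES) // fixed time
            .recordStats()
            .build(new CacheLoader<String, String>() {
                @Override
                public String load(String key) {
                    return key.toUpperCase(); // stand-in for the real data source
                }
            });
    }

    /** Loads the given number of distinct keys and returns how many entries the cache kept. */
    static long sizeAfterLoading(int distinctKeys) {
        LoadingCache<String, String> cache = newCache();
        for (int i = 0; i < distinctKeys; i++) {
            cache.getUnchecked("key-" + i);
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println("entries kept: " + sizeAfterLoading(5_000));
    }
}
```

However many keys are requested, the number of retained entries stays at or below the configured maximum.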
