suspend is a callback

Understanding suspend doesn’t require getting hung up on what the magical “suspension” means or how threads are switched. In fact, behind suspend is the very familiar callback.

Suppose postItem consists of three asynchronous subtasks with dependencies: requestToken, createPost and processPost, all of which are callback-based APIs.

// Three callback-based APIs (signatures only)
fun requestToken(block: (String) -> Unit)
fun createPost(
  token: String,
  item: Item,
  block: (Post) -> Unit
)
fun processPost(post: Post)

fun postItem(item: Item) {
  requestToken { token ->
    createPost(token, item) { post ->
      processPost(post)
    }
  }
}

As you can see, callback-based APIs can easily cause a lot of indentation. APIs such as Promise (Future) and RxJava, which is popular in the Android community, eliminate the nesting problem to some extent by chaining calls. For example, here is the example above implemented with RxJava.

fun requestToken(): Observable<String>
fun createPost(token: String, item: Item): Observable<Post>
fun processPost(post: Post)

fun postItem(item: Item) = requestToken()
  .flatMap { createPost(it, item) }
  .map { processPost(it) } // processPost returns a plain value, so map rather than flatMap
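The same chaining idea can be sketched with a Future-style API. The following is a minimal illustration using the JDK's CompletableFuture; the function bodies and the use of String in place of Item and Post are assumptions made only so the snippet is self-contained.

```kotlin
import java.util.concurrent.CompletableFuture

// Hypothetical stand-ins for the article's APIs; String replaces Item and Post.
fun requestToken(): CompletableFuture<String> =
    CompletableFuture.completedFuture("token-42")

fun createPost(token: String, item: String): CompletableFuture<String> =
    CompletableFuture.supplyAsync { "post($item, $token)" }

fun processPost(post: String): String = "processed $post"

// thenCompose flattens a nested future (like flatMap); thenApply maps a plain value.
fun postItem(item: String): CompletableFuture<String> =
    requestToken()
        .thenCompose { token -> createPost(token, item) }
        .thenApply { post -> processPost(post) }
```

The chain stays flat, but every step still has to be expressed through an operator rather than through ordinary statements.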

However, RxJava requires users to master a lot of operators, and writing complex logic can be cumbersome, making you feel “trapped” in the call chain.

Kotlin’s suspend keyword helps us eliminate callbacks and write asynchronous code in a synchronous style.

suspend fun requestToken(): String
suspend fun createPost(token: String, item: Item): Post
suspend fun processPost(post: Post)

suspend fun postItem(item: Item) {
  val token = 🏹 requestToken()
  val post = 🏹 createPost(token, item)
  🏹 processPost(post)
}

How suspend works

Operations such as createPost are in fact time-consuming asynchronous IO: the logic that follows can only run once the return value is available, yet we don’t want to block the current thread (usually the main thread). So in the end there must be some message-passing mechanism that lets the background thread hand the result to the main thread after the time-consuming work is done.

Given the three callback-based APIs above, a naive implementation of suspend would, at compile time, wrap the logic after each suspension point 🏹 in a lambda and then call the callback API, producing code equivalent to the nested version. Kotlin, like many other languages, generates a state machine instead for better performance.

Specifically, when the compiler sees the suspend keyword, it strips the keyword and adds an extra Continuation parameter to the function. This Continuation represents a callback.

public interface Continuation<in T> {
  public val context: CoroutineContext

  // The callback method
  public fun resumeWith(result: Result<T>)
}
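To see that a Continuation really is a callback, the standard library lets us start a suspend lambda by hand and receive its outcome in resumeWith. The sketch below uses only kotlin-stdlib; parsePort and describe are hypothetical names introduced for illustration.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.startCoroutine

// Hypothetical suspend function; it never actually suspends.
suspend fun parsePort(text: String): Int = text.toInt()

fun describe(text: String): String {
    var message = ""
    val block: suspend () -> Int = { parsePort(text) }
    // The Continuation passed here is the completion callback:
    // resumeWith receives either the return value or the thrown exception.
    block.startCoroutine(Continuation(EmptyCoroutineContext) { result ->
        message = result.fold(
            onSuccess = { "port=$it" },
            onFailure = { "failed: ${it.javaClass.simpleName}" }
        )
    })
    // Safe to read immediately only because nothing in this demo suspends.
    return message
}
```

Both the success value and the exception arrive through the same resumeWith callback, wrapped in a Result.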

The Kotlin compiler generates a Continuation implementation class for each suspend block. This class is a state machine in which the state corresponding to each suspension point holds the context (i.e., the local variables it depends on) needed to continue execution, similar to the following pseudo-code.

suspend fun postItem(item: Item) {
  val token = 🏹 requestToken()
  val post = 🏹 createPost(token, item)
  🏹 processPost(post)
}

// Pseudo-code after the compiler transformation:
// 1. the suspend keyword is stripped
// 2. a Continuation parameter is added
fun postItem(item: Item, cont: Continuation) {

  // Check whether the incoming continuation is postItem's own `ContinuationImpl`
  // * false: initialize a state machine for this invocation of postItem
  // * true: another suspend function inside postItem is calling back
  // ThisSM refers to the anonymous class `object: ContinuationImpl`
  val sm = (cont as? ThisSM) ?: object: ContinuationImpl {

    // In the real implementation, the overridden method is invokeSuspend of
    // kotlin.coroutines.jvm.internal.BaseContinuationImpl
    override fun resume(..) {
      // ContinuationImpl.resume calls back into this function
      postItem(null, this)
    }
  }

  switch (sm.label) {
    case 0:
      // Capture the local variables needed by later steps
      sm.item = item
      // Set the label of the next step
      sm.label = 1

      // When the time-consuming work in requestToken completes, it updates
      // the state machine and calls postItem again through sm.resume
      // (whose implementation, provided above, re-invokes postItem)
      requestToken(sm)
    case 1:
      val item = sm.item
      // Result of the previous asynchronous operation
      val token = sm.result as String
      sm.label = 2
      createPost(token, item, sm)
    case 2:
      val post = sm.result as Post
      processPost(post)
    // ...
  }
}
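The pseudo-code can be approximated in plain, runnable Kotlin. This is a hand-written sketch of the idea, not what the compiler actually emits: PostItemStateMachine, the (Any?) -> Unit callback type, and the use of String instead of Item and Post are all assumptions for illustration.

```kotlin
// Hypothetical callback-style steps; each reports its result to `cont`.
fun requestToken(cont: (Any?) -> Unit) = cont("token-42")
fun createPost(token: String, item: String, cont: (Any?) -> Unit) =
    cont("post($item, $token)")

// Hand-written stand-in for the compiler-generated state machine.
class PostItemStateMachine(
    private val item: String,             // captured local variable
    private val onDone: (String) -> Unit  // the caller's continuation
) {
    private var label = 0                 // which step runs next

    // Plays the role of resumeWith: re-enters the body at the current label.
    fun resume(result: Any?) {
        when (label) {
            0 -> { label = 1; requestToken(this::resume) }
            1 -> { label = 2; createPost(result as String, item, this::resume) }
            2 -> onDone("processed ${result as String}")
        }
    }
}

fun postItem(item: String, onDone: (String) -> Unit) =
    PostItemStateMachine(item, onDone).resume(null)

// Demo helper; valid only because every step here calls back synchronously.
fun postItemSync(item: String): String {
    var out = ""
    postItem(item) { out = it }
    return out
}
```

A single object carries the label and the captured variables across all steps, which is why this approach allocates less than one lambda per suspension point.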

Compiling a suspend function into a method that takes a continuation argument is called a CPS (Continuation-Passing Style) transformation.

Using the suspend function without caring about thread switching

suspend provides a convention: calling such a function will not block the calling thread. This is very useful for UI programming, because the UI’s main thread must constantly respond to drawing requests and user actions. Time-consuming operations on the main thread keep other requests from being handled in time, causing UI lag.

Retrofit, the popular network request library in the Android community, and Room, the official database ORM, already support coroutines by providing suspend APIs. The official Android libraries also use Kotlin extension properties to give lifecycle-aware components such as Activity a CoroutineScope (lifecycleScope) whose context specifies Dispatchers.Main, i.e. coroutines started by lifecycleScope are dispatched to the main thread. So we can call a suspend function and update the UI directly once we have the result, without any thread switching. Such a suspend function is called “main safe”.

lifecycleScope.launch {
  val posts = 🏹 retrofit.create<PostService>().fetchPosts()
  // We are on the main thread, so posts can be used to update the UI directly
}

This is much better than the callback and RxJava APIs. Those asynchronous APIs ultimately rely on callbacks, but which thread a callback runs on depends on how the function is implemented, so the caller has to figure that out. With the suspend convention of not blocking the current thread, the caller doesn’t need to care which thread the function runs on internally.

lifecycleScope.launch(Dispatchers.Main) {
  🏹 foo()
}

For example, in the block above, we specify that this Coroutine block is scheduled to execute in the main thread, and it calls a suspend foo method from somewhere. Inside this method may be a time-consuming CPU calculation, or it may be a time-consuming IO request, but I don’t really need to care what’s going on in there and which thread it’s running in when I write this Coroutine block. Similarly, when reading this Coroutine block, it is clear that the code in front of us will be executed in the main thread, and that the code inside suspend foo is a potentially time-consuming operation, and the exact thread in which it is executed is an implementation detail of the function that is “transparent” to the logic of the current code.

But this holds only if the suspend function is implemented correctly, so that it does not block the current thread. Simply adding the suspend keyword to a function does not magically make it non-blocking. For example, suppose the implementation of suspend foo looks like this.

// 😖
suspend fun foo() = BigInteger.probablePrime(4096, Random())

The body of this suspend function is a time-consuming CPU operation; think of it as a stretch of particularly complex code. The problem is that this implementation of foo does not follow the semantics of suspend and is therefore wrong. The correct approach is to rewrite it (here renamed findBigPrime) as follows.

suspend fun findBigPrime(): BigInteger =
  withContext(Dispatchers.Default) {
    BigInteger.probablePrime(4096, Random())
  }

With withContext we move the time-consuming operation from the current main thread to the default background thread pool. For this reason, some say that even with coroutines you still end up “blocking” a thread, and that “all code is inherently blocking”. This view helps us realize that on Android / JVM, threads are ultimately needed as the vehicle that executes coroutines, but it ignores the distinction between blocking and non-blocking IO. The CPU executes threads, and BigInteger.probablePrime above is a time-consuming CPU calculation that can only wait for the CPU to compute the result, whereas IO does not necessarily have to block the CPU.
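The hand-off to another thread can be illustrated without kotlinx.coroutines. The sketch below is a simplification of what withContext arranges, not its implementation: it starts a plain thread instead of using a dispatcher, and findPrimeOffThread, demo and workerThreadName are hypothetical names. The latch exists only so the demo can wait for the result.

```kotlin
import java.math.BigInteger
import java.util.Random
import java.util.concurrent.CountDownLatch
import kotlin.concurrent.thread
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.resume
import kotlin.coroutines.startCoroutine
import kotlin.coroutines.suspendCoroutine

@Volatile
var workerThreadName = ""

// The CPU-heavy work runs on a separate thread; the caller's thread only
// suspends and is resumed through the continuation when the result is ready.
suspend fun findPrimeOffThread(bits: Int): BigInteger = suspendCoroutine { cont ->
    thread(name = "worker") {
        workerThreadName = Thread.currentThread().name
        cont.resume(BigInteger.probablePrime(bits, Random()))
    }
}

fun demo(): BigInteger {
    val latch = CountDownLatch(1)
    var prime = BigInteger.ZERO
    val block: suspend () -> BigInteger = { findPrimeOffThread(64) }
    block.startCoroutine(Continuation(EmptyCoroutineContext) { result ->
        prime = result.getOrThrow()
        latch.countDown()
    })
    // startCoroutine has already returned here; we block only to collect the demo result.
    latch.await()
    return prime
}
```

The calling thread returns from startCoroutine as soon as the work has been handed over, which is the non-blocking behaviour that the suspend convention promises.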

There is a practical difference between blocking and non-blocking IO. For example, while Retrofit supports suspend functions (by wrapping its callback-based enqueue API), the underlying OkHttp uses blocking calls, and the request is ultimately dispatched to a thread pool. Ktor’s HTTP client, by contrast, supports non-blocking IO. Try making concurrent requests with the two clients and you can feel the difference.

Of course, the client does not have as many “high concurrency” scenarios as the server and does not need to initiate a large number of requests at the same time, so using a thread pool with a blocking API is usually enough. On the server side, the Spring Framework provides WebFlux in addition to the traditional Servlet-based WebMvc. Spring WebFlux natively provides a reactive programming model (similar to RxJava) built on Reactive Streams to support non-blocking APIs, and it also supports using suspend functions directly in controllers.

With Coroutine as the official recommended asynchronous solution for Android, common asynchronous scenarios such as network requests and databases already have libraries that support Coroutine, so it is conceivable that in the future, newcomers to Android development will not really need to know the details of thread switching, and will only need to call the encapsulated suspend function directly in the main thread.

It’s not just IO that can suspend

suspend is not inherently about thread switching, but asynchronous IO on Android ultimately relies on multithreading, and asynchronous IO is the main application scenario for coroutines. Asynchronous results are traditionally delivered through callbacks; suspend does the same thing, but with a keyword and compiler support, so we can write asynchronous logic as sequential, top-to-bottom code. Not only does this improve readability, it also makes it easy to write complex logic using familiar constructs such as conditionals, loops, and try/catch.
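As a small illustration of ordinary control flow around suspension points, here is a retry loop written with a for loop and try/catch. It is a sketch that uses only kotlin-stdlib; flakyFetch, fetchWithRetry, fetchSync and the attempts counter are hypothetical names, and the flaky operation is simulated.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.resume
import kotlin.coroutines.resumeWithException
import kotlin.coroutines.startCoroutine
import kotlin.coroutines.suspendCoroutine

var attempts = 0

// Simulated flaky operation: fails on the first two calls, then succeeds.
suspend fun flakyFetch(): String = suspendCoroutine { cont ->
    attempts++
    if (attempts < 3) {
        cont.resumeWithException(IllegalStateException("attempt $attempts failed"))
    } else {
        cont.resume("payload")
    }
}

// A plain loop and try/catch wrap the suspension point; no operators needed.
suspend fun fetchWithRetry(maxAttempts: Int): String {
    var lastError: IllegalStateException? = null
    for (attempt in 1..maxAttempts) {
        try {
            return flakyFetch()
        } catch (e: IllegalStateException) {
            lastError = e
        }
    }
    throw lastError ?: IllegalStateException("maxAttempts must be positive")
}

// Demo runner; reading `text` right away works only because nothing here suspends for real.
fun fetchSync(maxAttempts: Int): String {
    var text = ""
    val block: suspend () -> String = { fetchWithRetry(maxAttempts) }
    block.startCoroutine(Continuation(EmptyCoroutineContext) { result ->
        text = result.getOrElse { "error: ${it.message}" }
    })
    return text
}
```

Expressing the same retry policy with nested callbacks or a chain of operators would take noticeably more ceremony.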

Looking at Coroutine and suspend as purely thread switching tools has significant limitations. Since suspend is a callback and also provides a way to wrap the callback API, callback-based APIs can be transformed by wrapping them with suspend functions.

Android View API

The article “Suspending over views” describes examples of wrapping Android view-related APIs with coroutines, such as the following extension function that waits for an Animator to end.

suspend fun Animator.awaitEnd() { /* implementation shown later */ }

lifecycleScope.launch {
  ObjectAnimator.ofFloat(imageView, View.ALPHA, 0f, 1f).run {
    start(); 🏹 awaitEnd()
  }
  ObjectAnimator.ofFloat(imageView, View.TRANSLATION_Y, 0f, 100f).run {
    start(); 🏹 awaitEnd()
  }
  ObjectAnimator.ofFloat(imageView, View.TRANSLATION_X, -100f, 0f).run {
    start(); 🏹 awaitEnd()
  }
}

Using traditional callback-based APIs to express such complex sequential code results in a lot of nesting and a significant decrease in code readability. By wrapping it in a suspend function, we can write the code in top-down order in Coroutine, and at the same time facilitate the use of various conditions, loops and other logic control constructs to improve the expressiveness of the code.

Animator.awaitEnd wraps the AnimatorListenerAdapter asynchronous callback interface. The standard library provides the suspendCoroutine function and kotlinx.coroutines provides suspendCancellableCoroutine (note that both are themselves suspend functions). Inside the lambda we pass in, we get the Continuation instance that corresponds to the current suspension point. Calling one of the resume methods on this instance in the appropriate callback bridges the suspend function with the callback-based API.

// Requires: import kotlin.coroutines.resume
suspend fun Animator.awaitEnd() =
  🏹 suspendCancellableCoroutine<Unit> { cont ->
    // If the coroutine running this suspend function is cancelled, cancel the Animator too.
    // `awaitEnd` is an extension function on `Animator`,
    // so methods of `Animator` can be called directly.
    cont.invokeOnCancellation { cancel() }

    addListener(object : AnimatorListenerAdapter() {
      // Tracks whether the Animator was cancelled or ended normally
      private var endedSuccessfully = true
      override fun onAnimationCancel(animation: Animator) {
        // Animator has been cancelled, so flip the success flag
        endedSuccessfully = false
      }

      override fun onAnimationEnd(animation: Animator) {
        animation.removeListener(this)

        // If the coroutine is still active
        if (cont.isActive) {
          // and the Animator was not cancelled
          if (endedSuccessfully) {
            cont.resume(Unit)
          } else {
            // otherwise cancel the coroutine
            cont.cancel()
          }
        }
      }
    })
  }
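The same bridging pattern can be tried without any Android types. The following sketch wraps a made-up callback API with the standard library's suspendCoroutine; loadGreeting, awaitGreeting and greetOrError are hypothetical names, and unlike suspendCancellableCoroutine this variant has no cancellation support.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.resume
import kotlin.coroutines.resumeWithException
import kotlin.coroutines.startCoroutine
import kotlin.coroutines.suspendCoroutine

// Hypothetical callback-based API with separate success and error callbacks.
fun loadGreeting(name: String, onSuccess: (String) -> Unit, onError: (Throwable) -> Unit) {
    if (name.isBlank()) onError(IllegalArgumentException("name is blank"))
    else onSuccess("Hello, $name")
}

// The bridge: each callback resumes the continuation exactly once.
suspend fun awaitGreeting(name: String): String = suspendCoroutine { cont ->
    loadGreeting(
        name,
        onSuccess = { cont.resume(it) },
        onError = { cont.resumeWithException(it) }
    )
}

// Demo runner; works synchronously only because loadGreeting calls back immediately.
fun greetOrError(name: String): String {
    var text = ""
    val block: suspend () -> String = { awaitGreeting(name) }
    block.startCoroutine(Continuation(EmptyCoroutineContext) { result ->
        text = result.getOrElse { "error: ${it.message}" }
    })
    return text
}
```

The error callback becomes an ordinary exception at the call site, so callers of awaitGreeting can use try/catch instead of a second callback.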

Splitties is an idiomatic Kotlin helper library for Android that provides a suspend AlertDialog.showAndAwait method. The following example opens a dialog and waits for the user to confirm the deletion. This is an asynchronous operation, so it suspends the coroutine and returns a boolean value once the user has made a choice.

suspend fun shouldWeReallyDeleteFromTrash(): Boolean =
  alertDialog(
    message = txt(R.string.dialog_msg_confirm_delete_from_trash)
  ).🏹 showAndAwait(
    okValue = true,
    cancelValue = false,
    dismissValue = false
  )

Here AlertDialog.showAndAwait wraps the DialogInterface.OnClickListener interface using suspendCancellableCoroutine.

Note that these examples above only involve the main thread and do not involve thread switching.

Functional exception handling

Going a step further, the suspend function is not even necessarily limited to asynchronous scenarios.

The Kotlin Coroutine code we normally use is implemented in two packages, the standard Kotlin library kotlin-stdlib and the Coroutine library kotlinx.coroutines. The standard library provides Continuation and other infrastructure related to CPS transformations, and kotlinx.coroutines provides a concrete implementation of Coroutine. So we can actually use the CPS transformation infrastructure in the standard library to write other interesting things.
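One thing already built this way in the standard library is the sequence builder: SequenceScope.yield is a suspend function, yet no asynchrony or threads are involved. A minimal sketch (the fibonacci function is just an illustrative example):

```kotlin
// A lazy generator: execution pauses at each yield until the consumer asks for the next value.
fun fibonacci(): Sequence<Long> = sequence {
    var a = 0L
    var b = 1L
    while (true) {
        yield(a)          // suspension point; resumes when the next element is requested
        val next = a + b
        a = b
        b = next
    }
}
```

The infinite loop is harmless because the body only advances when the consumer pulls another element, for example with take(8).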

Λrrow (also written as Arrow) is a functional programming library for Kotlin that provides the Either data type for exception handling.

sealed class Either<out A, out B>
data class Left<out A>(val value: A) : Either<A, Nothing>()
data class Right<out B>(val value: B) : Either<Nothing, B>()

A value of Either is either a Left or a Right. By convention, Right represents a normal return value (think of “right” as in “correct”) and Left represents an error.

Suppose there are three interdependent subtasks: takeFoodFromRefrigerator, getKnife and lunch. Note that this example involves error handling rather than asynchronous IO.

// Define the possible errors
sealed class CookingException {
  object LettuceIsRotten : CookingException()
  object KnifeNeedsSharpening : CookingException()
  data class InsufficientAmount(val quantityInGrams: Int) : CookingException()
}

object Lettuce; object Knife; object Salad

// All three subtasks return an Either
fun takeFoodFromRefrigerator(): Either<LettuceIsRotten, Lettuce> = Lettuce.right()
fun getKnife(): Either<KnifeNeedsSharpening, Knife> = Knife.right()
fun lunch(knife: Knife, food: Lettuce): Either<InsufficientAmount, Salad> = InsufficientAmount(5).left()

We can use Either.flatMap to combine the three tasks together.

fun getSalad(): Either<CookingException, Salad> =
  takeFoodFromRefrigerator()
    .flatMap { lettuce ->
      getKnife()
        .flatMap { knife ->
          val salad = lunch(knife, lettuce)
          salad
        }
    }

Doesn’t this look similar to the nested callbacks of IO? We can likewise eliminate the callbacks with the CPS transformation of suspend.

suspend fun getSalad() = 🏹 either<CookingException, Salad> {
  val lettuce = 🏹 takeFoodFromRefrigerator().bind()
  val knife = 🏹 getKnife().bind()
  val salad = 🏹 lunch(knife, lettuce).bind()
  salad
}
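To give a feel for how a bind-style API can be built on suspend, here is a toy version that uses only kotlin-stdlib. It is not Arrow's implementation: EitherScope, this either function, and the half and quarter helpers are assumptions for illustration. The idea is that bind on a Left suspends the block and never resumes it, so the rest of the block is simply skipped.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.startCoroutine
import kotlin.coroutines.suspendCoroutine

sealed class Either<out A, out B>
data class Left<out A>(val value: A) : Either<A, Nothing>()
data class Right<out B>(val value: B) : Either<Nothing, B>()

class EitherScope<A> {
    var failure: Left<A>? = null

    // On Right, return the value. On Left, record it and suspend forever:
    // the continuation (the rest of the block) is dropped without being resumed.
    suspend fun <B> Either<A, B>.bind(): B {
        val self = this
        return when (self) {
            is Right -> self.value
            is Left -> suspendCoroutine<B> { failure = self }
        }
    }
}

fun <A, B> either(block: suspend EitherScope<A>.() -> B): Either<A, B> {
    val scope = EitherScope<A>()
    var success: Right<B>? = null
    block.startCoroutine(scope, Continuation(EmptyCoroutineContext) { result ->
        success = Right(result.getOrThrow())
    })
    // If the block ran to completion we have a Right; otherwise bind recorded a Left.
    return success ?: scope.failure!!
}

// Hypothetical subtasks for the demo.
fun half(n: Int): Either<String, Int> =
    if (n % 2 == 0) Right(n / 2) else Left("$n is odd")

fun quarter(n: Int): Either<String, Int> = either {
    val h = half(n).bind()
    half(h).bind()
}
```

With these definitions quarter(12) yields Right(3), while quarter(6) stops at the second bind and yields Left("3 is odd").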

Deep recursion

Recursion applied to recursive data structures can often result in clean and elegant code. For example, the following algorithm for calculating the height of a tree.

class Tree(val left: Tree?, val right: Tree?)

fun depth(tree: Tree?): Int =
  if (tree == null) 0 else maxOf(
    depth(tree.left),
    depth(tree.right)
  ) + 1

However, if the recursion goes too deep, the runtime throws a StackOverflowError. So we need to make use of the more spacious heap memory. Usually we can maintain a stack data structure explicitly.
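For comparison, a version with an explicit stack might look like the sketch below. The name depthIterative and the choice to store (node, depth) pairs are assumptions for illustration; the pending work lives in a heap-allocated ArrayDeque instead of on the call stack.

```kotlin
class Tree(val left: Tree?, val right: Tree?)

// Depth-first traversal with an explicit, heap-allocated stack of (node, depth) pairs.
fun depthIterative(root: Tree?): Int {
    if (root == null) return 0
    var maxDepth = 0
    val stack = ArrayDeque<Pair<Tree, Int>>()
    stack.addLast(root to 1)
    while (stack.isNotEmpty()) {
        val (node, d) = stack.removeLast()
        if (d > maxDepth) maxDepth = d
        node.left?.let { stack.addLast(it to d + 1) }
        node.right?.let { stack.addLast(it to d + 1) }
    }
    return maxDepth
}
```

This handles very deep trees, but the code no longer resembles the recursive definition.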

There is an experimental DeepRecursiveFunction helper class in the Kotlin standard library that helps us write code that maintains the “general shape” of the recursive algorithm, but keeps the intermediate state in heap memory. The mechanism implemented there is the CPS transformation of suspend.

val depth = DeepRecursiveFunction<Tree?, Int> { tree ->
  // This is a suspend lambda
  if (tree == null) 0 else maxOf(
    🏹 callRecursive(tree.left),
    🏹 callRecursive(tree.right)
  ) + 1
}

val deepTree = generateSequence(Tree(null, null)) { prev ->
  Tree(prev, null)
}.take(100_000).last()

// DeepRecursiveFunction overloads the invoke operator,
// so it can be called with function-call syntax
println(depth(deepTree)) // 100_000

DeepRecursiveFunction takes a suspend block whose receiver is DeepRecursiveScope, which is analogous to CoroutineScope. Inside this block we cannot call depth recursively as in the original algorithm (because that would still depend on the space-limited function call stack). Instead, DeepRecursiveScope provides a suspend callRecursive method. Here, the state machine obtained by the CPS transformation preserves the intermediate state of the recursive calls. Since Continuation objects are stored in heap memory at runtime, this bypasses the space limit of the function call stack. (This is also why Kotlin’s coroutines are so-called “stackless coroutines”.)

For details, see Deep recursion with coroutines. KT-31741 has some discussions on standard library design and implementation as well as performance aspects.

As these examples from Android UI, functional programming, and general programming show, suspend can be seen as syntactic sugar for callbacks and is not essentially tied to IO or thread switching. In retrospect, the corresponding keyword is often called async in other languages, whereas Kotlin calls it suspend, perhaps suggesting that the design of Kotlin coroutines is not limited to asynchrony but has a wider range of applications.