Problem Background

Consider this business scenario: after an article is posted, if it contains a video link, the service needs to download the video and transcode it. Further business logic runs once transcoding is done.

@RestController
class ArticleController {
  @PostMapping("/article")
  fun createArticle(article: Article) {
    if (article.videos.isNotEmpty()) {
      val urls = videoTranscodeService.transcode(article.videos)
      article.setVideoUrls(urls)
    }
    // Continue other business logic after video transcoding is complete
    processArticle(article)
  }
}

If this code lives in the article publishing service, and videoTranscodeService is a remote service that only exposes an asynchronous interface, we can't simply call a synchronous method, wait for it to return, and then continue with the business logic. With asynchronous logic, the code might look like this.

@RestController
class ArticleController {
  @PostMapping("/article")
  fun createArticle(article: Article) {
    if (article.videos.isNotEmpty()) {
      // Fire-and-forget: the result arrives later through the callback endpoint below
      videoTranscodeService.transcode(article.videos)
      return
    }
    processArticle(article)
  }

  @PostMapping("/article/video-transcode-callback")
  fun videoTranscodeServiceCallback(articleId: String, urls: List<String>) {
    val article = articleRepository.findById(articleId)
    article.setVideoUrls(urls)
    processArticle(article)
  }
}

After the asynchronous transformation:

  • The originally clear, easy-to-read sequential code of the article publishing service becomes fragmented, and it is hard to see the business process as a whole.
  • For robustness between services, the video transcoding service may receive transcoding tasks through a message queue, with the article publishing service sending the task to the queue after persisting the article data. This increases system complexity. In addition, the article service must somehow make two operations, updating the database and sending the message, atomic.

Is it possible to provide an abstraction layer that encapsulates the asynchronous callbacks between services to simulate synchronous sequential execution while ensuring system robustness?

Abstract synchronous method execution

Analogy to async / await: preserving intermediate state

The “asynchronous to synchronous” problem is reminiscent of the async / await feature in languages like C# and JavaScript. In simple terms, the compiler turns functions marked with the async keyword (or suspend in Kotlin) into a state machine that stores the following:

  • Execution environment (local variables)
  • Program counter (which step execution has reached)

These are saved so that the callback can restore the original function's execution and continue from where it left off.
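To make the idea concrete, here is a minimal hand-written sketch of such a state machine. It is only an illustration of the principle, not what any real compiler emits; PublishState, StepResult, and step are hypothetical names invented for this example.

```kotlin
// Hypothetical, hand-written stand-in for a compiler-generated state machine.
// The saved "intermediate state": a program counter plus the local variables.
data class PublishState(
    val pc: Int = 0,                          // program counter: which step runs next
    val videos: List<String> = emptyList(),   // local variable captured at suspension
)

sealed interface StepResult
data class Suspended(val state: PublishState) : StepResult   // waiting for a callback
data class Done(val value: String) : StepResult

// Runs the function from wherever state.pc points. callbackValue carries the
// result delivered by the asynchronous callback, if there is one.
fun step(state: PublishState, callbackValue: List<String>? = null): StepResult {
    return when (state.pc) {
        0 -> Suspended(state.copy(pc = 1))    // "call" the transcoder, then suspend
        1 -> {
            // Still no result: stay suspended in the same state.
            val urls = callbackValue ?: return Suspended(state)
            Done("published ${state.videos.size} video(s): ${urls.joinToString()}")
        }
        else -> error("invalid program counter ${state.pc}")
    }
}

fun main() {
    val first = step(PublishState(videos = listOf("a.mov"))) as Suspended
    // Later, the callback arrives and execution resumes from the saved state.
    println(step(first.state, callbackValue = listOf("https://cdn.example/a.mp4")))
}
```

Because the whole state is an ordinary value, it could in principle be serialized and restored in another process, which is exactly the part that general-purpose runtimes do not expose.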

It is very difficult to apply this idea directly in existing programming languages, because this information is usually not exposed by the runtime. So most workflow engines are based on some form of DSL, typically written in JSON or YAML. BPMN is a graphical workflow DSL, and many workflow engines (e.g. Activiti, Camunda, Flowable) support it.

Limitations of DSL-based workflow engines:

  • Additional learning costs
  • Domain-specific DSLs may not be suitable for all business scenarios
  • Less expressiveness than general-purpose programming languages

Event sourcing idea: replaying execution records and results

Temporal uses the event sourcing idea. We write a workflow method in a programming language supported by the Temporal SDK. The method must behave like a pure function: its result is deterministic regardless of when it is called and how many times it is called.

Event sourcing: record changes to the application state as a log of events, rather than storing the application state itself. The latest state is obtained by replaying the log.
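As a rough sketch of that definition, the following shows a state that is never stored directly and is instead rebuilt by folding over an event log. The event and state types are invented for this example and are not part of Temporal or any other library.

```kotlin
// Illustrative event types for an article; hypothetical, not a real API.
sealed interface ArticleEvent
data class ArticleCreated(val id: String, val title: String) : ArticleEvent
data class VideoTranscoded(val url: String) : ArticleEvent
object ArticlePublished : ArticleEvent

data class ArticleState(
    val id: String = "",
    val title: String = "",
    val videoUrls: List<String> = emptyList(),
    val published: Boolean = false,
)

// Pure function: applies one logged event to the previous state.
fun applyEvent(state: ArticleState, event: ArticleEvent): ArticleState = when (event) {
    is ArticleCreated -> state.copy(id = event.id, title = event.title)
    is VideoTranscoded -> state.copy(videoUrls = state.videoUrls + event.url)
    is ArticlePublished -> state.copy(published = true)
}

// The current state is derived by replaying the whole log from an empty state.
fun replay(log: List<ArticleEvent>): ArticleState = log.fold(ArticleState(), ::applyEvent)

fun main() {
    val log: List<ArticleEvent> = listOf(
        ArticleCreated("42", "Hello"),
        VideoTranscoded("https://cdn.example/v.mp4"),
        ArticlePublished,
    )
    println(replay(log))
}
```

Replaying the same log always yields the same state, which is why the function that interprets the log has to be deterministic.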

But our business logic inevitably includes side effects such as reading and writing the database and calling external services.

Temporal provides the Activity API to perform these side effects. The execution records and return values of the side effects are saved, so the state of the function's execution can be reconstructed by replaying the log.

Take the business scenario mentioned earlier as an example.

@WorkflowInterface
interface PublishArticleWorkflow {
  @WorkflowMethod
  fun run(article: Article)
}

class PublishArticleWorkflowImpl : PublishArticleWorkflow {
  // The stub is created from the activity interface, not from one of its methods
  private val videoTranscodeActivities = Workflow.newActivityStub(
    VideoTranscodeActivities::class.java
  )

  override fun run(article: Article) {
    if (article.videos.isNotEmpty()) {
      val urls = videoTranscodeActivities.transcode(article.videos)
      article.setVideoUrls(urls)
    }
    // Continue other business logic after video transcoding is complete
    processArticle(article)
  }
}

PublishArticleWorkflow#run is the Workflow method. When execution reaches the call to videoTranscodeActivities.transcode:

  • The Temporal SDK calls the Temporal service to create a video transcoding task.
  • Assuming the video transcoding service is also integrated with the Temporal SDK, it consumes the transcoding task from the Temporal service.
  • After the task is executed, the SDK reports the method's return value back to the Temporal service.
  • Temporal records the history and results of these calls in its database. It then replays the execution history of our business logic by re-executing the Workflow method, which rebuilds the intermediate execution state of the business function so that execution can continue from that point.

“Replay” means that the workflow method is actually executed many times. Whenever a new event causes a state change, such as an Activity completing or an external signal arriving, the workflow method is re-executed.
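The replay mechanism can be imitated in a few lines of plain Kotlin. The following is a toy model under simplifying assumptions (one workflow, results stored as strings, activities executed inline by the driver); ReplayContext, ActivityPending, and runToCompletion are hypothetical names, and this is not how the Temporal SDK is implemented internally.

```kotlin
// Thrown when the workflow reaches an activity whose result is not in the history yet.
class ActivityPending(val name: String, val input: String) : RuntimeException()

class ReplayContext(private val history: List<String>) {
    private var next = 0   // position in the recorded history

    // Returns the recorded result if one exists; otherwise unwinds the
    // workflow function so the driver can run the activity.
    fun activity(name: String, input: String): String {
        if (next < history.size) return history[next++]
        throw ActivityPending(name, input)
    }
}

// Deterministic workflow function: the same history always leads down the same path.
fun publishWorkflow(ctx: ReplayContext, video: String): String {
    val url = ctx.activity("transcode", video)
    val id = ctx.activity("publish", url)
    return "article $id published with $url"
}

// Driver: re-executes the workflow from the top after every completed activity.
// Returns the workflow result and how many times the function was executed.
fun runToCompletion(video: String, worker: (String, String) -> String): Pair<String, Int> {
    val history = mutableListOf<String>()
    var executions = 0
    while (true) {
        executions++
        try {
            return publishWorkflow(ReplayContext(history.toList()), video) to executions
        } catch (pending: ActivityPending) {
            history += worker(pending.name, pending.input)   // record the result
        }
    }
}

fun main() {
    val (result, executions) = runToCompletion("a.mov") { name, input ->
        if (name == "transcode") "https://cdn.example/$input.mp4" else "42"
    }
    println("$result (workflow function executed $executions times)")
}
```

With two activities the workflow function runs three times: twice it stops at the first activity without a recorded result, and on the third run both results come from the history and the function returns.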

With Temporal, we can write workflows in a familiar programming language and implement complex logic with the language's own conditionals, loops, and so on. Temporal also provides a Promise concurrency primitive and a Workflow.sleep method for delays. The resulting code reads much like the original synchronous, local method.

The abstraction provided by Temporal is not the first of its kind; it is essentially the same as Azure Durable Functions.

Meaning: React on the back end?

While learning about Temporal, you may find it very similar to the currently popular declarative UI approach, especially since both emphasize writing pure functions.

The core premise for React is that UIs are simply a projection of data into a different form of data. The same input gives the same output. A simple pure function.

https://github.com/reactjs/react-basic

It has even been argued that Temporal changes back-end development more than React changed front-end development, as Vercel CEO Guillermo Rauch put it in what reads like a bit of friendly mutual promotion between companies.

temporal.io does to backend and infra, what React did to frontend. If you’re in the React world, you’ve forgotten about manually adding and removing DOM elements, updating attributes and their quirks, hooking up event listeners… It’s not only been a boost in developer experience, but most importantly in consistency and reliability. In the backend world, this reliability problem is absurdly amplified as monoliths break into SaaS services, functions, containers. You have to carefully manage and create queues to capture each side effect, ensure everything gets retried, state is scattered all over the place.

https://twitter.com/rauchg/status/1316808665370820609

Before declarative UI on the Web, it was easy to write “spaghetti code”: register a listener, wait for an event, update the DOM. Similarly, under a microservices architecture, typical back-end code listens to a message queue here, receives a callback notification there, and updates the database afterwards.

Temporal abstracts away the details of communication and scheduling between services, and puts the control logic into the framework, allowing us to focus on developing the business logic of each service itself, which not only improves the development experience and efficiency, but also enhances the robustness of the system.

Any sufficiently complicated distsys contains an adhoc bug-ridden implementation of half of Temporal.

https://twitter.com/temporalio/status/1519330803582439424

Risks

The programming model introduced by Temporal is relatively new to back-end programmers and may take some time to digest. In addition, there are some special points to note about the Workflow code.

  • Workflow functions need to be deterministic. Side effects must go through the Activity API or the special APIs provided by Workflow, rather than directly calling the runtime's random-number, date and time, multi-threading, and similar APIs.
  • Changes to Workflow functions that are already running in production must remain compatible with executions that are still in progress.
  • There is an upper limit on the number of state machine transitions a single Workflow can record.
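The determinism requirement in the first point can be illustrated with a small self-contained sketch. FakeClock and Recorder are hypothetical stand-ins invented here: the first plays the role of any runtime API whose result changes between calls, the second the role of a recorded side-effect API. In the Temporal Java SDK, the real counterparts are calls such as Workflow.currentTimeMillis(), Workflow.newRandom(), and Workflow.sideEffect.

```kotlin
// Stand-in for a wall clock: every read returns a later value.
class FakeClock {
    private var t = 0L
    fun now(): Long = ++t
}

// Toy recorder: stores a value on first execution and returns it on replay.
class Recorder(private val history: MutableList<Long>) {
    private var next = 0

    fun sideEffect(compute: () -> Long): Long {
        if (next < history.size) return history[next++]   // replay: reuse recorded value
        val value = compute()                             // first run: compute and record
        history += value
        next++
        return value
    }
}

// Reads the clock directly: a replay sees a different value and may branch differently.
fun unsafeWorkflow(clock: FakeClock): Long = clock.now()

// Routes the read through the recorder: a replay sees the value from the first run.
fun safeWorkflow(recorder: Recorder, clock: FakeClock): Long =
    recorder.sideEffect { clock.now() }

fun main() {
    val clock = FakeClock()
    val history = mutableListOf<Long>()
    val first = safeWorkflow(Recorder(history), clock)      // computed and recorded
    val replayed = safeWorkflow(Recorder(history), clock)   // taken from the history
    println("safe: first=$first replayed=$replayed")
    println("unsafe: first=${unsafeWorkflow(clock)} replayed=${unsafeWorkflow(clock)}")
}
```

The safe variant returns the same value on both executions, while the unsafe one returns a new value each time, which is what would make a replayed workflow diverge from its recorded history.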

Some history about the Temporal project

Temporal is a fork of Cadence, an open source project from Uber. Cadence's creators left Uber to start Temporal, which received $100 million in Series B funding in February 2022 and plans to sell a hosted version of the Temporal service. Based on out-of-the-box experience, Temporal's advantages over Cadence are:

  • API design is more logical. Specific example: Temporal can easily configure a custom Jackson ObjectMapper, which is very cumbersome for Cadence to implement.
  • Temporal supports SDKs for more languages: TypeScript and PHP, with support for other languages under development.
  • The new version of the Web UI looks better.

Learn More

To learn more about Temporal’s programming model, we recommend Cadence’s Getting Started document, which provides a more complex, realistic example of a business scenario.