Timed Job Design Pattern in K8s

The Timed Job pattern extends the Batch Job pattern by adding a time dimension: it allows a unit of work to be triggered by a temporal event.

Existence of problems

In the world of distributed systems and microservices, there is a clear trend toward real-time, event-driven application interaction using HTTP and lightweight messaging. However, regardless of the latest trends in software development, Job scheduling has a long history and is still relevant. Scheduled Jobs are typically used to automate system maintenance or administrative tasks. They are also used in business applications that require specific tasks to be performed on a regular basis. Typical examples are business-to-business integration via file transfer, application integration via database polling, sending newsletter emails, and cleaning up and archiving old files.

For system maintenance, the traditional approach to handling periodic Jobs is to use specialized scheduling software or Cron. However, specialized software can be expensive for simple use cases, and Cron jobs running on a single server are difficult to maintain and represent a single point of failure. This is why developers often implement solutions that handle both the scheduling aspect and the business logic that needs to be executed. For example, in the Java world, libraries like Quartz and Spring Batch, or custom implementations with the ScheduledThreadPoolExecutor class, can run temporal tasks. However, as with Cron, the main difficulty with this approach is making the scheduling capability resilient and highly available, which leads to high resource consumption. Also, with this approach the time-based task scheduler is part of the application, so to make the scheduler highly available, the entire application must be highly available. Typically, this involves running multiple instances of the application while ensuring that only one instance is active and schedules jobs, which in turn involves leader election and other challenges of distributed systems.

In the end, a simple service that copies a few files once a day may end up requiring multiple nodes, a distributed leader election mechanism, and more. The Kubernetes CronJob implementation solves these problems by allowing Job resources to be scheduled using the well-known Cron format, so that developers can focus on implementing the work to be executed rather than on the time scheduling aspect.

Solution

In Batch Jobs, we saw the use cases and features of Kubernetes Jobs. All of this applies here as well, because the CronJob primitive is built on top of a Job. A CronJob instance is similar to one line of a Unix crontab (cron table) and manages the temporal aspects of a Job. It allows a Job to be executed periodically at a specified point in time. See Example 1-1 for an example definition.

Example 1-1. CronJob
# batch/v1 is the CronJob API version since Kubernetes 1.21;
# older clusters use batch/v1beta1 (removed in 1.25)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: random-generator
spec:
  # Every three minutes
  schedule: "*/3 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - image: k8spatterns/random-generator:1.0
            name: random-generator
            command: [ "java", "-cp", "/", "RandomRunner", "/numbers.txt", "10000" ]
          restartPolicy: OnFailure

In addition to the Job specification, the CronJob has additional fields that define its timing aspects:

.spec.schedule

Crontab entry that specifies the Job’s schedule (e.g., 0 * * * * runs the Job at the start of every hour).

.spec.startingDeadlineSeconds

Deadline (in seconds) for starting a Job if it misses its scheduled time. In some use cases, a task is only valid if it is executed within a certain time frame and is useless if it is executed late. For example, if a Job does not start at the expected time because of a lack of compute resources or other missing dependencies, it may be better to skip the execution because the data it is supposed to process is out of date.

.spec.concurrencyPolicy

Specifies how concurrent executions of Jobs created by the same CronJob are managed. The default, Allow, creates a new Job instance even if the previous Job has not yet completed. If that is not the desired behavior, use Forbid to skip the next run while the current Job is still running, or Replace to cancel the currently running Job and start a new one.

.spec.suspend

Suspends all subsequent executions without affecting executions that have already started.

.spec.successfulJobsHistoryLimit and .spec.failedJobsHistoryLimit

Fields that specify how many completed and failed Jobs should be kept for auditing purposes.
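
To make the timing fields more concrete, the following is a minimal sketch of a CronJob that sets all of them at once. It reuses the container from Example 1-1; the resource name, the hourly schedule, and the deadline of 300 seconds are illustrative assumptions rather than recommendations, and the history limits shown are the Kubernetes defaults written out explicitly. The manifest assumes a cluster that serves the batch/v1 CronJob API.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: random-generator-hourly       # hypothetical name, chosen for this sketch
spec:
  schedule: "0 * * * *"               # at minute 0 of every hour
  startingDeadlineSeconds: 300        # assumed value: skip the run if it cannot start within 5 minutes
  concurrencyPolicy: Forbid           # skip a new run while the previous Job is still running
  suspend: false                      # set to true to pause future runs
  successfulJobsHistoryLimit: 3       # keep the last 3 completed Jobs (the default)
  failedJobsHistoryLimit: 1           # keep the last failed Job (the default)
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: random-generator
            image: k8spatterns/random-generator:1.0
            command: [ "java", "-cp", "/", "RandomRunner", "/numbers.txt", "10000" ]
          restartPolicy: OnFailure
```

Choosing Forbid here is only one option: a workload whose latest run supersedes earlier ones might prefer Replace, and a workload that tolerates overlapping runs can keep the default Allow.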

Discussion

As you can see, CronJob is a very simple primitive that adds clustered, Cron-like behavior to the existing Job definition. But when combined with other primitives (e.g., Pods, container resource isolation) and other Kubernetes features (e.g., Automated Placement, Health Probe), it becomes a very powerful Job scheduling system. This allows developers to focus only on the problem domain and implement a containerized application that is responsible only for the business logic to be executed. Scheduling is done outside of the application, as part of the platform, with all the added benefits of high availability, resiliency, capacity, and policy-driven Pod placement. Of course, as with the Job implementation, when implementing a CronJob container your application must consider all the corner cases and failure scenarios: duplicate runs, no runs, parallel runs, and cancellations.
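
As a rough illustration of combining CronJob with other primitives, the sketch below adds container resource requests and limits together with Job-level bounds on retries and run time to the container from Example 1-1. The resource name and all numeric values are assumptions picked for a Job that runs every three minutes; they are not taken from the original example. The manifest assumes a cluster that serves the batch/v1 CronJob API.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: random-generator-bounded      # hypothetical name, chosen for this sketch
spec:
  schedule: "*/3 * * * *"             # every three minutes
  concurrencyPolicy: Forbid           # avoid parallel runs if one run overruns its slot
  jobTemplate:
    spec:
      backoffLimit: 2                 # assumed value: retry a failed run at most twice
      activeDeadlineSeconds: 150      # assumed value: terminate a run that takes longer than 150 s
      template:
        spec:
          containers:
          - name: random-generator
            image: k8spatterns/random-generator:1.0
            command: [ "java", "-cp", "/", "RandomRunner", "/numbers.txt", "10000" ]
            resources:
              requests:               # assumed sizing, used by the scheduler for placement
                cpu: 100m
                memory: 128Mi
              limits:
                memory: 256Mi
          restartPolicy: OnFailure
```

Platform settings like these limit how long and how often a run can misbehave, but they do not make the work itself safe to repeat; the container still has to tolerate duplicate or skipped runs.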

CronJob is a very specialized primitive that applies only when a unit of work has a time dimension. Even though CronJob is not a general-purpose primitive, it is a good example of how Kubernetes capabilities build on top of each other and also support non-cloud-native use cases.
