Pod Priority and Preemption
Pod priority and preemption was introduced in Kubernetes v1.8, entered beta in v1.11, and reached GA in v1.14, so it is by now a mature feature.
As the name suggests, by dividing applications into different priority levels, the Pod priority and preemption feature allocates resources to high-priority applications first, improving overall resource utilization while guaranteeing the quality of service of high-priority applications.
Let’s briefly try out the Pod priority and preemption feature. My cluster runs v1.14, so the PodPriority feature gate is enabled by default. Using preemption involves two steps:
- Define PriorityClasses. Each PriorityClass has a different value; the larger the value, the higher the priority.
- Create a Pod and set its priorityClassName field to the desired PriorityClass.
As shown below, I first create two PriorityClasses, high-priority and low-priority, with values 1000000 and 10 respectively.
Note that I set globalDefault of low-priority to true, which makes low-priority the default PriorityClass of the cluster: any Pod without a priorityClassName field will get low-priority’s value of 10 as its priority. A cluster can have at most one default PriorityClass; if no default is set, the priority of a Pod without a priorityClassName field is 0.
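The two PriorityClasses described above can be written as follows (a sketch; the name high-priority for the larger-valued class and the description texts are my assumptions):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "For important workloads that may preempt others."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 10
globalDefault: true          # cluster-wide default for Pods without priorityClassName
description: "Cluster-wide default priority."
```

Note that PriorityClass is cluster-scoped and has no namespace.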
After creating them, check the PriorityClasses currently in the system.
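The check is a single kubectl command; its output lists each class with its NAME, VALUE and GLOBAL-DEFAULT columns:

```shell
kubectl get priorityclasses
```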
As you can see, in addition to the two PriorityClasses created above, the system also ships with the built-in system-cluster-critical and system-node-critical classes for high-priority system workloads.
Set the priorityClassName of a Pod
For easier verification, I use an extended resource here: I set the capacity of the extended resource example.com/foo to 1 on node x1.
Looking at the allocatable and capacity of x1 afterwards, you can see that there is 1 example.com/foo resource on x1.
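Advertising an extended resource is done by PATCHing the node’s status through the API server; a sketch, assuming the API server is reachable through kubectl proxy on port 8001:

```shell
# Open a proxy to the API server in the background
kubectl proxy --port=8001 &

# Advertise 1 unit of example.com/foo on node x1
# ("~1" is the JSON-Patch escape for "/" in the resource name)
curl --header "Content-Type: application/json-patch+json" \
  --request PATCH \
  --data '[{"op": "add", "path": "/status/capacity/example.com~1foo", "value": "1"}]' \
  http://localhost:8001/api/v1/nodes/x1/status
```

Afterwards, `kubectl describe node x1` should show example.com/foo under both Capacity and Allocatable.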
We first create a Deployment named nginx that requests one example.com/foo resource but does not set priorityClassName, so its Pod gets the default low-priority priority of 10.
Then we create a Deployment named debian, which does not request the example.com/foo resource.
At this point both Pods start normally.
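A minimal sketch of the nginx Deployment (the image and labels are assumptions; the point is the extended-resource limit and the absence of priorityClassName):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          limits:
            example.com/foo: "1"   # extended resources are requested via limits
      # no priorityClassName: the Pod falls back to the default low-priority (10)
```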
We then change the debian Deployment’s example.com/foo request to 1 and set its priorityClassName to high-priority.
Since there is only 1 example.com/foo resource in the cluster (on x1) and debian now has the higher priority, the scheduler begins preemption: watching the Pods, the low-priority nginx Pod is deleted and the debian Pod is scheduled onto x1 in its place.
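The change that triggers the preemption might look like this in the debian Deployment’s Pod template (a sketch using this post’s names):

```yaml
spec:
  template:
    spec:
      priorityClassName: high-priority   # priority 1000000, may preempt lower-priority Pods
      containers:
      - name: debian
        image: debian
        resources:
          limits:
            example.com/foo: "1"
```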
Gentleman: Non-preempting PriorityClasses
Kubernetes v1.15 added a PreemptionPolicy field to PriorityClass. When it is set to Never, Pods of that class will not preempt lower-priority Pods; they are merely placed ahead of lower-priority Pods in the scheduling queue (according to the PriorityClass value).
That is why I call this kind of PriorityClass a “gentleman”: it just quietly waits in line according to its ability (priority) and never steals other Pods’ resources. The example the official documentation gives as a good fit is data science workloads.
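Such a non-preempting PriorityClass might be declared like this (a sketch; note that in v1.15 the field was alpha and required the NonPreemptingPriority feature gate):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-nonpreempting
value: 1000000
preemptionPolicy: Never      # queue ahead of lower priorities, but never evict them
globalDefault: false
description: "High scheduling priority without preemption."
```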
Comparison with Cluster Autoscaler
When a Kubernetes cluster on the cloud runs low on resources, it can scale out automatically via Cluster Autoscaler, i.e., request more nodes from the cloud vendor and add them to the cluster, thereby providing more resources.
However, this approach has some shortcomings:
- It is hard to implement for on-premises (off-cloud) clusters
- Adding nodes costs extra money
- It is not immediate; provisioning new nodes takes time
If users can clearly divide their applications by priority, then preempting resources from lower-priority Pods when resources are tight can improve both resource utilization and the quality of service of the important applications.