Kubernetes & long-running batch Jobs
This is a follow-up to the first part of this series, KEDA & Windows, where I wrote about using KEDA to schedule jobs based on work placed into an Azure storage queue.
There are some nuances to running long-running jobs on Kubernetes. If you have a long-running job, you don’t want the horizontal pod autoscaler or the cluster autoscaler to terminate the job’s pod as part of a rebalancing or scale-down process. To prevent this, there are some configuration settings you can adjust. In this post, I’ll describe two of them. The first (a Pod Disruption Budget) is the most general, but requires the most configuration. The second (the cluster autoscaler’s eviction annotation) is simpler, but only works for jobs.
Setting Pod Disruption Budget
The first method for preventing your pods from being terminated is to set a Pod Disruption Budget (PDB). By setting the PDB’s minAvailable to the maximum number of job pods you expect, you prevent the voluntary eviction of those pods by the cluster autoscaler. This is a bit confusing, so let me provide an example.
First, recall the ScaledJob manifest from the previous blog post. I’ve added a label app: winworker to the pod template so that it can be matched by the PDB. Additionally, I’ve set activeDeadlineSeconds to 2400 seconds (30 minutes for my job plus 10 minutes of extra ‘buffer’ time). If a job runs for longer than this, Kubernetes assumes it has a problem and terminates it.
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: azure-queue-scaledobject-jobs-win
  namespace: default
spec:
  pollingInterval: 30
  maxReplicaCount: 50
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  jobTargetRef:
    parallelism: 1
    completions: 1
    activeDeadlineSeconds: 2400 # set to expected max runtime + some buffer
    backoffLimit: 6
    template:
      metadata:
        labels:
          app: winworker
      spec:
        nodeSelector:
          kubernetes.io/os: windows
        containers:
        - name: consumer-job
          image: $ACR.azurecr.io/queue-consumer-windows
          resources:
            requests:
              cpu: 100m
              memory: 2000Mi # intentionally set high in order to trigger cluster autoscaler
            limits:
              cpu: 100m
              memory: 2000Mi
          env:
          - name: AzureWebJobsStorage
            valueFrom:
              secretKeyRef:
                name: secrets
                key: AzureWebJobsStorage
          - name: QUEUE_NAME
            value: keda-queue
  triggers:
  - type: azure-queue
    metadata:
      queueName: keda-queue
      queueLength: '1'
      connectionFromEnv: AzureWebJobsStorage
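As a sanity check on the deadline math, the 2400 seconds used above is just the expected runtime plus the buffer:

```yaml
# expected runtime: 30 min = 1800 s
# buffer:           10 min =  600 s
# total:                     2400 s
activeDeadlineSeconds: 2400
```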
Next, I’ll specify the PDB. Notice that I set minAvailable to 50, which is the same number as maxReplicaCount in the ScaledJob:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ww-pdb
spec:
  minAvailable: 50
  selector:
    matchLabels:
      app: winworker
And voilà! Your jobs can now run for up to activeDeadlineSeconds without being evicted.
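As an aside, an equivalent way to express this (not from the original post, but a standard PDB option) is maxUnavailable: 0, which blocks all voluntary evictions without needing to keep minAvailable in sync with maxReplicaCount:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ww-pdb
spec:
  maxUnavailable: 0   # disallow any voluntary eviction, regardless of pod count
  selector:
    matchLabels:
      app: winworker
```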
Preventing the CA from evicting pods
The Pod Disruption Budget is a great general-purpose solution that works with any workload: jobs, deployments, etc. However, if all you want is to prevent pods from being evicted when a node is drained (e.g., to stop the cluster autoscaler from scaling down a node with a long-running job), there is a simpler method you can use.
To achieve this, you can simply set an annotation on your pod template:
"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
Specifically:
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: azure-queue-scaledobject-jobs-win
  namespace: default
spec:
  pollingInterval: 30
  maxReplicaCount: 50
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  jobTargetRef:
    parallelism: 1
    completions: 1
    activeDeadlineSeconds: 2400 # set to expected max runtime + some buffer
    backoffLimit: 6
    template:
      metadata:
        annotations:
          "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
        labels:
          app: winworker
      spec:
        nodeSelector:
          kubernetes.io/os: windows
        containers:
        - name: consumer-job
          image: $ACR.azurecr.io/queue-consumer-windows
          resources:
            requests:
              cpu: 100m
              memory: 2000Mi # intentionally set high in order to trigger cluster autoscaler
            limits:
              cpu: 100m
              memory: 2000Mi
          env:
          - name: AzureWebJobsStorage
            valueFrom:
              secretKeyRef:
                name: secrets
                key: AzureWebJobsStorage
          - name: QUEUE_NAME
            value: keda-queue
  triggers:
  - type: azure-queue
    metadata:
      queueName: keda-queue
      queueLength: '1'
      connectionFromEnv: AzureWebJobsStorage
And that’s it: no PDB or other configuration needed. (Do still keep activeDeadlineSeconds in mind, as it will terminate an overrunning job in either approach.)
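Since the annotation is just pod metadata, the same trick applies to a plain Kubernetes Job outside of KEDA. A minimal sketch (the name and image here are illustrative, not from the original post):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: long-running-job   # illustrative name
spec:
  activeDeadlineSeconds: 2400
  template:
    metadata:
      annotations:
        "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: myregistry.example/worker:latest   # illustrative image
```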