Kubernetes 1.36 Introduces Adjustable Resource Allocation for Suspended Jobs
Kubernetes v1.36 promotes mutable pod resources for suspended Jobs to beta, allowing queue controllers to adjust CPU, memory, GPU, and extended resources before Job resumption.
Overview of the New Feature
Kubernetes v1.36 has promoted the ability to modify container resource requests and limits in the pod template of a suspended Job to beta. Initially introduced as an alpha feature in v1.35, this enhancement empowers queue controllers and cluster administrators to dynamically adjust CPU, memory, GPU, and extended resource specifications on a Job while it remains suspended—before it starts or resumes running. This marks a significant step forward in resource management for batch and machine learning workloads.
Why Adjustable Resources Matter for Suspended Jobs
Batch and machine learning workloads often have resource requirements that are not precisely known at Job creation time. The optimal allocation depends on current cluster capacity, queue priorities, and the availability of specialized hardware such as GPUs. Before this feature, the resource requirements in a Job's pod template were immutable once set. If a queue controller such as Kueue determined that a suspended Job should run with different resources, the only option was to delete and recreate the entire Job, losing its associated metadata, status, and history. The feature also lets a Job created by a CronJob make slow progress with reduced resources during periods of heavy cluster load, rather than failing to run outright.
Real-World Example: Machine Learning Training Job
Consider a machine learning training Job initially requesting 4 GPUs:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
  labels:
    app.kubernetes.io/name: trainer
spec:
  suspend: true
  template:
    metadata:
      annotations:
        kubernetes.io/description: "ML training, ID abcd123"
    spec:
      containers:
        - name: trainer
          image: example-registry.example.com/training:2026-04-23T150405.678
          resources:
            requests:
              cpu: "8"
              memory: "32Gi"
              example-hardware-vendor.com/gpu: "4"
            limits:
              cpu: "8"
              memory: "32Gi"
              example-hardware-vendor.com/gpu: "4"
      restartPolicy: Never
```
A queue controller managing cluster resources might determine that only 2 GPUs are available. With this feature, the controller can update the Job’s resource requests before resuming it:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
  labels:
    app.kubernetes.io/name: trainer
spec:
  suspend: true
  template:
    metadata:
      annotations:
        kubernetes.io/description: "ML training, ID abcd123"
    spec:
      containers:
        - name: trainer
          image: example-registry.example.com/training:2026-04-23T150405.678
          resources:
            requests:
              cpu: "4"
              memory: "16Gi"
              example-hardware-vendor.com/gpu: "2"
            limits:
              cpu: "4"
              memory: "16Gi"
              example-hardware-vendor.com/gpu: "2"
      restartPolicy: Never
```
Once the resources are updated, the controller resumes the Job by setting spec.suspend to false, and the new Pods are created with the adjusted resource specifications.
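Rather than replacing the whole manifest, a controller can express the change as a patch (for example via `kubectl patch` or the equivalent API call). The fragment below is a sketch of a strategic merge patch applied while the Job is still suspended; the container name and values are illustrative and must match the existing template:

```yaml
# Strategic merge patch applied while spec.suspend is still true;
# a follow-up patch setting spec.suspend to false then resumes the Job.
spec:
  template:
    spec:
      containers:
        - name: trainer   # merge key: must match the container in the template
          resources:
            requests:
              cpu: "4"
              memory: "16Gi"
              example-hardware-vendor.com/gpu: "2"
            limits:
              cpu: "4"
              memory: "16Gi"
              example-hardware-vendor.com/gpu: "2"
```

Keeping the resource update and the resume as two separate patches ensures the resource change is validated while the Job is still in its mutable, suspended state.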
How It Works
The Kubernetes API server relaxes the immutability constraint on pod template resource fields specifically for suspended Jobs. No new API types are introduced; the existing Job and pod template structures accommodate the change through a refined validation rule. While a Job is suspended (spec.suspend: true), the API server permits updates to spec.template.spec.containers[*].resources.requests and .limits. Because a suspended Job has no running Pods, relaxing immutability at this point cannot create a mismatch between the template and Pods that are already executing.
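The shape of that validation rule can be sketched in a few lines of Python. This is a simplified, hypothetical model, not the actual API server code: it compares the stored (old) Job against the incoming update and rejects a resources change unless the stored Job is suspended.

```python
# Simplified sketch (not the actual Kubernetes source) of the relaxed
# validation rule: pod template resources may change only while the
# stored Job object is suspended.

def container_resources(job: dict) -> list:
    """Collect each container's resources block from the Job's pod template."""
    containers = job["spec"]["template"]["spec"]["containers"]
    return [c.get("resources", {}) for c in containers]

def resource_update_allowed(old_job: dict, new_job: dict) -> bool:
    """Allow a resources change only if the stored (old) Job is suspended."""
    if container_resources(old_job) == container_resources(new_job):
        return True  # resources unchanged; this rule has nothing to reject
    return bool(old_job["spec"].get("suspend"))

def make_job(cpu: str, suspend: bool) -> dict:
    """Build a minimal Job-shaped dict for demonstration."""
    return {
        "spec": {
            "suspend": suspend,
            "template": {
                "spec": {
                    "containers": [
                        {"name": "trainer",
                         "resources": {"requests": {"cpu": cpu}}}
                    ]
                }
            },
        }
    }

# A suspended Job may have its resources reduced; an unsuspended one may not.
print(resource_update_allowed(make_job("8", True), make_job("4", True)))    # True
print(resource_update_allowed(make_job("8", False), make_job("4", False)))  # False
```

The key design point the sketch captures is that the check is based on the *old* object's suspend field, so a Job cannot smuggle in a resource change in the same update that resumes it.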
Benefits for Cluster Administrators
- Reduced Waste: Instead of deleting and recreating Jobs, resources can be adjusted in place, preserving Job metadata and execution history.
- Better Queue Management: Queue controllers like Kueue can now dynamically resize Jobs based on real-time cluster conditions, improving overall utilization.
- Graceful Degradation: Jobs can be throttled down with fewer resources rather than failing completely, ensuring some progress during high load.
Limitations and Considerations
- Only suspended Jobs can have their pod template resources modified. Once the Job resumes, the resource fields become immutable again.
- Changes apply only to future Pods created after the update; already-running Pods remain unaffected.
- The feature is currently in beta, so it is enabled by default in Kubernetes v1.36. However, administrators may still need to verify compatibility with their queue controllers.
Conclusion
The mutable pod resources feature for suspended Jobs in Kubernetes v1.36 offers a powerful tool for managing batch and ML workloads more efficiently. By allowing adjustments to CPU, memory, GPU, and extended resources without destroying and recreating Jobs, it reduces operational overhead and improves cluster responsiveness. This enhancement is a welcome addition for administrators and automated controllers alike, paving the way for smarter resource allocation in Kubernetes environments.