How to Adjust Pod Resources for Suspended Kubernetes Jobs (v1.36+)
Introduction
In Kubernetes v1.36, a new beta feature allows you to modify CPU, memory, GPU, and extended resource requests and limits on a suspended Job. This is a game-changer for batch and machine learning workloads where resource requirements often depend on real-time cluster capacity and queue priorities. Previously, you'd have to delete and recreate a Job to change its resource spec, losing metadata and history. Now, you can adjust resources while the Job is paused and then resume it — without starting from scratch.
This step-by-step guide will walk you through using this feature manually or with a queue controller like Kueue.
What You Need
- A Kubernetes cluster running version v1.36 or later (the feature gate is enabled by default)
- kubectl installed and configured to access your cluster
- A suspended Job (or create one following Step 1)
- Basic familiarity with Kubernetes Jobs and resource management
Step-by-Step Instructions
Step 1: Create or Identify a Suspended Job
If you don’t already have a suspended Job, create one that requests specific resources. The key is to set spec.suspend: true in the Job manifest. Below is an example of a machine learning training Job asking for 4 GPUs, 8 CPUs, and 32 GiB of memory:
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
spec:
  suspend: true
  template:
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
          limits:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
      restartPolicy: Never
Apply this manifest with kubectl apply -f job.yaml.
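If you generate Job manifests programmatically (for example, from a training pipeline), the same spec can be assembled as a plain dictionary and serialized to YAML or JSON before applying. A minimal sketch — the helper name and its defaults are illustrative, not part of any Kubernetes API:

```python
# Illustrative helper: build a suspended Job manifest as a plain dict.
def make_suspended_job(name, image, cpu, memory, gpus,
                       gpu_key="example-hardware-vendor.com/gpu"):
    values = {"cpu": str(cpu), "memory": memory, gpu_key: str(gpus)}
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "suspend": True,  # no Pods are created while this is true
            "template": {
                "spec": {
                    "containers": [{
                        "name": "trainer",
                        "image": image,
                        # requests and limits kept identical (see Tips below)
                        "resources": {"requests": dict(values),
                                      "limits": dict(values)},
                    }],
                    "restartPolicy": "Never",
                }
            },
        },
    }

job = make_suspended_job(
    "training-job-example-abcd123",
    "example-registry.example.com/training:2026-04-23T150405.678",
    cpu=8, memory="32Gi", gpus=4)
```

Serializing `job` with `json.dumps` (or a YAML library) reproduces the manifest above.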
Step 2: Confirm the Job Is Suspended
Run the following command to verify that the Job is in a suspended state:
kubectl get job training-job-example-abcd123 -o jsonpath='{.spec.suspend}'
It should output true. You can also list all Jobs with kubectl get jobs; a suspended Job shows 0/1 under COMPLETIONS, and recent kubectl versions also report Suspended in the STATUS column.
Step 3: Modify the Resource Requests and Limits
While the Job is suspended, you can change its pod template resource fields. For example, if the cluster only has 2 GPUs available, adjust the requests and limits accordingly. Use kubectl patch, kubectl edit, or a direct update via the API. Here's how to patch the resource fields (use kubectl's default strategic merge patch, which matches the containers entry by name and preserves other container fields such as image; a plain JSON merge patch via --type='merge' would replace the whole containers list):
kubectl patch job training-job-example-abcd123 -p='{"spec":{"template":{"spec":{"containers":[{"name":"trainer","resources":{"requests":{"cpu":"4","memory":"16Gi","example-hardware-vendor.com/gpu":"2"},"limits":{"cpu":"4","memory":"16Gi","example-hardware-vendor.com/gpu":"2"}}}]}}}}'
This updates the Job’s pod template. Because the Job is suspended, this modification is allowed (the immutability constraint is relaxed).
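Long inline JSON patches are easy to mistype. One option is to build the patch body in code and hand the serialized string to kubectl patch -p. A sketch, assuming kubectl's default strategic merge patch (which matches entries in the containers list by name); the helper name is illustrative:

```python
import json

def resource_patch(container, cpu, memory, gpus,
                   gpu_key="example-hardware-vendor.com/gpu"):
    """Build a patch body that rewrites requests and limits for one container."""
    values = {"cpu": str(cpu), "memory": memory, gpu_key: str(gpus)}
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [{
                        "name": container,  # merge key: only this container is touched
                        "resources": {"requests": dict(values),
                                      "limits": dict(values)},
                    }]
                }
            }
        }
    }

payload = json.dumps(resource_patch("trainer", cpu=4, memory="16Gi", gpus=2))
# pass `payload` as the -p argument to kubectl patch
```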
Step 4: Verify the Changes
Check that the resources have been updated correctly:
kubectl get job training-job-example-abcd123 -o yaml
Look under spec.template.spec.containers[0].resources — they should now show the adjusted values. No new Pods are created yet because the Job is still suspended.
Step 5: Resume the Job
Once you’re satisfied with the resource settings, unsuspend the Job by setting spec.suspend to false:
kubectl patch job training-job-example-abcd123 --type='merge' -p='{"spec":{"suspend":false}}'
Kubernetes will now launch the Pods using the updated resource specifications. You can monitor progress with:
kubectl get pods -l job-name=training-job-example-abcd123
Step 6: Confirm Pod Resources
After the Job resumes, inspect one of the running Pods to ensure the resources are applied:
kubectl get pods -l job-name=training-job-example-abcd123 -o jsonpath='{.items[0].spec.containers[0].resources}'
The output should match the new values you set in Step 3. If everything looks good, you’ve successfully adjusted resources for a suspended Job.
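If you script this check, the jsonpath output can be compared directly against the values from Step 3. A small sketch — the function name and sample data are illustrative:

```python
import json

def resources_match(pod_resources_json, expected):
    """Check kubectl's jsonpath output (a JSON string) against expected values."""
    actual = json.loads(pod_resources_json)
    return actual.get("requests") == expected and actual.get("limits") == expected

expected = {"cpu": "4", "memory": "16Gi",
            "example-hardware-vendor.com/gpu": "2"}
sample = json.dumps({"requests": expected, "limits": expected})
print(resources_match(sample, expected))  # → True
```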
Tips and Best Practices
- Use a Queue Controller: For automatic resource tuning based on cluster load, integrate this feature with controllers like Kueue. They can dynamically adjust resources without requiring manual intervention.
- Pay Attention to Limits vs Requests: When reducing resources, adjust both requests and limits to match. Kubernetes will enforce limits, so mismatched values could cause unexpected behavior.
- Extended Resources: This feature works with any resource type, including extended resources (e.g., nvidia.com/gpu). Just update the corresponding field in the patch.
- Version Compatibility: The feature is beta in v1.36 and enabled by default. If you're on an earlier alpha release (v1.35), you may need to enable the MutablePodResourcesForSuspendedJobs feature gate manually.
- Avoid Frequent Changes: While you can update resources multiple times while the Job is suspended, keep changes minimal to reduce the risk of misconfiguration.
- Backup Original Spec: Before patching, save a copy of the original Job manifest (kubectl get job ... -o yaml > original.yaml) so you can revert if needed.
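The backup tip pairs well with a quick diff before resuming: compare the saved manifest against the live one to confirm exactly what changed. A sketch operating on manifests exported with kubectl's -o json flag (the helper and sample data are illustrative):

```python
import json

def resource_diff(original_json, current_json):
    """Return (before, after) resources for the first container, given two
    Job manifests exported with `kubectl get job <name> -o json`."""
    def stanza(doc):
        return doc["spec"]["template"]["spec"]["containers"][0]["resources"]
    return stanza(json.loads(original_json)), stanza(json.loads(current_json))

def job_doc(cpu):
    # Minimal stand-in for a real exported manifest (illustrative).
    return json.dumps({"spec": {"template": {"spec": {"containers": [
        {"name": "trainer", "resources": {"requests": {"cpu": cpu}}}
    ]}}}})

before, after = resource_diff(job_doc("8"), job_doc("4"))
print(before, "->", after)  # → {'requests': {'cpu': '8'}} -> {'requests': {'cpu': '4'}}
```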
By following these steps, you can flexibly adjust resource allocations for batch and ML jobs without deleting the Job or losing its metadata and history. This feature streamlines workload scheduling in dynamic cluster environments.