
Add nvidia.com/gpu toleration to Ray Serve GPU workloads #18

Merged
tylerpotts merged 2 commits into main from feat/gpu-toleration on Jun 18, 2026

@tylerpotts (Contributor)

Summary

nebari-infrastructure-core auto-taints AWS GPU node groups with nvidia.com/gpu=true:NoSchedule (nebari-dev/nebari-infrastructure-core#370), and EKS has no admission controller to inject a matching toleration. Without one, GPU-requesting Ray Serve head/worker pods stop scheduling onto GPU nodes once the taint lands.

This injects the toleration onto a Ray group's pod template when its resources request nvidia.com/gpu:

tolerations:
- key: "nvidia.com/gpu"
  operator: "Exists"
  effect: "NoSchedule"

operator: Exists matches any taint value, so it works with NIC's value "true" as well as any other value.
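To show why operator: Exists does not care about the taint's value, here is a minimal Python model of the Kubernetes toleration-matching rule. It is simplified: it ignores tolerationSeconds and NoExecute eviction timing, and the function and variable names are illustrative assumptions, not scheduler code.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if one toleration matches one taint.

    Simplified model of the Equal/Exists rules: an empty toleration effect
    or key acts as a wildcard, Exists ignores the taint value, and Equal
    (the default operator) compares values exactly.
    """
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False
    key = toleration.get("key", "")
    if key and key != taint["key"]:
        return False
    operator = toleration.get("operator") or "Equal"  # empty operator means Equal
    if operator == "Exists":
        return True  # any taint value is accepted
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False  # unknown operators never match in this model


# The toleration this PR injects.
GPU_TOLERATION = {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}

nic_taint = {"key": "nvidia.com/gpu", "value": "true", "effect": "NoSchedule"}
other_value_taint = {"key": "nvidia.com/gpu", "value": "present", "effect": "NoSchedule"}
equal_false = {"key": "nvidia.com/gpu", "operator": "Equal", "value": "false", "effect": "NoSchedule"}

print(tolerates(GPU_TOLERATION, nic_taint))          # True
print(tolerates(GPU_TOLERATION, other_value_taint))  # True
print(tolerates(equal_false, nic_taint))             # False
```

An Equal toleration would tie the chart to NIC's exact taint value. Exists keeps working if that value ever changes.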

Changes

  • chart/templates/_helpers.tpl: new nebari-rayserve.tolerations helper. Given a group config (.Values.head or .Values.worker), it appends the nvidia.com/gpu toleration when resources.limits or resources.requests contains nvidia.com/gpu, plus any explicit tolerations from the group config. If neither applies, it renders nothing.
  • chart/templates/rayservice.yaml: wired the helper into the head and worker pod templates (a tolerations: block is emitted only when non-empty).
  • chart/values.yaml: added a documented tolerations: [] to head and worker.
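The helper's decision logic can be modelled in a few lines of Python. This is a sketch of the behaviour described above, not the Helm template itself. The function names, the hypothetical example.com/pool toleration, and the ordering (auto-injected first, explicit entries appended, following the commit message) are assumptions made for illustration.

```python
GPU_RESOURCE = "nvidia.com/gpu"
GPU_TOLERATION = {"key": GPU_RESOURCE, "operator": "Exists", "effect": "NoSchedule"}


def requests_gpu(group: dict) -> bool:
    """True if the group's resources.limits or resources.requests names the GPU resource."""
    resources = group.get("resources") or {}
    return any(GPU_RESOURCE in (resources.get(section) or {}) for section in ("limits", "requests"))


def render_tolerations(group: dict) -> list:
    """Hypothetical model of the helper: GPU toleration first (if needed), then explicit ones."""
    tolerations = [dict(GPU_TOLERATION)] if requests_gpu(group) else []
    tolerations.extend(group.get("tolerations") or [])
    return tolerations  # an empty list corresponds to "no tolerations: block emitted"


# Hypothetical group configs mirroring .Values.head / .Values.worker.
default_worker = {"resources": {"limits": {"cpu": "1"}}}
gpu_worker = {"resources": {"limits": {GPU_RESOURCE: 1}}}
explicit = {"key": "example.com/pool", "operator": "Equal", "value": "ray", "effect": "NoSchedule"}
gpu_worker_with_explicit = {"resources": {"requests": {GPU_RESOURCE: 1}}, "tolerations": [explicit]}

print(render_tolerations(default_worker))  # []
print(render_tolerations(gpu_worker_with_explicit))
```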

Acceptance criteria

  • A worker group that requests a GPU gets the toleration (verified via helm template).
  • Ray pods that do not request a GPU render no tolerations and are unchanged.

Testing

helm lint passes. helm template checked across three cases (output parsed with PyYAML to confirm valid YAML):

| Case | Head | Worker |
|---|---|---|
| Default (no GPU) | none | none |
| Worker requests GPU | none | nvidia.com/gpu |
| Head GPU + explicit worker toleration | nvidia.com/gpu | explicit + nvidia.com/gpu |

The acceptance criterion that a GPU pod actually schedules onto a tainted node and serves cannot be exercised on the GPU-less kind cluster used in CI. It can be validated on kind without real GPU hardware by tainting a node with nvidia.com/gpu=true:NoSchedule and advertising fake nvidia.com/gpu capacity on it, or end-to-end on real GPU infrastructure. This is worth confirming before the NIC taint rolls out.
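The kind-based check above needs two node changes: the taint, and a fake capacity entry in the node's status. The sketch below builds both kubectl commands. It assumes a kind node named kind-worker (hypothetical) and a kubectl recent enough to support --subresource (1.24+). The capacity patch follows the approach in the Kubernetes "advertise extended resources for a node" task, where the "/" in the resource name must be escaped as "~1" in the JSON Pointer path. Whether the kubelet mirrors the fake capacity into allocatable should be checked with kubectl describe node. No real GPU exists, so the scheduled container must not actually need one.

```python
import json
import shlex


def escape_json_pointer(token: str) -> str:
    """Escape one JSON Pointer reference token (RFC 6901): '~' -> '~0', then '/' -> '~1'."""
    return token.replace("~", "~0").replace("/", "~1")


def fake_capacity_patch(resource: str = "nvidia.com/gpu", count: int = 1) -> list:
    """JSON Patch that advertises `count` units of an extended resource in node status."""
    return [{
        "op": "add",
        "path": "/status/capacity/" + escape_json_pointer(resource),
        "value": str(count),  # resource quantities are strings in the Node API
    }]


def kind_gpu_commands(node: str, count: int = 1) -> list:
    """kubectl commands that make a GPU-less node look like a tainted GPU node."""
    patch = json.dumps(fake_capacity_patch(count=count))
    return [
        # Same taint that NIC applies to AWS GPU node groups.
        f"kubectl taint nodes {shlex.quote(node)} nvidia.com/gpu=true:NoSchedule",
        # Capacity lives in node status, so the patch targets the status subresource.
        f"kubectl patch node {shlex.quote(node)} --subresource=status --type=json -p {shlex.quote(patch)}",
    ]


for command in kind_gpu_commands("kind-worker"):
    print(command)
```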

Closes #14

nebari-infrastructure-core auto-taints AWS GPU node groups with
nvidia.com/gpu=true:NoSchedule, and EKS has no admission controller to
inject a matching toleration. Without one, GPU-requesting Ray Serve
head/worker pods stop scheduling onto GPU nodes once the taint lands.

Inject the nvidia.com/gpu toleration (operator: Exists, so it matches
any taint value) on a Ray group's pod template when its resources
request nvidia.com/gpu, via a new nebari-rayserve.tolerations helper.
Pods that don't request a GPU render no tolerations and are unchanged.
Explicit head.tolerations / worker.tolerations are appended.

Closes #14
@tylerpotts marked this pull request as ready for review on June 17, 2026 20:18
@marcelovilla self-requested a review on June 18, 2026 15:34

@marcelovilla (Member) left a comment

LGTM @tylerpotts 🚀 !

Tested this on a live EKS cluster deployed via nebari-dev/nebari-infrastructure-core#370.

I deployed the Helm chart from this branch and added a GPU worker. I can confirm that the worker (which requests GPU resources) was scheduled onto a GPU node, while the head (which does not request GPU resources) was not.

I left a small comment, but I'm approving. Up to you whether you want to address it.

Comment thread on chart/templates/_helpers.tpl:
Only inject the nvidia.com/gpu toleration when the group config does not
already define a toleration for that key, so a user-provided toleration
acts as an intentional override rather than producing a duplicate entry.
@tylerpotts merged commit b6b54fc into main on Jun 18, 2026
2 of 3 checks passed
@oren-openteams (Collaborator)

Late drive-by feedback after verifying #14 against an EKS 1.34 cluster running v0.3.1 of this chart: the implementation passes all three rendering scenarios (no GPU / GPU implicit / GPU + explicit user toleration). The full verification is posted in a comment on #14.

Two observations worth recording for context:

The premise about EKS turns out to be incorrect

The PR body says "EKS has no admission controller to inject a matching toleration". That isn't what I observed. Reproduction on stock EKS 1.34:

# Applied with kubectl apply: a raw pod, no kubespawner, no manual toleration
apiVersion: v1
kind: Pod
metadata: {name: gpu-no-toleration, namespace: default}
spec:
  containers:
  - name: c
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources: {limits: {nvidia.com/gpu: 1}}

Read back from the apiserver, the persisted pod has {key: nvidia.com/gpu, operator: Exists, effect: NoSchedule} auto-injected. I ruled out every mutating webhook on the cluster, so by elimination it is the ExtendedResourceToleration admission controller, which AWS appears to enable silently. It isn't in AWS's published admission-plugin list, which is probably what led the source issue to the opposite conclusion.
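For reference, here is a simplified Python model of what ExtendedResourceToleration (ERT) does at admission. For every extended resource a container or init container requests, it adds an Exists/NoSchedule toleration keyed on that resource name, unless an identical toleration is already present. Several parts are approximations and should be read as such: the extended-resource test, the union of limits and requests (API defaulting copies limits into unset requests, as with the pod above), and the helper names. The real plugin is Go code that also updates a matching toleration that differs only in fields such as tolerationSeconds.

```python
import copy


def is_extended_resource(name: str) -> bool:
    """Approximation: domain-prefixed, outside *kubernetes.io, not a quota 'requests.' name."""
    if "/" not in name or name.startswith("requests."):
        return False
    domain = name.split("/", 1)[0]
    return domain != "kubernetes.io" and not domain.endswith(".kubernetes.io")


def same_toleration(a: dict, b: dict) -> bool:
    """Compare the fields that identify a toleration."""
    return all(a.get(f, "") == b.get(f, "") for f in ("key", "operator", "value", "effect"))


def extended_resource_toleration(pod_spec: dict) -> dict:
    """Hypothetical model of the ERT admission plugin; returns a new pod spec."""
    spec = copy.deepcopy(pod_spec)
    names = set()
    for container in spec.get("containers", []) + spec.get("initContainers", []):
        resources = container.get("resources") or {}
        # Limits are defaulted into unset requests, so both count as requested.
        for name in {**(resources.get("limits") or {}), **(resources.get("requests") or {})}:
            if is_extended_resource(name):
                names.add(name)
    tolerations = list(spec.get("tolerations") or [])
    for name in sorted(names):
        wanted = {"key": name, "operator": "Exists", "effect": "NoSchedule"}
        if not any(same_toleration(t, wanted) for t in tolerations):
            tolerations.append(wanted)  # idempotent: identical tolerations are not duplicated
    if tolerations:
        spec["tolerations"] = tolerations
    return spec


# The raw pod from the reproduction above, as a Python dict.
pod = {"containers": [{
    "name": "c",
    "image": "nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04",
    "command": ["nvidia-smi"],
    "resources": {"limits": {"nvidia.com/gpu": 1}},
}]}
admitted = extended_resource_toleration(pod)
print(admitted["tolerations"])  # [{'key': 'nvidia.com/gpu', 'operator': 'Exists', 'effect': 'NoSchedule'}]
```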

So on managed EKS / GKE / AKS, the chart-side auto-injection here is technically redundant, because ERT would do the same job at admission time. On vanilla / kubeadm / on-prem clusters where ERT isn't enabled, this PR is the only thing that keeps GPU Ray workloads scheduling.

Why this PR is still correct on EKS

The implementation cleanly composes with ERT for two reasons:

  1. Conditional: it only fires when resources.limits or resources.requests contains nvidia.com/gpu. Non-GPU Ray workloads stay clean.
  2. Append + dedupe-on-key: a $hasGpuToleration check skips auto-injection when the user has already provided a toleration for the nvidia.com/gpu key, and explicit user tolerations are appended rather than replaced.
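The dedupe-on-key rule can be modelled in Python as follows. The has_gpu_toleration flag plays the role of the template's $hasGpuToleration. As before, this is an illustrative sketch with hypothetical names, not the template code.

```python
GPU_RESOURCE = "nvidia.com/gpu"
GPU_TOLERATION = {"key": GPU_RESOURCE, "operator": "Exists", "effect": "NoSchedule"}


def requests_gpu(group: dict) -> bool:
    """True if resources.limits or resources.requests names the GPU resource."""
    resources = group.get("resources") or {}
    return any(GPU_RESOURCE in (resources.get(s) or {}) for s in ("limits", "requests"))


def render_tolerations(group: dict) -> list:
    """Inject the GPU toleration only if the user has not already tolerated that key."""
    explicit = list(group.get("tolerations") or [])
    # Counterpart of $hasGpuToleration: any explicit toleration for the GPU key?
    has_gpu_toleration = any(t.get("key") == GPU_RESOURCE for t in explicit)
    injected = [dict(GPU_TOLERATION)] if requests_gpu(group) and not has_gpu_toleration else []
    return injected + explicit  # explicit entries are appended, never dropped


# A user-provided toleration for the same key acts as an intentional override.
user_override = {"key": GPU_RESOURCE, "operator": "Equal", "value": "true", "effect": "NoSchedule"}
group = {"resources": {"limits": {GPU_RESOURCE: 1}}, "tolerations": [user_override]}
print(render_tolerations(group) == [user_override])  # True
```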

Together, this means that on EKS the chart-injected toleration matches what ERT would inject, ERT's idempotency skips the duplicate during admission, and the net pod spec is identical with or without this PR. On non-managed clusters the chart-side injection is load-bearing. There is no false-positive injection and no override of user intent, which is exactly the shape one wants.

Pattern worth replicating

The analogous data-science-pack docs (PR nebari-dev/data-science-pack#139) recommend a kubespawner_override.tolerations: [...] recipe. I demonstrated that this recipe silently wipes any global c.KubeSpawner.tolerations configured by the chart (e.g. z2jh's hub.jupyter.org/dedicated=user) for the affected profile, because kubespawner_override overrides rather than merges. The implementation pattern in this PR, conditional auto-injection in chart-rendered pod templates where explicit overrides win, is exactly what dsp issue #117 needs. Worth borrowing.
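The difference between "override" and "merge" can be shown with plain dictionaries. This is a hypothetical model of the two semantics, not kubespawner code. apply_override stands in for the wholesale replacement the reviewer observed with kubespawner_override. merge_tolerations is one possible merge where a profile entry wins only for its own key, in the spirit of this PR's explicit-overrides-win rule.

```python
DEDICATED_USER = {"key": "hub.jupyter.org/dedicated", "operator": "Equal",
                  "value": "user", "effect": "NoSchedule"}
GPU = {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}


def apply_override(global_config: dict, profile_override: dict) -> dict:
    """Override semantics: any key in the profile replaces the global value wholesale."""
    result = dict(global_config)
    result.update(profile_override)
    return result


def merge_tolerations(global_tolerations: list, profile_tolerations: list) -> list:
    """Merge semantics: keep globals; a profile entry wins only for the same key."""
    overridden = {t.get("key") for t in profile_tolerations}
    kept = [t for t in global_tolerations if t.get("key") not in overridden]
    return kept + list(profile_tolerations)


global_config = {"tolerations": [DEDICATED_USER]}
profile = {"tolerations": [GPU]}

print(apply_override(global_config, profile)["tolerations"])  # the dedicated=user toleration is gone
print(merge_tolerations(global_config["tolerations"], profile["tolerations"]))  # both survive
```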

Nice work on the helper logic — leaving the issue verified.

