
fix(onboard): warn on arm64 NIM image compatibility (Fixes #5772) #5868

Open
deepujain wants to merge 2 commits into NVIDIA:main from deepujain:fix/5772-arm64-nim-warning

Conversation

@deepujain

@deepujain deepujain commented Jun 26, 2026


Summary

Adds the missing arm64 Local NVIDIA NIM image-compatibility warning during onboarding.

GB10 detection and the NIM-local provider menu are already working on current main, so this PR leaves that behavior alone. The new warning is advisory only: it tells Linux arm64 DGX Spark/Station users that some NIM images may not publish linux/arm64 manifests, while still allowing the existing NIM pull/start path to try the selected image.
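To make "may not publish linux/arm64 manifests" concrete, here is a minimal sketch of what such a check could look like against an OCI image index. This PR does not perform this check (the warning is advisory only); the `publishesPlatform` helper and the sample index are hypothetical, and only the `manifests[].platform.os` / `architecture` fields are taken from the OCI image index format.

```typescript
// Subset of an OCI image index: each manifest entry may declare the platform it targets.
interface ImageIndex {
  manifests?: { platform?: { os?: string; architecture?: string } }[];
}

// Hypothetical helper: true when the index lists a manifest for the given os/architecture.
function publishesPlatform(index: ImageIndex, os: string, architecture: string): boolean {
  return (index.manifests ?? []).some(
    (m) => m.platform?.os === os && m.platform?.architecture === architecture,
  );
}

// Made-up sample index that only lists an amd64 build.
const amd64Only: ImageIndex = {
  manifests: [{ platform: { os: "linux", architecture: "amd64" } }],
};

console.log(publishesPlatform(amd64Only, "linux", "arm64")); // false
console.log(publishesPlatform(amd64Only, "linux", "amd64")); // true
```

An image like the sample would be one the warning is about: pulling it on a Linux arm64 host may fail or fall back to an incompatible architecture.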

Fixes #5772

Changes

  • src/lib/onboard/nim-image-compat-warning.ts: adds a small helper for the Linux arm64 DGX Spark/Station warning.
  • src/lib/onboard/provider-host-state.ts: prints the warning once when nim-local is available in the provider options.
  • src/lib/onboard/nim-image-compat-warning.test.ts: covers the warning eligibility and wording.
  • test/onboard-nim-image-compat-warning.test.ts: exercises the compiled onboarding flow with a fake Linux arm64 DGX Spark GPU and confirms the warning appears when Local NIM is offered.
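For readers without the diff open, the helper described above might look roughly like the sketch below. The function names `shouldWarnAboutArm64NimImageCompatibility` and `warnAboutArm64NimImageCompatibility`, the `nimLocalAvailable` flag, and the `gpu.spark` / `gpu.platform` conditions come from the review discussion on this PR; the type shapes, the injectable `host` parameter, the formatter name, and the warning wording are assumptions made for illustration.

```typescript
// Assumed GPU shape: only the fields mentioned in this PR's discussion.
interface GpuInfo {
  nimCapable?: boolean;
  spark?: boolean;
  platform?: string;
}

// Assumed input shape; `host` is injectable here so the sketch is easy to test.
interface WarningInput {
  gpu: GpuInfo | null;
  nimLocalAvailable: boolean;
  host?: { platform: string; arch: string };
}

function shouldWarnAboutArm64NimImageCompatibility(input: WarningInput): boolean {
  const host = input.host ?? { platform: process.platform, arch: process.arch };
  const gpu = input.gpu;
  if (!input.nimLocalAvailable || !gpu) return false;
  if (host.platform !== "linux" || host.arch !== "arm64") return false;
  // Either the GB10 boolean or the platform string marks a DGX Spark/Station host.
  return gpu.spark === true || gpu.platform === "spark" || gpu.platform === "station";
}

// Placeholder wording, not the text shipped in the PR.
function formatArm64NimImageCompatibilityWarning(): string[] {
  return [
    "Warning: Local NIM on Linux arm64 (DGX Spark/Station) may hit image compatibility issues.",
    "Some NIM images may not publish linux/arm64 manifests; the pull will still be attempted.",
  ];
}

// Logs the warning lines when the predicate matches; returns whether it warned.
function warnAboutArm64NimImageCompatibility(
  input: WarningInput,
  log: (line: string) => void = console.log,
): boolean {
  if (!shouldWarnAboutArm64NimImageCompatibility(input)) return false;
  for (const line of formatArm64NimImageCompatibilityWarning()) log(line);
  return true;
}
```

Returning a boolean from the logger wrapper is one way a caller could ensure the warning prints only once; how the PR enforces that is not shown in this conversation.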

Testing

  • npm install --ignore-scripts - completed.
  • npm run build:cli - passed.
  • npx vitest run src/lib/onboard/nim-image-compat-warning.test.ts test/onboard-nim-image-compat-warning.test.ts - passed.
  • npm run typecheck:cli - passed.
  • npm run source-shape:check - passed outside the sandbox after the sandboxed tsx IPC pipe failed with EPERM.
  • npm run test-size:check - passed outside the sandbox after the sandboxed tsx IPC pipe failed with EPERM.
  • git diff --check - passed.
  • npx @biomejs/biome format src/lib/onboard/nim-image-compat-warning.ts src/lib/onboard/nim-image-compat-warning.test.ts src/lib/onboard.ts test/onboard-nim-image-compat-warning.test.ts - passed.
  • npx @biomejs/biome lint src/lib/onboard/nim-image-compat-warning.ts src/lib/onboard/nim-image-compat-warning.test.ts src/lib/onboard.ts test/onboard-nim-image-compat-warning.test.ts - passed.

Evidence it works

The focused onboarding regression overrides process.arch and process.platform to simulate a Linux arm64 DGX Spark host, passes a GB10-style GPU object with nimCapable: true, enables NEMOCLAW_EXPERIMENTAL=1, and selects the default cloud provider so no real NIM image is pulled. The test confirms onboarding still completes with nvidia-prod while the output contains both the Local NIM arm64 warning and the linux/arm64 manifest note.
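The host override described above can be sketched as follows. This is a minimal illustration, assuming the override is done with `Object.defineProperty`; the `withFakeHost` helper is hypothetical, and the actual test may apply the override differently (for example, inside the spawned onboarding process).

```typescript
// Hypothetical helper: temporarily report a different platform/arch, then restore.
function withFakeHost<T>(platform: string, arch: string, fn: () => T): T {
  const original = { platform: process.platform, arch: process.arch };
  // process.platform and process.arch are read-only, so redefine them instead of assigning.
  Object.defineProperty(process, "platform", { value: platform, configurable: true });
  Object.defineProperty(process, "arch", { value: arch, configurable: true });
  try {
    return fn();
  } finally {
    // Always restore so later tests see the real host again.
    Object.defineProperty(process, "platform", { value: original.platform, configurable: true });
    Object.defineProperty(process, "arch", { value: original.arch, configurable: true });
  }
}

// Usage: code inside the callback sees a Linux arm64 host.
const seen = withFakeHost("linux", "arm64", () => `${process.platform}/${process.arch}`);
console.log(seen); // linux/arm64
```

Restoring in `finally` keeps the fake host from leaking into other tests in the same worker if the callback throws.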

Signed-off-by: Deepak Jain <deepujain@gmail.com>


@copy-pr-bot

copy-pr-bot Bot commented Jun 26, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 26, 2026


Review Change Stack

Warning

Review limit reached

@deepujain, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 18 minutes and 30 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5e446af7-eb90-476d-87e4-3d691cbd979e

📥 Commits

Reviewing files that changed from the base of the PR and between 25ddc8b and 8096059.

📒 Files selected for processing (4)
  • src/lib/onboard/nim-image-compat-warning.test.ts
  • src/lib/onboard/nim-image-compat-warning.ts
  • src/lib/onboard/provider-host-state.ts
  • test/onboard-nim-image-compat-warning.test.ts
📝 Walkthrough

Walkthrough

Adds an ARM64/Linux NVIDIA NIM compatibility warning helper, calls it during onboarding NIM setup, and adds unit and integration tests covering the warning condition, formatted message, and logged onboarding output.

Changes

Arm64 NIM compatibility warning

| Layer / File(s) | Summary |
| --- | --- |
| Warning helper and unit tests: src/lib/onboard/nim-image-compat-warning.ts, src/lib/onboard/nim-image-compat-warning.test.ts | Adds the Linux arm64 DGX Spark/Station warning predicate, message formatter, and logger wrapper; the Vitest suite covers matching and non-matching inputs plus exact logged output. |
| Onboarding hook and integration test: src/lib/onboard.ts, test/onboard-nim-image-compat-warning.test.ts | Imports the warning helper into setupNim, calls it during NIM setup with the detected GPU and nim-local availability, and verifies the warning appears in a built onboarding run. |

Sequence Diagram(s)

sequenceDiagram
  participant setupNim
  participant warnAboutArm64NimImageCompatibility
  participant "console.log" as consoleLog
  setupNim->>warnAboutArm64NimImageCompatibility: pass gpu and nim-local availability
  warnAboutArm64NimImageCompatibility->>consoleLog: emit warning lines when the condition matches

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A bunny hopped by the onboarding trail,
On arm64 winds, a warning set sail.
Spark and Station got a tiny cheer,
“Local NIM may wobble,” it said quite clear.
🐇 Thump! The logs go softly by.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Linked Issues check | ❓ Inconclusive | The PR clearly adds the arm64 NIM warning, but the provided summary doesn't confirm the linked GB10 detection requirement. | Confirm or add changes/tests showing GB10 detection and NIM-local onboarding behavior on DGX Spark/Station aarch64. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title is concise and accurately highlights the onboard arm64 NIM compatibility warning fix. |
| Out of Scope Changes check | ✅ Passed | The changes stay focused on onboarding warning logic and related tests, with no obvious unrelated additions. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

Comment @coderabbitai help to get the list of available commands.

Refs NVIDIA#5772

Signed-off-by: Deepak Jain <deepujain@gmail.com>
@deepujain force-pushed the fix/5772-arm64-nim-warning branch from 25ddc8b to f9e79e7 on June 26, 2026 15:51

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
src/lib/onboard.ts (1)

4009-4012: 🩺 Stability & Availability | 🔵 Trivial

Run the onboarding E2E cohort before merge.

This hook lands in the core sandbox creation flow, so it’s worth running the selective nightly onboarding jobs called out for src/lib/onboard.ts to catch cross-flow regressions that the unit/integration tests here won’t cover. As per path instructions, src/lib/onboard.ts “contains core onboarding logic” and the listed E2E jobs are the recommended coverage for changes in this file.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard.ts` around lines 4009 - 4012, This change touches the core
onboarding flow via warnAboutArm64NimImageCompatibility in onboard.ts, so
validate it by running the selective onboarding E2E cohort before merging. Use
the recommended nightly onboarding jobs for src/lib/onboard.ts to cover
cross-flow regressions that unit/integration tests may miss, and confirm the new
nim-local option handling behaves correctly in the sandbox creation path.

Source: Path instructions

src/lib/onboard/nim-image-compat-warning.test.ts (1)

13-54: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Add a regression case for the gpu.spark branch.

The predicate supports both gpu.spark === true and gpu.platform === "spark" | "station", but this suite only locks down the platform path. A future refactor could break the existing GB10 boolean path without failing tests. Based on the PR objectives, onboarding should keep the current GB10 detection behavior unchanged.
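A table-driven check for the missing branch might look like the sketch below. It uses a stand-in predicate so the example runs on its own; in the real suite the cases would call `shouldWarnAboutArm64NimImageCompatibility` from the helper module inside a Vitest `it` block. The `Case` shape and the stand-in's signature are assumptions.

```typescript
import { strictEqual } from "node:assert";

type Gpu = { spark?: boolean; platform?: string };
type Case = {
  name: string;
  gpu: Gpu;
  platform: string;
  arch: string;
  nimLocalAvailable: boolean;
  expected: boolean;
};

// Stand-in for the PR's predicate, mirroring the two conditions named in the review.
function shouldWarn(c: Case): boolean {
  if (!c.nimLocalAvailable || c.platform !== "linux" || c.arch !== "arm64") return false;
  return c.gpu.spark === true || c.gpu.platform === "spark" || c.gpu.platform === "station";
}

const linuxArm64 = { platform: "linux", arch: "arm64", nimLocalAvailable: true };
const cases: Case[] = [
  // The regression case requested above: boolean GB10 flag without a platform string.
  { name: "gpu.spark boolean", gpu: { spark: true }, ...linuxArm64, expected: true },
  { name: "gpu.platform spark", gpu: { platform: "spark" }, ...linuxArm64, expected: true },
  { name: "gpu.platform station", gpu: { platform: "station" }, ...linuxArm64, expected: true },
  { name: "spark flag on x64", gpu: { spark: true }, ...linuxArm64, arch: "x64", expected: false },
];

// Runs every case and returns how many were checked.
function runCases(): number {
  for (const c of cases) strictEqual(shouldWarn(c), c.expected, c.name);
  return cases.length;
}

runCases();
```

Keeping the boolean and platform-string cases side by side means a refactor that drops either path fails a named case.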

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard/nim-image-compat-warning.test.ts` around lines 13 - 54, Add a
regression test in shouldWarnAboutArm64NimImageCompatibility coverage for the
gpu.spark boolean path, since the current suite only exercises gpu.platform for
spark/station. Extend the existing nim-image-compat-warning.test case to assert
the same warning behavior when gpu.spark is true on Linux arm64 with
nimLocalAvailable enabled, keeping the existing function behavior unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/onboard-nim-image-compat-warning.test.ts`:
- Around line 88-100: The spawned onboarding flow in the compat warning test has
no bound and can hang the Vitest worker if it becomes interactive or deadlocks.
Update the spawnSync call in this test to include a timeout so the process fails
deterministically instead of wedging CI, keeping the change localized to the
onboarding test helper invocation.

---

Nitpick comments:
In `@src/lib/onboard.ts`:
- Around line 4009-4012: This change touches the core onboarding flow via
warnAboutArm64NimImageCompatibility in onboard.ts, so validate it by running the
selective onboarding E2E cohort before merging. Use the recommended nightly
onboarding jobs for src/lib/onboard.ts to cover cross-flow regressions that
unit/integration tests may miss, and confirm the new nim-local option handling
behaves correctly in the sandbox creation path.

In `@src/lib/onboard/nim-image-compat-warning.test.ts`:
- Around line 13-54: Add a regression test in
shouldWarnAboutArm64NimImageCompatibility coverage for the gpu.spark boolean
path, since the current suite only exercises gpu.platform for spark/station.
Extend the existing nim-image-compat-warning.test case to assert the same
warning behavior when gpu.spark is true on Linux arm64 with nimLocalAvailable
enabled, keeping the existing function behavior unchanged.
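The inline comment about bounding the spawned onboarding run can be illustrated with Node's `spawnSync` `timeout` option. This is a sketch only: the `runBounded` helper, the inline scripts, and the timeout values are hypothetical, and the real test spawns the built onboarding flow rather than `node -e`.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical wrapper: run a script in a child Node process with a hard time limit.
function runBounded(script: string, timeoutMs: number) {
  const result = spawnSync(process.execPath, ["-e", script], {
    encoding: "utf8",
    timeout: timeoutMs, // kill the child if it runs longer than this
    killSignal: "SIGKILL",
  });
  // When the timeout fires, spawnSync reports an error whose code is ETIMEDOUT.
  const timedOut = (result.error as NodeJS.ErrnoException | undefined)?.code === "ETIMEDOUT";
  return { timedOut, status: result.status, stdout: result.stdout };
}

// A child that never exits is cut off instead of hanging the test worker.
const hung = runBounded("setInterval(() => {}, 1000)", 500);
console.log(hung.timedOut); // true

// A well-behaved child completes normally within a generous bound.
const ok = runBounded("console.log('done')", 10000);
console.log(ok.status, ok.stdout.trim()); // 0 done
```

In a test, asserting on `timedOut` before inspecting output gives a clear failure message when the flow becomes interactive or deadlocks.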

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a241359c-89d4-475f-95cd-01b72a31b42f

📥 Commits

Reviewing files that changed from the base of the PR and between c1c7c1d and 25ddc8b.

📒 Files selected for processing (4)
  • src/lib/onboard.ts
  • src/lib/onboard/nim-image-compat-warning.test.ts
  • src/lib/onboard/nim-image-compat-warning.ts
  • test/onboard-nim-image-compat-warning.test.ts

Comment thread test/onboard-nim-image-compat-warning.test.ts
Signed-off-by: Deepak Jain <deepujain@gmail.com>
@wscurran added the following labels on Jun 26, 2026:

  • area: onboarding (Onboarding FSM, provider setup, sandbox launch, or first-run flow)
  • feature (PR adds or expands user-visible functionality)
  • platform: dgx-spark (Affects DGX Spark hardware or workflows)
  • platform: dgx-station (Affects DGX Station hardware or workflows)
@wscurran


✨ Thanks for adding the arm64 NIM image-compatibility warning for DGX Spark and Station users during onboarding. This proposes a way to surface an advisory notice when Local NIM is offered on Linux arm64 so users know some images may lack linux/arm64 manifests while the pull path remains unchanged.


Related open issues:


Labels

  • area: onboarding (Onboarding FSM, provider setup, sandbox launch, or first-run flow)
  • feature (PR adds or expands user-visible functionality)
  • platform: dgx-spark (Affects DGX Spark hardware or workflows)
  • platform: dgx-station (Affects DGX Station hardware or workflows)


Development

Successfully merging this pull request may close these issues.

[DGX Spark][Onboard] NEMOCLAW_EXPERIMENTAL=1 onboard does not detect GB10 GPU or show arm64 NIM image-compat warning on aarch64

2 participants