
feat: add gpu container support to microvm driver #1143

Open

drew wants to merge 3 commits into main from dn/vcaux-brisebo/vm-container-support

Conversation

Collaborator

@drew drew commented May 4, 2026

Summary

Related Issue

Changes

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

@drew drew requested a review from a team as a code owner May 4, 2026 04:36
@drew drew force-pushed the vm-container-support branch from 1de95c9 to 13bae67 on May 4, 2026 05:59
Base automatically changed from vm-container-support to main May 4, 2026 06:23
@drew drew force-pushed the dn/vcaux-brisebo/vm-container-support branch from ea7a545 to 5943739 on May 4, 2026 06:27
# TODO(gpu): Pin SHA-256 checksum for reproducible builds. Compute with:
# curl -fsSL <url> | sha256sum
RUN curl -fsSL \
"https://us.download.nvidia.com/XFree86/Linux-x86_64/${NVIDIA_DRIVER_VERSION}/NVIDIA-Linux-x86_64-${NVIDIA_DRIVER_VERSION}.run" \
Member

This seems brittle since the user mode driver version needs to match the kernel mode driver (in the VM) exactly. Why is the user mode driver not installed in the VM itself?

Collaborator

The userspace-in-image split is intentional: the VM guest is a minimal libkrunfw kernel with no package manager and no NVIDIA Container Runtime, so there is no mechanism to mount or install drivers at boot. Running the .run installer with --no-kernel-module is the only way to get libcuda.so / libnvidia-ml.so into the rootfs without pulling in apt dependencies.

That said, the version-coupling concern is valid. Addressed in this update:

  • Removed the ARG default: a bare docker build now fails immediately with a clear error instead of silently using a stale version.
  • Added build.sh, which sources versions.env (the single source of truth) and passes --build-arg. The CLI's --from path also auto-detects versions.env in the Dockerfile context.
  • Added a version stamp (/etc/openshell-gpu-driver-version): the VM driver reads it at sandbox creation and warns if it doesn't match the kernel module firmware version (a rough sketch of that check follows below).
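
Roughly, the stamp check on the driver side looks like this (a sketch; the function name and the way the kernel module version is obtained are illustrative, only the stamp path is from the change above):

```rust
use std::fs;
use std::path::Path;

/// Warn when the userspace driver stamped into the rootfs doesn't match the
/// kernel module version the VM will load (sketch; names are illustrative).
fn warn_on_gpu_driver_mismatch(rootfs: &Path, kernel_module_version: &str) {
    let stamp_path = rootfs.join("etc/openshell-gpu-driver-version");
    match fs::read_to_string(&stamp_path) {
        Ok(stamp) => {
            let userspace_version = stamp.trim();
            if userspace_version != kernel_module_version {
                eprintln!(
                    "warning: GPU userspace driver {userspace_version} does not match \
                     kernel module {kernel_module_version}"
                );
            }
        }
        Err(_) => eprintln!(
            "warning: {} missing; image may not have been built from the GPU sandbox Dockerfile",
            stamp_path.display()
        ),
    }
}
```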

/// image like `ubuntu:latest` instead of the GPU sandbox Dockerfile.
fn warn_missing_gpu_userspace(rootfs: &Path) {
let nvidia_smi_candidates = [
"usr/bin/nvidia-smi",
Member

On some systems nvidia-smi is installed at /usr/sbin/nvidia-smi.

Collaborator

Added usr/sbin/nvidia-smi to the candidates.
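
For reference, the probe now reads roughly like this (a sketch; only the candidate paths come from the diff, the warning text is illustrative):

```rust
use std::path::Path;

/// Warn when the rootfs looks like a plain base image (e.g. ubuntu:latest)
/// rather than the GPU sandbox image.
fn warn_missing_gpu_userspace(rootfs: &Path) {
    let nvidia_smi_candidates = [
        "usr/bin/nvidia-smi",
        "usr/sbin/nvidia-smi", // some systems install nvidia-smi under sbin
    ];
    if !nvidia_smi_candidates
        .iter()
        .any(|rel| rootfs.join(rel).exists())
    {
        eprintln!("warning: nvidia-smi not found in rootfs; GPU userspace appears to be missing");
    }
}
```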

Comment thread: tasks/scripts/vm/build-nvidia-modules.sh (Outdated)
@copy-pr-bot

copy-pr-bot Bot commented May 4, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


# Kernel modules are built for a specific guest kernel version.
# If the running kernel doesn't match, depmod/modprobe will silently fail.
local expected_kver="6.12.76"
Member

Can we also move this to a single location? (Codex suggests pins.env).

Member

(alternatively, does it make sense to template this script and render it in the VM just before init?)

Collaborator

Went with pins.env over templating. The init script is include_str!'d at compile time with no existing template infrastructure, and the kernel version is a build-time invariant tied to LIBKRUNFW_REF.

/// ensures the guest always runs the init script and supervisor that match
/// the running driver binary.
pub fn refresh_runtime_artifacts(rootfs: &Path) -> Result<(), String> {
let init_path = rootfs.join("srv/openshell-vm-sandbox-init.sh");
Member

Question: Why don't we use the global constant here (and elsewhere)?

Collaborator

Good catch; this is just an oversight. SANDBOX_GUEST_INIT_PATH is defined at the top of the file, but the two call sites (refresh_runtime_artifacts and prepare_sandbox_rootfs) duplicate the path as an inline string because the constant has a leading / and Path::join would discard the rootfs prefix. The fix is straightforward: use SANDBOX_GUEST_INIT_PATH.trim_start_matches('/') at the join sites, the same pattern require_rootfs_path already uses. Applied the same treatment to SANDBOX_SUPERVISOR_PATH in validate_sandbox_rootfs for consistency.
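
The join pattern in question, roughly (a sketch; the helper is illustrative, the constant and trim_start_matches usage are from the comment above):

```rust
use std::path::{Path, PathBuf};

/// Guest-absolute path, defined once at the top of the module.
const SANDBOX_GUEST_INIT_PATH: &str = "/srv/openshell-vm-sandbox-init.sh";

/// Map a guest-absolute path into the host-side rootfs directory.
/// Path::join discards its base when given an absolute path, so the
/// leading '/' has to be stripped first.
fn rootfs_path(rootfs: &Path, guest_abs: &str) -> PathBuf {
    rootfs.join(guest_abs.trim_start_matches('/'))
}

// At the call site in refresh_runtime_artifacts:
// let init_path = rootfs_path(rootfs, SANDBOX_GUEST_INIT_PATH);
```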

Comment thread: sandboxes/nvidia-gpu/Dockerfile (Outdated)
# --build-arg NVIDIA_DRIVER_VERSION=<ver> manually.

ARG CUDA_VERSION=12.8.1
ARG UBUNTU_VERSION=22.04
Member

The only reason these are set is to define the base image version. Is there value in making the base image more flexible? Are vulnerabilities in out-of-date images a concern in the VM use case? If so, does it not make sense to just start with ubuntu:latest instead of an nvidia/cuda base image, especially since we don't actually need access to the CUDA repos defined in the nvidia/cuda image?

Collaborator

Switched to ubuntu:latest.

Comment thread: sandboxes/nvidia-gpu/Dockerfile (Outdated)
Comment on lines +76 to +77
&& echo "/usr/local/cuda/lib64" > /etc/ld.so.conf.d/cuda.conf \
&& echo "/usr/lib/x86_64-linux-gnu" >> /etc/ld.so.conf.d/cuda.conf \
Member

At least for the CUDA base images, these should not be required. Also, do we only support x86_64 VMs here?

Collaborator

The redundant cuda.conf / ld.so.conf.d block was already removed alongside the switch from nvidia/cuda to ubuntu:latest. The .run installer places libs in /usr/lib/x86_64-linux-gnu/, which is in the default linker search path, so a bare ldconfig is sufficient — no explicit ld.so.conf.d entries needed.

# version. The open modules support newer kernels better than the
# proprietary DKMS source shipped in /usr/src/.
# Source the shared version pin. Individual overrides still work via
# NVIDIA_OPEN_VERSION environment variable.
Member

@elezar elezar May 5, 2026

So if NVIDIA_OPEN_VERSION is specified, then it is up to the user to ensure that the same driver version is used for the kernel modules and the usermode components?

Does it make sense to just use NVIDIA_DRIVER_VERSION everywhere? Does this remove a possible source of error -- at least at this stage?

Collaborator

Unified on NVIDIA_DRIVER_VERSION everywhere. The single pin in versions.env now drives both the Dockerfile's .run installer URL and build-nvidia-modules.sh's open-gpu-kernel-modules tag/tarball/firmware path.

mkdir -p "${OUTPUT_DIR}"

NPROC="$(nproc 2>/dev/null || echo 4)"
IGNORE_CC_MISMATCH=1 make -C "${NVIDIA_SRC}" \
Member

Question: Do we cache these at all, or do we always rebuild this?

Collaborator

The tarball download and extraction are presence-cached (skipped if the archive/source dir already exists), but the make modules step runs unconditionally every time. make's own incremental compilation avoids recompiling unchanged .o files, so re-runs are faster than a clean build, but it still does a full dependency scan and link pass.

Since this is a one-time setup step (mise run vm:nvidia-modules) rather than something that runs on every sandbox create, the current behavior is probably acceptable for now. If we want to optimize it later, we could add a stamp file keyed on sha256(NVIDIA_VERSION + KERNEL_VERSION + kconfig) that skips the entire build when the inputs haven't changed — happy to add that as a follow-up if you think it's worth it at this stage.

Member

I did see the download and extraction being cached after I asked this. I think what we have is fine for the time being. Let's look at adding a stamp file if we find this problematic. As you say, this is not for every sandbox run.

Signed-off-by: Vincent Caux-Brisebois <vcauxbrisebo@nvidia.com>
@vince-brisebois vince-brisebois force-pushed the dn/vcaux-brisebo/vm-container-support branch from bffe5d7 to 1eb4cb3 on May 5, 2026 18:18

/// Load key=value pairs from a `versions.env` file in the given directory.
/// Returns an empty map if the file doesn't exist or can't be read.
fn load_versions_env(context: &Path) -> HashMap<String, String> {
Member

@elezar elezar May 6, 2026

nit: This can be used to process any env file and not just versions.env.

Collaborator

Agreed — renamed to load_env_file and parameterized the filename.
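
Roughly what the renamed helper looks like (a sketch based on the snippet above; the parsing behavior is unchanged):

```rust
use std::collections::HashMap;
use std::fs;
use std::path::Path;

/// Load key=value pairs from `file_name` in the given directory. Returns an
/// empty map if the file doesn't exist or can't be read. Values are taken
/// verbatim, so surrounding quotes are not stripped; our .env files use bare
/// assignments.
fn load_env_file(context: &Path, file_name: &str) -> HashMap<String, String> {
    let Ok(contents) = fs::read_to_string(context.join(file_name)) else {
        return HashMap::new();
    };
    contents
        .lines()
        .filter_map(|line| {
            let line = line.trim();
            if line.is_empty() || line.starts_with('#') {
                return None;
            }
            let (key, value) = line.split_once('=')?;
            let key = key.trim();
            if key.is_empty() {
                return None;
            }
            Some((key.to_string(), value.trim().to_string()))
        })
        .collect()
}
```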

if key.is_empty() {
return None;
}
Some((key.to_string(), value.trim().to_string()))
Member

As a question, how does this handle ENVVAR="value"? (I'm not saying that it should)

Collaborator

It doesn't strip quotes — they'd be included as literal characters in the value. Our .env files use bare assignments so this works today. Added a doc comment noting the limitation rather than growing a full shell parser; if we need that later we should pull in dotenvy.

format!("/lib/modules/{GUEST_KERNEL_VERSION}/extra/nvidia"),
] {
let p = PathBuf::from(&candidate);
if dir_has_ko_files(&p) {
Member

Question: Why is this the only case where we check for .ko files?

Collaborator

The env-var and state-dir paths (steps 1–2) are explicit operator config, so we trust .is_dir() and let inject_gpu_modules fail with a clear error if there are no .ko files. The host paths (steps 3–4) are speculative probes, so has_nvidia_ko avoids claiming a directory that happens to exist but has no NVIDIA modules. The build-tree paths also use has_nvidia_ko.
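
For clarity, the resolution order described here looks roughly like this (a sketch; only the env var, state-dir path, and the extra/nvidia candidate appear in this thread, the remaining probes are abridged):

```rust
use std::env;
use std::path::{Path, PathBuf};

const GUEST_KERNEL_VERSION: &str = "6.12.76";

/// Resolve the directory holding NVIDIA kernel modules to inject.
/// Explicit operator config (steps 1-2) is trusted as-is; speculative
/// candidates (steps 3-4) must actually contain nvidia.ko to be claimed.
fn resolve_gpu_modules_dir(state_dir: &Path) -> Option<PathBuf> {
    // 1. Explicit override: trust it; injection fails loudly if it's wrong.
    if let Ok(dir) = env::var("OPENSHELL_GPU_MODULES_DIR") {
        let p = PathBuf::from(dir);
        if p.is_dir() {
            return Some(p);
        }
    }
    // 2. Pre-provisioned state directory: also explicit config.
    let provisioned = state_dir.join("gpu-modules");
    if provisioned.is_dir() {
        return Some(provisioned);
    }
    // 3-4. Speculative probes (host and build-tree paths, abridged here):
    // only claim a directory that really contains NVIDIA modules.
    let candidate = PathBuf::from(format!("/lib/modules/{GUEST_KERNEL_VERSION}/extra/nvidia"));
    if candidate.join("nvidia.ko").is_file() {
        return Some(candidate);
    }
    None
}
```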

Comment on lines +418 to +419
/// This is a development convenience — production deployments should use
/// `OPENSHELL_GPU_MODULES_DIR` or pre-provision `<state_dir>/gpu-modules/`.
Member

Is there a way to ensure that this code path is not triggered in production? I would assume that there are security concerns with allowing this there?

Collaborator

There was no hard guard, just the heuristic that target/libkrun-build/… won't exist in production installs. Gated discover_build_tree_modules() behind #[cfg(debug_assertions)] so release builds only resolve from the env var or the state-dir provisioned path.
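
The gate itself is small (a sketch; the probing body is elided):

```rust
use std::path::PathBuf;

/// Development-only fallback that probes the local build tree
/// (target/libkrun-build/...). Compiled out of release builds, so production
/// installs can only resolve modules from the env var or the state dir.
#[cfg(debug_assertions)]
fn discover_build_tree_modules() -> Option<PathBuf> {
    // ... probe target/libkrun-build/... as before ...
    None
}

#[cfg(not(debug_assertions))]
fn discover_build_tree_modules() -> Option<PathBuf> {
    None
}
```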

None
}

fn dir_has_ko_files(dir: &Path) -> bool {
Member

Should we look for specific NVIDIA-specific ko modules?

Collaborator

Yes — tightened the check to require nvidia.ko specifically. The previous generic .ko check could have matched unrelated module directories.
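
So the check becomes, roughly (a sketch replacing the generic dir_has_ko_files shown above):

```rust
use std::path::Path;

/// True if the directory contains the NVIDIA kernel module specifically,
/// rather than any .ko file, so unrelated module directories are not claimed.
fn dir_has_nvidia_ko(dir: &Path) -> bool {
    dir.join("nvidia.ko").is_file()
}
```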

/// host firmware causes version mismatches when the host driver differs
/// from the image's driver version.
fn inject_gpu_firmware(rootfs: &Path, modules_dir: &Path) {
let fw_dst = rootfs.join("lib/firmware/nvidia");
Member

What ensures that this path does not include symlinks that resolve on the host?

Collaborator

Nothing was preventing it — fs::copy follows symlinks. Added symlink detection in copy_dir_contents to skip (with a warning) any entry that is a symlink, so host files can't leak into the guest rootfs through the firmware or modules directories.
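
The guard, roughly (a sketch; assumes copy_dir_contents is a straightforward recursive copy, as its name suggests):

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Copy files and directories from `src` into `dst`, skipping symlinks so
/// that nothing outside `src` (e.g. host firmware) can be pulled into the
/// guest rootfs.
fn copy_dir_contents(src: &Path, dst: &Path) -> io::Result<()> {
    fs::create_dir_all(dst)?;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let file_type = entry.file_type()?; // does not follow symlinks
        if file_type.is_symlink() {
            eprintln!(
                "warning: skipping symlink {} while populating guest rootfs",
                entry.path().display()
            );
            continue;
        }
        let target = dst.join(entry.file_name());
        if file_type.is_dir() {
            copy_dir_contents(&entry.path(), &target)?;
        } else {
            // fs::copy would follow symlinks, hence the check above.
            fs::copy(entry.path(), &target)?;
        }
    }
    Ok(())
}
```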


/// Extract the default value from a `GUEST_KERNEL_VERSION="${GUEST_KERNEL_VERSION:-<default>}"`
/// line in pins.env.
fn parse_guest_kernel_version(contents: &str) -> Option<String> {
Member

Not critical for this PR, but we should reuse this logic if we add other envvars with similar logic.

Collaborator

Agreed — extracted into a generic parse_env_default(contents, var_name) helper so the same parser is reusable for future pins.
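
Roughly (a sketch; assumes pins keep the VAR="${VAR:-default}" shape shown above):

```rust
/// Extract the default from a `VAR="${VAR:-<default>}"` line for the given
/// variable name.
fn parse_env_default(contents: &str, var_name: &str) -> Option<String> {
    let prefix = format!("{var_name}=\"${{{var_name}:-");
    contents.lines().find_map(|line| {
        let default = line.trim().strip_prefix(&prefix)?.strip_suffix("}\"")?;
        Some(default.to_string())
    })
}

// Usage:
// let kver = parse_env_default(&pins_env_contents, "GUEST_KERNEL_VERSION");
```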

Comment on lines +50 to +53
> **Note:** GPU passthrough requires an **x86_64 host and guest**. The QEMU
> backend uses `qemu-system-x86_64`, and the NVIDIA driver installer /
> kernel module build scripts target x86_64 exclusively. ARM/aarch64 GPU
> passthrough is not yet supported.
Member

Question: Is this detected somewhere?

Collaborator

Not previously — you'd get a confusing QEMU failure. Added an early arch check in the GPU startup path that errors with "GPU passthrough requires x86_64" before attempting to launch QEMU.
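
The check is roughly this (a sketch; the error message is from the comment above, the function name is illustrative):

```rust
/// Fail fast before launching QEMU: the backend invokes qemu-system-x86_64
/// and the driver/module build scripts target x86_64 exclusively.
fn ensure_gpu_passthrough_arch() -> Result<(), String> {
    if cfg!(target_arch = "x86_64") {
        Ok(())
    } else {
        Err(format!(
            "GPU passthrough requires x86_64; this host is {}",
            std::env::consts::ARCH
        ))
    }
}
```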
