78 changes: 39 additions & 39 deletions docs/fundamentals/gpu-cpu-tee-requirements.md
@@ -15,56 +15,56 @@ This document is subject to periodic updates as new hardware, drivers, and valid

## Core TEE Requirements

### TEE Mode Compatibility
### 1. TEE Mode Compatibility

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve stable deep-link anchors after heading renumbering.

Adding numeric prefixes to headings changes autogenerated anchor IDs, which can break existing inbound links/bookmarks. Add explicit custom anchor tags (using the pre-existing anchor naming scheme) before each renumbered heading to keep links stable.

Proposed pattern
-### 1. TEE Mode Compatibility
+<a id="tee-mode-compatibility"></a>
+### 1. TEE Mode Compatibility

-### 2. Recommended Instance Types
+<a id="recommended-instance-types"></a>
+### 2. Recommended Instance Types

-### 3. Supported GPU + CPU Configurations
+<a id="supported-gpu-cpu-configurations"></a>
+### 3. Supported GPU + CPU Configurations

Apply the same pattern to the rest of newly numbered headings.

As per coding guidelines, "Follow the established cross-referencing pattern with custom anchor tags".

Also applies to: 22-22, 28-28, 30-30, 49-49, 67-67, 83-83, 93-93, 109-109, 118-118, 129-129, 139-139, 143-143, 152-152, 160-160, 170-170, 180-180, 192-192, 202-202, 213-213, 217-217, 229-229, 235-235

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/fundamentals/gpu-cpu-tee-requirements.md` at line 18, The renumbered
heading "1. TEE Mode Compatibility" changed autogenerated anchor IDs and will
break existing deep links; before each renumbered heading (e.g., the "1. TEE
Mode Compatibility" heading and the other listed headings) insert an explicit
HTML anchor tag using the established anchor naming scheme (the pre-existing
anchor slug used prior to adding numeric prefixes) so the anchor remains stable;
update headings in docs/fundamentals/gpu-cpu-tee-requirements.md by placing the
custom anchor tag immediately above each affected heading (use the exact
original slug string for anchors), and apply the same pattern to the other
headings referenced in the comment to preserve existing inbound links.
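
The proposed pattern could be applied mechanically with a small script. The following is a minimal sketch and not part of the PR: the names `slugify` and `add_stable_anchors` are hypothetical, and the slug rule (lowercase, strip punctuation, collapse whitespace into single hyphens) is an assumption chosen to reproduce the IDs shown in the proposed pattern. Check it against the anchors the site actually generated before the renumbering, since some generators keep double hyphens where a symbol such as `+` or `&` was removed.

```python
import re

# Matches a Markdown heading with a numeric prefix, e.g. "### 1. Title" or "#### 3.1 Title".
NUMBERED_HEADING = re.compile(r"^(#{1,6})\s+\d+(?:\.\d+)*\.?\s+(.+?)\s*$")


def slugify(title: str) -> str:
    """Assumed slug rule: lowercase, drop punctuation, join words with single hyphens."""
    cleaned = re.sub(r"[^a-z0-9\s-]", "", title.lower())
    return re.sub(r"[\s-]+", "-", cleaned).strip("-")


def add_stable_anchors(markdown: str) -> str:
    """Insert an explicit <a id="..."></a> line above every numbered heading."""
    out = []
    for line in markdown.splitlines():
        match = NUMBERED_HEADING.match(line)
        if match:
            # The slug is built from the title without its numeric prefix.
            anchor = f'<a id="{slugify(match.group(2))}"></a>'
            # Skip headings that already carry this anchor on the previous line.
            if not out or out[-1].strip() != anchor:
                out.append(anchor)
        out.append(line)
    return "\n".join(out)
```

Running `add_stable_anchors` twice leaves the text unchanged, so the script is safe to re-run after further heading edits. Headings without a numeric prefix are left untouched.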


The system's CPU and GPU must both support TEE mode and be fully compatible with each other to operate in confidential mode.

### Recommended Instance Types
### 2. Recommended Instance Types

- Bare Metal or Colocation Hosting
- For Google Cloud: use a Confidential Virtual Machine (CVM) with the Super OS image
- Support for CVMs from other cloud providers will be added in upcoming Super Protocol releases

### Supported GPU + CPU Configurations
### 3. Supported GPU + CPU Configurations

#### Hopper (H100, H200, H800)
#### 3.1 Hopper (H100, H200, H800)

- Google Cloud CVM with NVIDIA H100 and Intel TDX (A3 machine series)
- Single-GPU Servers (PCIe or NVL).
- **Single-GPU Servers (PCIe or NVL).**
**Supported GPU SKUs**: H100 PCIe, H800 PCIe, H100 NVL, H800 NVL, H200 NVL.
Paired with:
- Intel CPU: Starting from 5th Gen Intel Xeon (x5xx series — Emerald Rapids) with Intel TDX. [Memory configuration must comply with Intel TDX requirements.](https://cc-enabling.trustedservices.intel.com/intel-tdx-enabling-guide/03/hardware_selection/)
- AMD CPU: Starting from AMD EPYC Genoa (9xx4 series) with AMD SEV-SNP.
- 4-GPU HGX Systems (SXM5 form-factor with NVLink)
- **4-GPU HGX Systems (SXM5 form-factor with NVLink).**
**Supported GPU SKUs**: HGX H100 4-GPU 64GB HBM2e (Partner Cooled), HGX H100 4-GPU 80GB HBM3 (Partner Cooled), HGX H100 4-GPU 94GB HBM2e (Partner Cooled).
Paired with:
- Intel CPU: Starting from 5th Gen Intel Xeon (x5xx series — Emerald Rapids) with Intel TDX. [Memory configuration must comply with Intel TDX requirements.](https://cc-enabling.trustedservices.intel.com/intel-tdx-enabling-guide/03/hardware_selection/)
- AMD CPU: Starting from AMD EPYC Genoa (9xx4 series) with AMD SEV-SNP.
- 8-GPU HGX Systems (SXM5 form-factor with NVLink & 4 NVSwitches)
- **8-GPU HGX Systems (SXM5 form-factor with NVLink & 4 NVSwitches).**
**Supported GPU SKUs:** HGX H100 8-GPU 80GB (Air Cooled), HGX H100 8-GPU 96GB (Air Cooled), HGX H20 141GB HBM3e 8-GPU (Air Cooled), HGX H20A HBM3 96GB 8-GPU (Air Cooled), HGX H200 8-GPU 141GB (Air Cooled), HGX H800 8-GPU 80GB (Air Cooled), HGX H800 8-GPU 80GB (Partner Cooled).
Paired with:
- Intel CPU: Starting from 5th Gen Intel Xeon (x5xx series — Emerald Rapids) with Intel TDX. [Memory configuration must comply with Intel TDX requirements.](https://cc-enabling.trustedservices.intel.com/intel-tdx-enabling-guide/03/hardware_selection/)
- AMD CPU: Starting from AMD EPYC Genoa (9xx4 series) with AMD SEV-SNP.

#### Blackwell (B200, B300, RTX PRO 6000)
#### 3.2 Blackwell (B200, B300, RTX PRO 6000)

- HGX B200 System. 8x Blackwell GPUs with NVLink 5 & 2 NVLink Switches
- **HGX B200 System.** 8x Blackwell GPUs with NVLink 5 & 2 NVLink Switches.
**Supported GPU SKUs:** HGX B200 8-GPU 180GB HBM3e (Air Cooled), HGX B200 8-GPU 180GB HBM3e (Partner Cooled), HGX B200-850 8-GPU 180GB HBM3e (Air Cooled).
Paired with:
- Intel CPU: Starting from 5th Gen Intel Xeon (x5xx series — Emerald Rapids) with Intel TDX. [Memory configuration must comply with Intel TDX requirements.](https://cc-enabling.trustedservices.intel.com/intel-tdx-enabling-guide/03/hardware_selection/)
- AMD CPU: Starting from AMD EPYC Genoa (9xx4 series) with AMD SEV-SNP.
- HGX B300 System. 8x Blackwell Ultra GPUs with NVLink 5 & NVLink Switch
- **HGX B300 System.** 8x Blackwell Ultra GPUs with NVLink 5 & NVLink Switch.
**Supported GPU SKUs:** HGX B300 8-GPU 270GB HBM3e (Air Cooled).
Paired with:
- Intel CPU: Intel Xeon 6 (65xxP/65xxE and 67xxP/67xxE series — Granite Rapids (P-core) / Sierra Forest (E-core)) with Intel TDX. [Memory configuration must comply with Intel TDX requirements.](https://cc-enabling.trustedservices.intel.com/intel-tdx-enabling-guide/03/hardware_selection/)
- AMD CPU: Starting from AMD EPYC Genoa (9xx4 series) with AMD SEV-SNP.
- RTX PRO 6000 Blackwell Server Edition (SE).
- **RTX PRO 6000 Blackwell Server Edition (SE).**
**Supported GPU SKUs:** RTX PRO 6000 Blackwell SE, RTX PRO 6000 Blackwell SE (Liquid Cooled).
Paired with:
- Intel CPU: Starting from 5th Gen Intel Xeon (x5xx series — Emerald Rapids) with Intel TDX. [Memory configuration must comply with Intel TDX requirements.](https://cc-enabling.trustedservices.intel.com/intel-tdx-enabling-guide/03/hardware_selection/)
- AMD CPU: Starting from AMD EPYC Genoa (9xx4 series) with AMD SEV-SNP.

#### Expected
#### 3.3 Expected

Blackwell Architecture:

@@ -80,7 +80,7 @@ Rubin Architecture:

## Technical Insights

### Hopper and Blackwell in TEE Mode
### 1. Hopper and Blackwell in TEE Mode

Unlike Hopper-based DGX/HGX systems (H100/H200), Blackwell-based platforms (such as B200, B300, and RTX PRO 6000) introduce significantly greater flexibility for confidential GPU allocation and multi-GPU deployment in TEE mode.

@@ -90,7 +90,7 @@ By contrast, Hopper multi-GPU passthrough (PPCIe) does not encrypt GPU-to-GPU NV

Blackwell systems may also support mixed TEE and non-TEE virtual machines (VMs) on the same physical server, subject to configuration. However, for bare metal servers, where the entire machine is dedicated to a single user, this distinction may be operationally less relevant.

#### GPU & Memory Boundaries in TEE Mode
#### 1.1 GPU & Memory Boundaries in TEE Mode

The boundary of the Trusted Execution Environment — and therefore the boundary of accessible confidential memory — depends on two factors: the GPU architecture and the CC mode in use.

@@ -106,27 +106,27 @@ Blackwell also enables a second path for confidential large-model inference thro

For a practical example of how this unlocks workloads, see our [High-Performance Inference with vLLM on Super Protocol](https://superprotocol.com/resources/inference-with-vllm) article.

#### Hopper TEE Configurations
#### 1.2 Hopper TEE Configurations

Hopper GPUs support two TEE configurations:

1. Single GPU Passthrough (SPT CC): 1 GPU per Confidential VM (CVM). Multiple CVMs may run on the same node, each with 1 GPU.
2. Hopper Multiple GPU Passthrough (PPCIe): All GPUs within the physical platform are passed through to a single CVM. CPU–GPU traffic is encrypted via a bounce buffer, but GPU–GPU communication over NVLink or NVSwitch is not hardware-encrypted.
1. **Single GPU Passthrough (SPT CC)**: 1 GPU per Confidential VM (CVM). Multiple CVMs may run on the same node, each with 1 GPU.
2. **Hopper Multiple GPU Passthrough (Protected PCIe / PPCIe)**: All GPUs within the physical platform are passed through to a single CVM. CPU–GPU traffic is encrypted via a bounce buffer, but GPU–GPU communication over NVLink or NVSwitch is not hardware-encrypted.

As a result, Hopper architectures do not support secure shared confidential multi-GPU memory or partial GPU allocation within a larger multi-GPU platform.

#### Blackwell HGX/DGX TEE Configurations
#### 1.3 Blackwell HGX/DGX TEE Configurations

Blackwell GPUs support two TEE modes:

1. Single GPU Passthrough (SPT CC): 1 GPU per Confidential VM (CVM). Multiple CVMs may run on the same node, each with 1 GPU.
2. Blackwell Multiple GPU Passthrough (MPT CC): The R595 TRD1 driver enables MPT CC on supported HGX B200, B200-850, B200 (Partner Cooled), and B300 platforms. Within a supported multi-GPU platform, 1, 2, 4, or 8 GPUs may be assigned to a single Confidential VM (CVM). CPU–GPU traffic is encrypted via bounce buffers, while GPU–GPU communication within the same CVM occurs over hardware-encrypted NVLink connections. This enables granular GPU allocation and extends the trusted boundary across multiple GPUs within a CVM.
1. **Single GPU Passthrough (SPT CC)**: 1 GPU per Confidential VM (CVM). Multiple CVMs may run on the same node, each with 1 GPU.
2. **Blackwell Multiple GPU Passthrough (MPT CC)**: The R595 TRD1 driver enables MPT CC on supported HGX B200, B200-850, B200 (Partner Cooled), and B300 platforms. Within a supported multi-GPU platform, 1, 2, 4, or 8 GPUs may be assigned to a single Confidential VM (CVM). CPU–GPU traffic is encrypted via bounce buffers, while GPU–GPU communication within the same CVM occurs over hardware-encrypted NVLink connections. This enables granular GPU allocation and extends the trusted boundary across multiple GPUs within a CVM.

Blackwell systems may also support mixed TEE and non-TEE virtual machines on the same physical server, subject to configuration. However, for bare metal servers, where the entire machine is dedicated to a single user, this distinction may be operationally less relevant.

Refer to the official [NVIDIA Confidential Computing driver documentation](https://docs.nvidia.com/595trd1-trusted-computing-solutions-release-notes.pdf) for SKU-level compatibility and supported modes.
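
The allocation rules described for Hopper and Blackwell HGX/DGX platforms can be summarized in a short lookup. This is an illustrative sketch only: the `CCMode` enum and the `allowed_gpu_counts` function are hypothetical names, not an NVIDIA or Super Protocol API, and the rules encoded are just the ones stated in this document.

```python
from enum import Enum


class CCMode(Enum):
    SPT_CC = "Single GPU Passthrough"
    PPCIE = "Hopper Protected PCIe"
    MPT_CC = "Blackwell Multiple GPU Passthrough"


def allowed_gpu_counts(architecture: str, mode: CCMode, platform_gpus: int) -> set[int]:
    """Return the GPU counts a single CVM may receive, per the rules in this document."""
    if mode is CCMode.SPT_CC:
        # One GPU per CVM on both Hopper and Blackwell.
        return {1}
    if mode is CCMode.PPCIE and architecture == "hopper":
        # All GPUs in the physical platform go to a single CVM.
        return {platform_gpus}
    if mode is CCMode.MPT_CC and architecture == "blackwell":
        # 1, 2, 4, or 8 GPUs per CVM, limited by what the platform has.
        return {n for n in (1, 2, 4, 8) if n <= platform_gpus}
    # Combination not described in this document.
    return set()
```

For example, `allowed_gpu_counts("hopper", CCMode.PPCIE, 8)` yields only `{8}`, which reflects why partial GPU allocation is not available on Hopper multi-GPU platforms.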

#### Blackwell RTX PRO 6000 TEE Configurations
#### 1.4 Blackwell RTX PRO 6000 TEE Configurations

TEE functionality is currently available only for the Server Edition, while the Workstation and Max-Q Editions are expected to add support in future releases. Super plans to validate this release in upcoming tests.

@@ -136,11 +136,11 @@ The release of RTX PRO 6000 Blackwell Server Edition (SE) makes TEE support much
2. RTX PRO 6000 systems do not include NVLink but allow any number of GPUs from 1 to 8 to operate in a single TEE instance, provided all components meet Confidential Computing requirements. The efficiency of some configurations (e.g., 2 or 3 GPUs) is still to be evaluated.
3. TEE and non-TEE VMs are also expected to operate together within the same system, following specific setup instructions.

### NVIDIA Confidential Computing Driver Releases
### 2. NVIDIA Confidential Computing Driver Releases

The driver version primarily determines which TEE modes and GPU SKUs are available on a specific platform. The overview below details the features introduced by each release.

#### Hopper
#### 2.1 Hopper

Hopper TEE capabilities were first introduced in earlier driver releases (R550-R575) and have since been stabilized and expanded across subsequent updates. R595 TRD1 is the current General Availability (GA) release.

@@ -149,25 +149,25 @@ Hopper TEE capabilities were first introduced in earlier driver releases (R550-R
- [**R580 TRD1**](https://docs.nvidia.com/580trd1-trusted-computing-solutions-release-notes.pdf): Provided critical optimization for Hopper multi-CVM slicing (running multiple independent SPT CC environments per physical node).
- [**R550-R575**](https://docs.nvidia.com/nvtrust/index.html#nvidiatab-release-notes): Established the core infrastructure for Hopper TEEs. R550 TRD1 launched General Availability (GA) for Single GPU Passthrough (SPT CC), while R550 TRD3 initiated the Early Access phase for Protected PCIe (PPCIe). R575 TRD1 officially graduated Protected PCIe (PPCIe) to General Availability (GA) for monolithic 8-GPU scale-up architectures.

#### Blackwell HGX/DGX systems with NVLink
#### 2.2 Blackwell HGX/DGX systems with NVLink

Blackwell TEE capabilities represent a major architectural shift in the driver, moving from unencrypted NVLink (Hopper PPCIe) to hardware-encrypted NVLink (Blackwell MPT CC).

- **R595 TRD1 (Current GA)**: Expanded MPT CC and SPT CC support to new hardware, officially adding the HGX B300 and HGX B200 (Partner Cooled).
- **R590 TRD1**: Introduced Multiple GPU Passthrough (MPT CC), allowing 1, 2, 4, or 8 GPUs per CVM with hardware-encrypted NVLink. Initially limited to HGX B200 and B200-850 (Air Cooled (AC)).
- **R580 TRD1**: Introduced basic Blackwell TEE support with Single GPU Passthrough (SPT CC) only for HGX B200 and RTX PRO 6000 Server Edition.
- [**R595 TRD1 (Current GA)**](https://docs.nvidia.com/595trd1-trusted-computing-solutions-release-notes.pdf): Expanded MPT CC and SPT CC support to new hardware, officially adding the HGX B300 and HGX B200 (Partner Cooled).
- [**R590 TRD1**](https://docs.nvidia.com/590trd1-trusted-computing-solutions-release-notes.pdf): Introduced Multiple GPU Passthrough (MPT CC), allowing 1, 2, 4, or 8 GPUs per CVM with hardware-encrypted NVLink. Initially limited to HGX B200 and B200-850 (Air Cooled (AC)).
- [**R580 TRD1**](https://docs.nvidia.com/580trd1-trusted-computing-solutions-release-notes.pdf): Introduced basic Blackwell TEE support with Single GPU Passthrough (SPT CC) only for HGX B200 and RTX PRO 6000 Server Edition.

#### RTX PRO 6000 Blackwell
#### 2.3 RTX PRO 6000 Blackwell

RTX PRO 6000 TEE capabilities were introduced in recent driver releases and are gradually expanding to new hardware variants.

- **R595 TRD1 (Current GA)**: Expanded SPT CC support to new hardware, officially adding the RTX PRO 6000 Blackwell Server Edition, Liquid Cooled SKU.
- **R590 TRD1**: Continued SPT CC support for the RTX PRO 6000 Server Edition.
- **R580 TRD1**: Introduced basic TEE support with Single GPU Passthrough (SPT CC) exclusively for the RTX PRO 6000 Blackwell Server Edition.
- [**R595 TRD1 (Current GA)**](https://docs.nvidia.com/595trd1-trusted-computing-solutions-release-notes.pdf): Expanded SPT CC support to new hardware, officially adding the RTX PRO 6000 Blackwell Server Edition, Liquid Cooled SKU.
- [**R590 TRD1**](https://docs.nvidia.com/590trd1-trusted-computing-solutions-release-notes.pdf): Continued SPT CC support for the RTX PRO 6000 Server Edition.
- [**R580 TRD1**](https://docs.nvidia.com/580trd1-trusted-computing-solutions-release-notes.pdf): Introduced basic TEE support with Single GPU Passthrough (SPT CC) exclusively for the RTX PRO 6000 Blackwell Server Edition.

For current firmware and OS requirements per SKU, refer to the [NVIDIA Secure AI Compatibility Matrix](https://www.nvidia.com/en-us/data-center/solutions/confidential-computing/secure-ai-compatibility-matrix/).
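
The release history above can be encoded as a lookup table for quick checks during planning. The sketch below is an assumption-laden illustration: `FIRST_RELEASE`, `first_release`, and `is_supported` are hypothetical names, the platform keys are shorthand for the SKUs named in this document, and only combinations explicitly mentioned above are included. It does not replace the release notes or the Compatibility Matrix.

```python
from typing import Optional

# Releases named in this document, oldest first.
RELEASE_ORDER = ["R550 TRD1", "R575 TRD1", "R580 TRD1", "R590 TRD1", "R595 TRD1"]

# (platform, mode) -> first release this document associates with that combination.
FIRST_RELEASE = {
    ("Hopper", "SPT CC"): "R550 TRD1",
    ("Hopper", "PPCIe"): "R575 TRD1",
    ("HGX B200", "SPT CC"): "R580 TRD1",
    ("HGX B200", "MPT CC"): "R590 TRD1",
    ("HGX B200-850", "MPT CC"): "R590 TRD1",
    ("HGX B200 (Partner Cooled)", "SPT CC"): "R595 TRD1",
    ("HGX B200 (Partner Cooled)", "MPT CC"): "R595 TRD1",
    ("HGX B300", "SPT CC"): "R595 TRD1",
    ("HGX B300", "MPT CC"): "R595 TRD1",
    ("RTX PRO 6000 Blackwell SE", "SPT CC"): "R580 TRD1",
    ("RTX PRO 6000 Blackwell SE (Liquid Cooled)", "SPT CC"): "R595 TRD1",
}


def first_release(platform: str, mode: str) -> Optional[str]:
    """Return the first listed release for a combination, or None if not listed."""
    return FIRST_RELEASE.get((platform, mode))


def is_supported(platform: str, mode: str, installed_release: str) -> bool:
    """True if the installed release is at or after the first listed release."""
    required = first_release(platform, mode)
    if required is None:
        return False
    # Raises ValueError for a release name that is not in RELEASE_ORDER.
    return RELEASE_ORDER.index(installed_release) >= RELEASE_ORDER.index(required)
```

A combination missing from the table, such as MPT CC on RTX PRO 6000, returns `False` rather than raising, because the document does not describe it as supported.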

### Intel CPU Support for TEE
### 3. Intel CPU Support for TEE

**4th Gen Intel Xeon (Sapphire Rapids).** Intel supplied 4th Gen Xeon CPUs with TDX support exclusively to Google Cloud Platform, Microsoft Azure, IBM, and Alibaba. Only these cloud providers can offer instances with TEE-enabled 4th Gen Intel Xeon CPUs. 4th Gen Intel Xeon CPUs from any other source (cloud providers, OEMs, etc.) do not support Intel TDX.

@@ -177,7 +177,7 @@ However, Intel TDX support alone may not be sufficient for NVIDIA GPU TEE worklo

These compatibility requirements apply to both Intel TDX and AMD SEV-SNP based systems.

### Certification vs Functioning
### 4. Certification vs Functioning

**NVIDIA Secure AI Compatibility Matrix.** NVIDIA publishes the [Secure AI Compatibility Matrix](https://www.nvidia.com/en-us/data-center/solutions/confidential-computing/secure-ai-compatibility-matrix/) — the official reference for supported combinations of NVIDIA GPUs, VBIOS versions, CUDA driver versions, and Confidential Computing modes (SPT CC, PPCIe, MPT CC). This matrix is the primary reference for GPU-level TEE support validation.

@@ -189,7 +189,7 @@ The Matrix covers GPU-level compatibility; it does not replace OEM system-level

**Cooling Variants:** Air Cooled (AC), Partner Cooled (PC), and Liquid Cooled (LC) versions of the same GPU platform are treated as distinct SKUs by NVIDIA and often do not share the same TEE support status or driver availability. Always verify the exact cooling variant in the Compatibility Matrix before making a decision.

### OEM Validation Practices
### 5. OEM Validation Practices

OEMs are not required to conduct separate testing for TEE mode. While OEMs may not officially validate systems for TEE configurations, they often limit available configurations to those more likely to function reliably, especially in scenarios involving TEE workloads.

@@ -199,7 +199,7 @@ Caution is advised with brand-new server models that have not yet been widely te

Even when a GPU SKU appears in the Secure AI Compatibility Matrix, testing the full configuration in a staging or pilot environment remains the most reliable way to confirm compatibility.

### System Configuration Matters
### 6. System Configuration Matters

Always consult directly with your OEM or hardware reseller to verify that your specific system configuration (including BIOS/Firmware versions, memory (DIMMs), and OS validation) fully meets the requirements for Intel TDX, AMD SEV-SNP, NVIDIA GPU TEE, and your intended confidential computing workloads.

@@ -210,11 +210,11 @@ Some cloud providers claimed to offer Intel TDX-enabled instances, but the requi
- As a result, the instance could not be used in TEE mode until it was replaced with the correctly configured memory setup.
- In some cases, it was not possible at all due to unavailable memory options, or required revising quote commits since the configuration fell outside standard offerings.

### Unsupported Configurations
### 7. Unsupported Configurations

The configurations below are not compatible for use in TEE mode in their current form.

#### GPU Configurations
#### 7.1 GPU Configurations

- (Temporarily) RTX PRO 6000 Blackwell Workstation and Max-Q Editions — until drivers are released. TEE functionality is currently available only for the RTX PRO 6000 Blackwell [Server Edition](https://docs.nvidia.com/580trd1-trusted-computing-solutions-release-notes.pdf), while the Workstation and Max-Q Editions are expected to add support in future releases.
- 72 GPU setups.
@@ -226,13 +226,13 @@ The configurations below are not compatible for use in TEE mode in their current
- [NVIDIA GB200 NVL72](https://www.nvidia.com/en-us/data-center/gb200-nvl72/): Combines 72 Blackwell GPUs with two Grace CPUs (non-TEE) → Incompatible with TEE mode.
- [HGX B100 Systems.](https://www.exxactcorp.com/blog/hpc/nvidia-blackwell-deployments-gb200-nvl72-dgx-hgx-b200-hgx-b100) Although some OEMs offer such systems, for planning and compatibility purposes, we focus only on officially announced products by NVIDIA, such as [HGX B200 and B300](https://www.nvidia.com/en-us/data-center/hgx/), which come with publicly available guides and reference documentation.

#### CPU Configurations (when paired with GPU TEEs)
#### 7.2 CPU Configurations (when paired with GPU TEEs)

- 4th Gen Intel Xeon Scalable CPUs (a.k.a. Sapphire Rapids), unless offered by Google Cloud Platform, Microsoft Azure, IBM, or Alibaba: these CPUs do not support Intel TDX and cannot be used for confidential computing with GPU TEEs.
- AMD EPYC Milan (7xx3 series) with basic SEV-SNP support. Super Protocol supports only AMD SEV-SNP starting from AMD Genoa CPUs, which align with its security requirements for decentralized architectures.
- NVIDIA Grace CPUs: Current models lack TEE support, making them incompatible with GPU TEE requirements.
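
As a planning aid, the CPU rules in this document can be expressed as a simple pre-check. This is a hedged sketch: `cpu_passes_documented_rules` is a hypothetical helper, the family strings are informal labels rather than identifiers from any vendor tool, and only CPU families named in this document are recognized. Passing the check is necessary but not sufficient, since memory configuration, BIOS/firmware, and GPU pairing still have to be verified.

```python
from typing import Optional

# Providers this document names as offering TDX-enabled 4th Gen Intel Xeon CPUs.
SAPPHIRE_RAPIDS_TDX_PROVIDERS = {
    "Google Cloud Platform",
    "Microsoft Azure",
    "IBM",
    "Alibaba",
}

# CPU families this document names as usable with GPU TEEs.
SUPPORTED_FAMILIES = {"emerald rapids", "granite rapids", "sierra forest", "genoa"}

# CPU families this document lists as unsupported for GPU TEE pairing.
UNSUPPORTED_FAMILIES = {"milan", "grace"}


def cpu_passes_documented_rules(cpu_family: str, provider: Optional[str] = None) -> bool:
    """Apply the CPU rules from this document; unknown families return False."""
    family = cpu_family.strip().lower()
    if family == "sapphire rapids":
        # TDX-enabled Sapphire Rapids exists only at the named cloud providers.
        return provider in SAPPHIRE_RAPIDS_TDX_PROVIDERS
    if family in UNSUPPORTED_FAMILIES:
        return False
    return family in SUPPORTED_FAMILIES
```

Returning `False` for unrecognized families is a deliberately conservative choice: newer CPU generations that may qualify under "starting from" wording should be confirmed against vendor documentation rather than assumed.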

### More about NVIDIA's Confidential Computing mode
### 8. More about NVIDIA's Confidential Computing mode

1. [Secure AI Compatibility Matrix](https://www.nvidia.com/en-us/data-center/solutions/confidential-computing/secure-ai-compatibility-matrix/): supported combinations of NVIDIA GPUs, VBIOS versions, CUDA driver versions, and Confidential Computing modes.
2. [Qualified System Catalog](https://marketplace.nvidia.com/en-us/enterprise/qualified-system-catalog/?page=1&limit=15): comprehensive list of GPU-accelerated systems available from the NVIDIA partner network.