Add AMD ROCm (gfx942) support for the image→3D generation stack by ZJLi2013 · Pull Request #72 · HorizonRobotics/EmbodiedGen

ZJLi2013 · 2026-06-18T06:30:13Z

Summary

Enable EmbodiedGen's image→3D generation to run on AMD GPUs (ROCm/HIP), by swapping the
CUDA-only libraries for verified ROCm builds plus two small runtime shims. All changes are
additive (new files under docker/); no existing CUDA code path is modified.

Verified end-to-end on an AMD Instinct MI300X: python -m embodied_gen.models.sam3d
(SAM3D backend, no GPT, no texture-bake) produces outputs/splat.ply (6.5 MB 3D Gaussian
Splat) from the bundled sample_00.jpg.

Changes (all new files)

- docker/install_rocm.sh: one-shot ROCm install. It installs the requirements minus the CUDA-only libraries, pins numpy<2, applies the ROCm dependency swaps (table below), deploys the two shims as sitecustomize, and runs an import smoke test (PASS/FAIL map).
- docker/Dockerfile.rocm: full-generation ROCm image (rocm/pytorch:rocm6.4.3...2.6.0) that runs install_rocm.sh.
- docker/spconv_rocm_compat.py: converts spconv KRSC checkpoints to the Native layout at load time (see Related issue).
- docker/kaolin_stub.py: sitecustomize bypass for the CUDA-only kaolin, which is used only in the texture-backprojection / mesh-IO stage; the core geometry path only calls kaolin.utils.testing.check_tensor.
- docker/README.rocm.md: user-facing run-through.
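To illustrate the kaolin bypass described in the list above, here is a minimal sketch of what a sitecustomize-style stub could look like. It is not the contents of docker/kaolin_stub.py: the helper name `install_kaolin_stub` is hypothetical, and the `check_tensor` signature is assumed to mirror kaolin's (`tensor, shape, dtype, device, throw`). This simplified stand-in validates shape and dtype only.

```python
import sys
import types


def check_tensor(tensor, shape=None, dtype=None, device=None, throw=True):
    """Simplified stand-in: checks shape and dtype; `device` is ignored."""
    if shape is not None:
        actual = tuple(tensor.shape)
        # A None entry in `shape` acts as a wildcard for that dimension.
        ok = len(actual) == len(shape) and all(
            want is None or want == got for want, got in zip(shape, actual)
        )
        if not ok:
            if throw:
                raise ValueError(f"expected shape {tuple(shape)}, got {actual}")
            return False
    if dtype is not None and tensor.dtype != dtype:
        if throw:
            raise TypeError(f"expected dtype {dtype}, got {tensor.dtype}")
        return False
    return True


def install_kaolin_stub():
    """Register placeholder modules so `import kaolin.utils.testing` succeeds.

    Hypothetical helper: it unconditionally shadows any real kaolin install.
    """
    parent = None
    for name in ("kaolin", "kaolin.utils", "kaolin.utils.testing"):
        module = types.ModuleType(name)
        module.__path__ = []  # mark as a package so submodule imports resolve
        sys.modules[name] = module
        if parent is not None:
            # Bind the child on its parent for attribute-style access.
            setattr(parent, name.rsplit(".", 1)[1], module)
        parent = module
    sys.modules["kaolin.utils.testing"].check_tensor = check_tensor


# Python imports sitecustomize at interpreter startup, so a call placed there
# takes effect before any project code imports kaolin.
install_kaolin_stub()
```

Any other kaolin attribute still raises AttributeError under this sketch, which is consistent with the texture-backprojection stage being out of scope.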

CUDA → ROCm dependency map

| Upstream (CUDA) | ROCm replacement |
| --- | --- |
| `spconv-cu120/121` | `ZJLi2013/spconv_rocm` (2.3.8+rocm1, source) |
| `nvdiffrast` | `ZJLi2013/nvdiffrast@rocm` |
| `gsplat` | `amd_gsplat` (pypi.amd.com/rocm-6.4.3; import name stays `gsplat`) |
| `pytorch3d` | ROCm 6.4 / py3.12 prebuilt wheel |
| `flash-attn` | FA2-Triton (`FLASH_ATTENTION_TRITON_AMD_ENABLE=TRUE` at install + runtime) |
| `xformers` | not needed; SAM3D attention auto-selects `sdpa` |
| `numpy` (base = 2.x) | pinned `<2` (diffusers/transformers requirement) |
| `kaolin` (no ROCm wheel) | `sitecustomize` stub (`docker/kaolin_stub.py`) |
| `diff-gaussian-rasterization` | optional ('inria' GS backend); `gsplat` is the default |
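As a rough illustration of the import smoke test that install_rocm.sh runs over this stack, the sketch below maps each module name to PASS or FAIL. The function name `import_smoke` and the `ROCM_STACK` list are assumptions made for the example; the modules the real script checks are not listed in this PR description. The environment variable is the one named in the table.

```python
import importlib
import os

# The table above says the Triton flash-attn backend needs this variable at
# runtime, so set it before flash_attn is imported.
os.environ.setdefault("FLASH_ATTENTION_TRITON_AMD_ENABLE", "TRUE")

# Assumed import names for the swapped libraries (illustrative list only).
ROCM_STACK = (
    "torch",
    "numpy",
    "spconv",
    "nvdiffrast",
    "gsplat",
    "pytorch3d",
    "flash_attn",
)


def import_smoke(module_names):
    """Try to import each module and return a {name: status} map."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, or a loader error from a bad native build
            results[name] = f"FAIL ({type(exc).__name__}: {exc})"
        else:
            results[name] = "PASS"
    return results
```

Calling `import_smoke(ROCM_STACK)` inside the ROCm container and printing the map gives a quick view of which replacement failed to build or load, before running the full pipeline.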

Tested on

GPU: AMD Instinct MI300X (gfx942)
ROCm: 6.4.3
PyTorch: 2.6.0 (+HIP 6.4)
Docker: rocm/pytorch:rocm6.4.3_ubuntu24.04_py3.12_pytorch_release_2.6.0

Results

outputs/splat.ply (6.5 MB) from apps/assets/example_image/sample_00.jpg
Run time 28.9 s, peak VRAM 9.74 GB; attention runs on AOTriton SDPA

Notes / scope

- Backward-compatible: only adds files under docker/; CUDA users are unaffected.
- Out of scope (documented gaps, not regressions): texture-backprojection (kaolin is CUDA-only and stubbed), GPT quality-checkers (need an API key). Core image→3D (segmentation → SAM3D geometry + gaussian + mesh export) runs without them.
- Optional follow-up (happy to include if desired): make the kaolin imports in embodied_gen/data/utils.py lazy/optional so the stub isn't needed.
- Depends on / related: spconv KRSC checkpoint loading on ROCm, ZJLi2013/spconv_rocm#<pr>. Until merged, docker/spconv_rocm_compat.py provides the equivalent fix consumer-side.
- License: this PR is for study/research purposes only and adds ROCm build/integration scripts; it ships no model weights. Any models used (e.g. SAM-3D-Objects, TRELLIS, Kolors, SD3.5, etc.) remain governed by their own respective licenses; please refer to each model's license before use.
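For the spconv KRSC checkpoint bridge mentioned in the notes above, the core of such a conversion is an axis permutation of each convolution weight. The sketch below is not the code in docker/spconv_rocm_compat.py. It assumes KRSC means (out_channels, *kernel_size, in_channels) and that the Native layout is either RSCK or RSKC; which of the two spconv_rocm expects is not stated in this PR. The helper names are hypothetical.

```python
def krsc_to_native_perm(ndim, native_layout="RSCK"):
    """Return the axis order that maps a KRSC weight to `native_layout`.

    KRSC is assumed to be (out_channels, *kernel_size, in_channels).
    """
    if ndim < 3:
        raise ValueError("a conv weight needs at least 3 dims (K, spatial, C)")
    k_axis, c_axis = 0, ndim - 1
    spatial = tuple(range(1, ndim - 1))
    if native_layout == "RSCK":  # (*kernel_size, in_channels, out_channels)
        return spatial + (c_axis, k_axis)
    if native_layout == "RSKC":  # (*kernel_size, out_channels, in_channels)
        return spatial + (k_axis, c_axis)
    raise ValueError(f"unknown layout: {native_layout!r}")


def permuted_shape(shape, perm):
    """Shape a tensor of `shape` would have after permuting its axes by `perm`."""
    return tuple(shape[axis] for axis in perm)


# With a torch tensor, the permutation would be applied to each sparse-conv
# weight while loading the checkpoint, for example:
#   weight = weight.permute(*krsc_to_native_perm(weight.ndim)).contiguous()
```

For a 3D kernel, a KRSC weight of shape (64, 3, 3, 3, 32) becomes (3, 3, 3, 32, 64) under the RSCK assumption. Keeping the permutation separate from the tensor library makes the layout assumption easy to check on shapes alone.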

Swap the CUDA-only generation stack for verified ROCm builds (spconv_rocm, nvdiffrast@rocm, amd_gsplat, pytorch3d ROCm wheel, FA2-Triton) plus two runtime shims: a kaolin sitecustomize bypass (texture-stage only) and a spconv KRSC->Native checkpoint-load bridge. All additive under docker/; CUDA paths unchanged. Verified e2e on AMD Instinct MI300X / ROCm 6.4.3 / torch 2.6: SAM3D image->3D produces splat.ply (28.9s, 9.74GB VRAM).

ZJLi2013 wants to merge 1 commit into HorizonRobotics:master from ZJLi2013:amd_support
