Add AMD ROCm (gfx942) support for the image→3D generation stack #72

Open
ZJLi2013 wants to merge 1 commit into
HorizonRobotics:master from
ZJLi2013:amd_support

Summary

Enable EmbodiedGen's image→3D generation to run on AMD GPUs (ROCm/HIP), by swapping the
CUDA-only libraries for verified ROCm builds plus two small runtime shims. All changes are
additive (new files under docker/); no existing CUDA code path is modified.

Verified end-to-end on an AMD Instinct MI300X: python -m embodied_gen.models.sam3d
(SAM3D backend, no GPT, no texture-bake) produces outputs/splat.ply (6.5 MB 3D Gaussian
Splat) from the bundled sample_00.jpg.

Changes (all new files)

  • docker/install_rocm.sh — one-shot ROCm install: installs the requirements minus the
    CUDA-only libraries, pins numpy<2, applies the ROCm dependency swaps (table below),
    deploys the two shims via sitecustomize, and runs an import smoke test (PASS/FAIL map).
  • docker/Dockerfile.rocm — full-generation ROCm image (rocm/pytorch:rocm6.4.3...2.6.0)
    that runs install_rocm.sh.
  • docker/spconv_rocm_compat.py — converts spconv KRSC checkpoints to the Native layout at
    load time (see Related issue).
  • docker/kaolin_stub.py — sitecustomize bypass for the CUDA-only kaolin (used only in
    the texture-backprojection / mesh-IO stage; core geometry path only calls
    kaolin.utils.testing.check_tensor).
  • docker/README.rocm.md — user-facing run-through.
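The kaolin bypass described above can be pictured with a minimal sketch. This is not the contents of docker/kaolin_stub.py; it only illustrates the general technique of registering placeholder modules in sys.modules so that `import kaolin` succeeds. The function name `install_kaolin_stub`, the `force` parameter, and the permissive behavior of the stand-in `check_tensor` are assumptions made for illustration.

```python
import importlib
import sys
import types


def _check_tensor(*args, **kwargs):
    # Permissive stand-in (assumption): the real kaolin function validates
    # tensor shape/dtype/device; this placeholder accepts everything.
    return True


def install_kaolin_stub(force=False):
    """Register placeholder kaolin modules when the real package cannot be imported.

    Returns True if the stub was installed, False if real kaolin is usable.
    """
    if not force:
        try:
            importlib.import_module("kaolin")
            return False  # real kaolin imports fine; leave it alone
        except Exception:
            pass  # missing or broken (e.g. CUDA-only build): fall through

    root = types.ModuleType("kaolin")
    utils = types.ModuleType("kaolin.utils")
    testing = types.ModuleType("kaolin.utils.testing")

    testing.check_tensor = _check_tensor
    utils.testing = testing
    root.utils = utils

    # An empty __path__ marks the placeholders as packages so that
    # `from kaolin.utils.testing import ...` resolves through sys.modules.
    root.__path__ = []
    utils.__path__ = []

    for mod in (root, utils, testing):
        sys.modules[mod.__name__] = mod
    return True
```

Calling such a function from a sitecustomize module would apply it at interpreter startup, before any project code imports kaolin. Anything outside `kaolin.utils.testing.check_tensor` would still raise AttributeError, which matches the stated scope: the texture-backprojection stage is not covered.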

CUDA → ROCm dependency map

| Upstream (CUDA) | ROCm replacement |
| --- | --- |
| spconv-cu120/121 | ZJLi2013/spconv_rocm (2.3.8+rocm1, source) |
| nvdiffrast | ZJLi2013/nvdiffrast@rocm |
| gsplat | amd_gsplat (pypi.amd.com/rocm-6.4.3; import name stays gsplat) |
| pytorch3d | ROCm 6.4 / py3.12 prebuilt wheel |
| flash-attn | FA2-Triton (FLASH_ATTENTION_TRITON_AMD_ENABLE=TRUE at install + runtime) |
| xformers | not needed; SAM3D attention auto-selects sdpa |
| numpy (base = 2.x) | pinned <2 (diffusers/transformers requirement) |
| kaolin (no ROCm wheel) | sitecustomize stub (docker/kaolin_stub.py) |
| diff-gaussian-rasterization | optional ('inria' GS backend); gsplat is the default |
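An import smoke test over the swapped packages might look like the sketch below. It is not taken from docker/install_rocm.sh; the helper name `import_smoke` and the `ROCM_STACK` tuple are hypothetical, and the import names (other than gsplat, which the table states) are assumed to match the usual package names.

```python
import importlib
import os

# The PR states this flag is needed at install time and at runtime for FA2-Triton.
os.environ.setdefault("FLASH_ATTENTION_TRITON_AMD_ENABLE", "TRUE")

# Assumed import names for the packages listed in the dependency map.
ROCM_STACK = ("spconv", "nvdiffrast", "gsplat", "pytorch3d", "flash_attn", "kaolin")


def import_smoke(module_names=ROCM_STACK):
    """Try to import each module and return a name -> PASS/FAIL map."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = "PASS"
        except Exception as exc:
            # Record the exception type so a missing wheel (ModuleNotFoundError)
            # can be told apart from a broken native extension (ImportError/OSError).
            results[name] = f"FAIL ({type(exc).__name__})"
    return results
```

Printing the returned map after installation gives a quick view of which replacements loaded. Catching the broad `Exception` is deliberate here, since a mismatched HIP build can fail with errors other than ImportError.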

Tested on

  • GPU: AMD Instinct MI300X (gfx942)
  • ROCm: 6.4.3
  • PyTorch: 2.6.0 (+HIP 6.4)
  • Docker: rocm/pytorch:rocm6.4.3_ubuntu24.04_py3.12_pytorch_release_2.6.0

Results

  • outputs/splat.ply (6.5 MB) from apps/assets/example_image/sample_00.jpg
  • Run time 28.9 s, peak VRAM 9.74 GB; attention runs on AOTriton SDPA

Notes / scope

  • Backward-compatible: only adds files under docker/; CUDA users are unaffected.

  • Out of scope (documented gaps, not regressions): texture-backprojection (kaolin is
    CUDA-only and stubbed), GPT quality-checkers (need an API key). Core image→3D
    (segmentation → SAM3D geometry + gaussian + mesh export) runs without them.

  • Optional follow-up (happy to include if desired): make the kaolin imports in
    embodied_gen/data/utils.py lazy/optional so the stub isn't needed.

  • Depends on / related: spconv KRSC checkpoint loading on ROCm — ZJLi2013/spconv_rocm#<pr>.
    Until merged, docker/spconv_rocm_compat.py provides the equivalent fix consumer-side.

  • License: this PR is for study/research purposes only and adds ROCm build/integration
    scripts; it ships no model weights. Any models used (e.g. SAM-3D-Objects, TRELLIS, Kolors,
    SD3.5, etc.) remain governed by their own respective licenses — please refer to each model's
    license before use.
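For readers unfamiliar with the KRSC-to-Native bridge mentioned in the notes, the sketch below shows the axis permutation such a conversion would involve. It rests on an assumption that is not confirmed by this PR: that KRSC means weights stored as [out_channels, *kernel_size, in_channels] and that the Native layout is [*kernel_size, in_channels, out_channels]. The helper names are hypothetical, and docker/spconv_rocm_compat.py may work differently.

```python
def krsc_to_native_axes(ndim):
    """Axis order that moves the leading K (out_channels) axis to the end.

    Assumed layouts: KRSC = [K, *kernel, C] -> Native = [*kernel, C, K].
    """
    if ndim < 3:
        raise ValueError("expected K, at least one kernel dimension, and C")
    return tuple(range(1, ndim)) + (0,)


def permute_shape(shape, axes):
    """Shape a tensor would have after permuting its axes in the given order."""
    return tuple(shape[a] for a in axes)


# With PyTorch, the same axes could be applied to a checkpoint weight as:
#     native_weight = krsc_weight.permute(*axes).contiguous()
```

For a 3D convolution weight of shape (64, 3, 3, 3, 32), the axes are (1, 2, 3, 4, 0) and the resulting shape is (3, 3, 3, 32, 64). Which state-dict keys need the conversion depends on the model and is not covered here.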

Swap the CUDA-only generation stack for verified ROCm builds (spconv_rocm, nvdiffrast@rocm, amd_gsplat, pytorch3d ROCm wheel, FA2-Triton) plus two runtime shims: a kaolin sitecustomize bypass (texture-stage only) and a spconv KRSC->Native checkpoint-load bridge. All additive under docker/; CUDA paths unchanged. Verified e2e on AMD Instinct MI300X / ROCm 6.4.3 / torch 2.6: SAM3D image->3D produces splat.ply (28.9s, 9.74GB VRAM).