build(base): bump ROCm baseline to 7.13 #107

Open
MioYuuIH wants to merge 7 commits into develop from bump-rocm-7.13.0

@MioYuuIH (Contributor) commented May 29, 2026

Summary

  • Bump the ROCm GPU base image baseline to ROCm 7.13.0 and the matching ROCm-enabled PyTorch wheel set.
  • Add the gfx1152 build target and keep generic gfx110x / gfx120x image targets supported.
  • Split ROCm SDK target selection from PyTorch wheel bucket selection so apt packages and wheel indexes can use their own target names.
  • Refresh ROCm host setup docs and ansible installer references for the ROCm 7.13 flow.
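
As a rough sketch of the SDK/wheel target split described above: the helper names below are hypothetical and not taken from the repository. Only the target strings and the `amdrocm-core-sdk7.13-<target>` package pattern come from this PR. Targets without a generic family mapping are assumed to pass through unchanged.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the split; not the repository's code.

# SDK (apt) target: an explicit ROCM_SDK_TARGET wins, else fall back to GPU_TARGET.
sdk_target() {
    gpu_target="$1"
    explicit="${2:-}"
    printf '%s\n' "${explicit:-$gpu_target}"
}

# Wheel bucket: generic families map to the "-all" buckets, anything else passes through.
whl_target() {
    case "$1" in
        gfx110x) printf '%s\n' "gfx110X-all" ;;
        gfx120x) printf '%s\n' "gfx120X-all" ;;
        *)       printf '%s\n' "$1" ;;
    esac
}

# apt package name built from the pattern quoted in the implementation notes.
sdk_package() {
    printf 'amdrocm-core-sdk7.13-%s\n' "$(sdk_target "$@")"
}

sdk_package gfx110x   # SDK package name for the generic gfx110x target
whl_target gfx110x    # wheel bucket for the same GPU target
whl_target gfx1152    # specific targets are passed through as-is
```

Keeping the two lookups separate means a change to the wheel bucket naming does not touch the apt package selection, and vice versa.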

Implementation Notes

  • ROCm is installed from AMD's Ubuntu 24.04 apt repository using target-specific amdrocm-core-sdk7.13-<target> packages.
  • Generic CI targets use gfx110x / gfx120x for the ROCm SDK packages and gfx110X-all / gfx120X-all for PyTorch wheel buckets.
  • The /opt/rocm symlink fallback is idempotent so it does not fail when the 7.13 packages already create those links through update-alternatives.
  • Local Makefile builds default ROCM_SDK_TARGET to GPU_TARGET while CI passes explicit values from .github/build-config.json.
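
For illustration, an idempotent symlink fallback could look roughly like the sketch below. `ensure_link` is a hypothetical helper and the `rocm-7.13.0` directory name is an assumption made for the demo. The actual Dockerfile logic may differ.

```shell
#!/bin/sh
# Hypothetical helper: create a symlink only when nothing exists at the link path.
ensure_link() {
    target="$1"
    link="$2"
    # -e follows symlinks, -L also catches a dangling link; in both cases leave it alone.
    if [ -e "$link" ] || [ -L "$link" ]; then
        return 0
    fi
    mkdir -p "$(dirname "$link")"
    ln -s "$target" "$link"
}

# Demo in a scratch directory instead of /opt/rocm (directory names are assumptions).
demo=$(mktemp -d)
mkdir -p "$demo/rocm-7.13.0"
ensure_link "$demo/rocm-7.13.0" "$demo/rocm"
ensure_link "$demo/rocm-7.13.0" "$demo/rocm"   # second call is a no-op, not an error
rm -rf "$demo"
```

Checking before linking is what keeps the step safe when the packages have already created the link through update-alternatives.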

Validation

  • jq empty .github/build-config.json
  • YAML parse for .github/workflows/docker-build.yml
  • python3 -m unittest tests.installer.test_gpu tests.installer.test_overlay
  • Local Docker build for gfx110x with ROCM_SDK_TARGET=gfx110x and PYTORCH_WHL_TARGET=gfx110X-all
  • Local image smoke test: /opt/rocm/bin/hipcc exists and torch.__version__ is 2.9.1+rocm7.13.0
  • GitHub Actions Build Docker Images workflow dispatch run 26497377590 succeeded for base-gpu / all
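
The local image smoke test can be approximated with a script along these lines. This is a sketch with hypothetical function names, not the exact commands that were run. It assumes `python3` and `torch` are available inside the image.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a torch version string ends in "+rocm<version>".
has_rocm_suffix() {
    torch_version="$1"
    rocm_version="$2"
    case "$torch_version" in
        *"+rocm$rocm_version") return 0 ;;
        *) return 1 ;;
    esac
}

# Intended to run inside the built image; checks hipcc and the torch build suffix.
smoke_test() {
    [ -x /opt/rocm/bin/hipcc ] || { echo "hipcc missing" >&2; return 1; }
    found=$(python3 -c 'import torch; print(torch.__version__)') || return 1
    has_rocm_suffix "$found" "7.13.0" || { echo "unexpected torch build: $found" >&2; return 1; }
    echo "ok: torch $found"
}
```

Calling `smoke_test` inside a container started from the image returns non-zero on the first failed check.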

MioYuuIH requested a review from KerwinTsaiii as a code owner on May 29, 2026, 02:12

@KerwinTsaiii (Collaborator)

Hi @MioYuuIH, should we also update the PyTorch version to the one for ROCm 7.13.0?

python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx1151/ \
    "torch==2.11.0+rocm7.13.0" \
    "torchvision==0.26.0+rocm7.13.0" \
    "torchaudio==2.11.0+rocm7.13.0"

@MioYuuIH (Contributor, Author) commented May 29, 2026

> Hi @MioYuuIH, should we also update the PyTorch version to the one for ROCm 7.13.0?
>
> python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx1151/ \
>     "torch==2.11.0+rocm7.13.0" \
>     "torchvision==0.26.0+rocm7.13.0" \
>     "torchaudio==2.11.0+rocm7.13.0"

# CI passes PYTORCH_WHL_TARGET explicitly via .github/build-config.json. For
# local builds driven by dockerfiles/Makefile or auplc-installer, which only
# set GPU_TARGET, we derive the wheel target here so ad-hoc
# `docker build --build-arg GPU_TARGET=gfx120x` and `make base-rocm
# GPU_TARGET=gfx120x` both Just Work.
RUN WHL_TARGET="${PYTORCH_WHL_TARGET}" && \
    if [ -z "${WHL_TARGET}" ]; then \
        case "${GPU_TARGET}" in \
            gfx110x) WHL_TARGET="gfx110X-all" ;; \
            gfx120x) WHL_TARGET="gfx120X-all" ;; \
            *) WHL_TARGET="${GPU_TARGET}" ;; \
        esac; \
    fi && \
    INDEX_URL="${PYTORCH_INDEX_URL:-https://repo.amd.com/rocm/whl/${WHL_TARGET}/}" && \
    TORCH_SUFFIX="+rocm${ROCM_VERSION}" && \
    echo "Installing ROCm PyTorch from ${INDEX_URL} (GPU_TARGET=${GPU_TARGET}, WHL_TARGET=${WHL_TARGET})" && \
    python3 -m pip install --no-cache-dir \
        --index-url "${INDEX_URL}" \
        "torch==${PYTORCH_VERSION}${TORCH_SUFFIX}" \
        "torchvision==${TORCHVISION_VERSION}${TORCH_SUFFIX}" \
        "torchaudio==${TORCHAUDIO_VERSION}${TORCH_SUFFIX}" && \
    SDK_INC=$(python3 -c "import _rocm_sdk_core, os; print(os.path.join(os.path.dirname(_rocm_sdk_core.__file__), 'include'))" 2>/dev/null) && \
    if [ -n "$SDK_INC" ] && [ ! -e /opt/rocm/include/hip ]; then \
        ln -s "${SDK_INC}/hip" /opt/rocm/include/hip; \
    fi

We are not hardcoding the PyTorch wheel URL or ROCm target here.
The Dockerfile uses PYTORCH_WHL_TARGET when it is set and otherwise derives the wheel target from GPU_TARGET, and CI passes the target-specific values from .github/build-config.json. The package versions have also been updated to use the ROCm 7.13.0 wheel suffix:

  • torch==2.9.1+rocm7.13.0
  • torchvision==0.24.0+rocm7.13.0
  • torchaudio==2.9.0+rocm7.13.0

So the install path is target-aware, while the framework versions stay aligned with the current course baseline.
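
To make the pinning concrete, here is a small sketch of how the index URL and the three requirement specifiers compose. The function names are hypothetical. The default versions are the ones listed above, and the environment variable names follow the Dockerfile snippet.

```shell
#!/bin/sh
# Hypothetical helpers; they only print strings and do not install anything.

# Wheel index URL for a given wheel bucket, following the repo.amd.com layout used above.
wheel_index_url() {
    printf 'https://repo.amd.com/rocm/whl/%s/\n' "$1"
}

# Pinned requirement specifiers for a given ROCm version, one per line.
torch_specs() {
    suffix="+rocm$1"
    printf '%s\n' \
        "torch==${PYTORCH_VERSION:-2.9.1}${suffix}" \
        "torchvision==${TORCHVISION_VERSION:-0.24.0}${suffix}" \
        "torchaudio==${TORCHAUDIO_VERSION:-2.9.0}${suffix}"
}

wheel_index_url gfx110X-all
torch_specs 7.13.0
```

Bumping the framework later would then only mean changing the three version values, since the suffix and index URL follow from the ROCm version and target.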

@KerwinTsaiii (Collaborator)

@MioYuuIH Please resolve the merge conflicts.
