
PyTorch mainline nightly integration verification #141

Open
kerer-ai wants to merge 36 commits into Ascend:master from kerer-ai:master_pr_compile

kerer-ai (Collaborator) commented Jun 2, 2026

No description provided.

wangsike and others added 8 commits June 1, 2026 17:44
…h upstream

- Add builder Dockerfiles (manylinux2_28, x86_64/aarch64) for torch_npu wheel builds
- Add test Dockerfiles (ubuntu:22.04 + Miniforge3 conda py_3.10, aligned with upstream CUDA 13.0 image)
- Switch test images from system Python to conda + named env pattern
- Align requirements-test.txt with upstream requirements-ci.txt (py3.10/jammy profile)
- Switch to PyTorch nightly index for latest daily builds
- Add docker_build.sh with explicit case-statement tag mapping
- Add .ci/pytorch/ build scripts (common.sh, build_pytorch.sh, build_torch_npu.sh, build.sh, integration_verify.sh)
- Update _build.yml with real build commands replacing simulated placeholders
- Add build-docker-images.yml workflow: PR/push triggers build+push all 8 images, aarch64 on ARM native runners

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…function

GHA expressions do not support 'split'. matrix-prep now outputs
a JSON array directly, consumed by fromJSON() in the build matrix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
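
The JSON-array approach in the commit above can be sketched as follows. This is a minimal illustration, assuming the matrix-prep step is scripted in Python; the tag names and the `matrix` output key are hypothetical, and only the use of a JSON array read back with `fromJSON()` comes from the commit message.

```python
import json
import os


def matrix_output_line(tags, key="matrix"):
    """Return a `key=<json array>` line suitable for the step-output file."""
    # Compact separators keep the value on a single line, which the
    # simple key=value output syntax requires.
    return f"{key}={json.dumps(list(tags), separators=(',', ':'))}"


def emit(line, output_path=None):
    """Append the line to $GITHUB_OUTPUT, or print it when run locally."""
    output_path = output_path or os.environ.get("GITHUB_OUTPUT")
    if output_path:
        with open(output_path, "a", encoding="utf-8") as fh:
            fh.write(line + "\n")
    else:
        print(line)


# Hypothetical image tags; the real list lives in the workflow.
EXAMPLE_TAGS = ["a2-aarch64", "a3-aarch64"]
```

On the workflow side, the build matrix would then read the value with something like `fromJSON(needs.matrix-prep.outputs.matrix)`, where the output name is again an assumption.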
…6_64

- Remove builder Dockerfiles and requirements-builder.txt
- Remove test/Dockerfile.x86_64
- Simplify docker_build.sh to only aarch64 A2/A3
- Simplify build-docker-images.yml to aarch64-only, ubuntu-24.04-arm runner
- Update _build.yml default image tag to aarch64
- Update README

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Docker image no longer installs PyTorch (built separately at CI runtime)
- Move torchvision, torch_geometric, torch-scatter to requirements-post.txt
- requirements-post.txt installed after PyTorch + torch_npu are built
- Add post-build dependency step to build.sh and _build.yml
- Dockerfile now only installs base requirements (no torch needed)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace generic DOCKER_REGISTRY push with quay.io/kerer/pytorch using
docker/login-action@v3. Upload each built image path as an artifact, and
add a summary job that prints image names and docker pull commands in a
Markdown table via GITHUB_STEP_SUMMARY.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add test-build-trigger.yml (PR trigger on master) and _build.yml
(reusable workflow) that check out upstream PyTorch main HEAD, build
it from source, then check out downstream torch_npu master and verify
that both packages import successfully.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ascend-robot

CLA Signature Pass

kerer-ai, thanks for your pull request. All authors of the commits have signed the CLA. 👍

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- Add ccache to accelerate C/C++ compilation across runs
- Add pip cache via actions/cache@v4 for faster dependency installs
- Add BUILD_WITHOUT_SHA=1 env for more cacheable builds
- Capture build logs to /tmp and upload as artifacts for debugging
- Add proper exit code handling in build steps
- Enhance build summary with ccache statistics

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…erfile

- fetch-depth: 0 prevents shallow clone conflicts with recursive submodules
- Install ccache in Dockerfile so future images have it pre-installed

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Setting CC="ccache gcc" causes ccache to be invoked directly when CMake
builds .S assembly files (qnnpack confu loses the "gcc" part), making
ccache reject -D preprocessor flags as invalid options.

CMAKE_C_COMPILER_LAUNCHER tells CMake to prefix the compiler natively,
which correctly handles C, C++, and assembly compilation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
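
The switch described above can be sketched as a small environment-preparation helper. This is an illustration only, not the workflow's actual code: it assumes a CMake version that reads `CMAKE_C_COMPILER_LAUNCHER` and `CMAKE_CXX_COMPILER_LAUNCHER` from the environment (otherwise they would be passed as `-D` cache entries), and the helper name `launcher_env` is hypothetical.

```python
def launcher_env(base, launcher="ccache"):
    """Return a copy of `base` with the launcher moved out of CC/CXX.

    A value such as CC="ccache gcc" is reduced to CC="gcc", and the
    launcher is handed to CMake through its dedicated variables instead,
    so CMake decides per language where to prefix the compiler.
    """
    env = dict(base)
    for var in ("CC", "CXX"):
        parts = env.get(var, "").split()
        if len(parts) > 1 and parts[0] == launcher:
            env[var] = " ".join(parts[1:])
    env["CMAKE_C_COMPILER_LAUNCHER"] = launcher
    env["CMAKE_CXX_COMPILER_LAUNCHER"] = launcher
    return env
```

The returned mapping could be passed as `env=` to whatever process starts the build; the compiler variables stay plain compiler names, which is what assembly rules expect.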

When the source-built PyTorch is newer than what torch_npu master expects,
the code generator adds TORCH_FEATURE_VERSION guards to C shim headers
that differ from checked-in files, causing a RuntimeError. Run codegen
with --update_aoti_c_shim before ci/build.sh to sync headers first.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

When the op-plugin submodule lacks the config directory for the
source-built PyTorch version (e.g. no v2r13 for 2.13.0a0), fall back
to the latest available config and symlink it so ci/build.sh can also
resolve the expected path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
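
The fallback described above might look roughly like this. It is a sketch under assumptions: the `v<major>r<minor>` directory naming is inferred from the `v2r13` example in the commit message, and `ensure_config_dir` and `demo` are hypothetical names, not functions from torch_npu or op-plugin.

```python
import os
import re
import tempfile
from pathlib import Path


def ensure_config_dir(root, expected):
    """Return root/expected, symlinking it to the newest config if missing."""
    target = root / expected
    if target.exists():
        return target
    candidates = []
    for entry in root.iterdir():
        match = re.fullmatch(r"v(\d+)r(\d+)", entry.name)
        if match and entry.is_dir():
            version = (int(match.group(1)), int(match.group(2)))
            candidates.append((version, entry))
    if not candidates:
        raise FileNotFoundError(f"no versioned config directory under {root}")
    _, latest = max(candidates, key=lambda item: item[0])
    # Relative link, so the tree remains valid if it is moved or mounted elsewhere.
    target.symlink_to(latest.name, target_is_directory=True)
    return target


def demo():
    """Create v2r11 and v2r12, request v2r13, and report the link target."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("v2r11", "v2r12"):
            (root / name).mkdir()
        link = ensure_config_dir(root, "v2r13")
        return os.readlink(link)
```

Comparing versions as integer tuples avoids the string-sorting trap where `v2r9` would rank above `v2r12`.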

The manual gen_backend_stubs invocation required constructing version-
specific config paths (v2r13 etc.) which break when the op-plugin submodule
structure changes. Instead, patch generate_code.sh in-place with sed to
add the --update_aoti_c_shim flag, then let ci/build.sh run the full
codegen pipeline which handles all version mapping correctly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Don't touch source code in CI. Let the build fail naturally with full
error output for diagnosis.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Allow parallel runs during testing phase.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…orch 2.11+ compat

Upstream PR pytorch/pytorch#181782 changed SavedVariable::unpack() to accept
c10::intrusive_ptr<Node> instead of std::shared_ptr<Node>. Update the codegen
template to match, fixing VariableType_0.cpp.o and python_functions_*.cpp.o
build failures.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…+ compat

Same upstream PR pytorch/pytorch#181782 removed autograd::deleteNode and
changed set_history() to accept intrusive_ptr<Node>. Replace shared_ptr
with make_intrusive for WarnNotImplemented grad_fn.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

wangsike and others added 2 commits June 4, 2026 14:07
… PyTorch 2.11+ compat

set_history() now expects c10::intrusive_ptr<Node>. Replace shared_ptr
with intrusive_ptr for Identity grad_fn, same upstream change #181782.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…yTorch 2.11+ compat

Upstream autograd::impl::grad_accumulator() now returns intrusive_ptr<Node>
instead of shared_ptr<Node>. Update member types in reducer.hpp to match.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The torch_npu._C shared library depends on libhccl.so from CANN, which
is not on the default library path. Source set_env.sh before importing.

Replace single "Verify installation" step with two steps:
1. Verify NPU device (source env, npu-smi info)
2. Verify NPU availability (source env, import torch + torch_npu)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
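
A sketch of the second verification step, assuming it is driven from Python. The key point from the commit is that the environment script must be sourced in the same shell that runs the import, because environment changes do not carry over between processes. The helper name `verify_command` is hypothetical, and the location of `set_env.sh` depends on the CANN installation and is not stated in the commit.

```python
import shlex


def verify_command(set_env_path, modules=("torch", "torch_npu")):
    """Build a bash command that sources the env script, then imports modules."""
    imports = "; ".join(f"import {name}" for name in modules)
    script = (
        f"source {shlex.quote(set_env_path)} && "
        f"python -c {shlex.quote(imports)}"
    )
    # A single `bash -c` keeps the sourced variables (for example the
    # library search path) visible to the Python process that follows.
    return ["bash", "-c", script]
```

The resulting list can be handed to `subprocess.run(..., check=True)` so a failed import fails the step.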

Two changes:

1. Wheel artifact upload:
   - PyTorch build now uses 'pip wheel' + 'pip install' instead of editable
     install, producing a wheel at /tmp/wheels/
   - torch_npu wheel is copied to /tmp/wheels/ after build
   - Both wheels uploaded as artifact 'wheels-<run>' with 7-day retention

2. Checkout speed optimization:
   - Submodule init is deferred to a separate step with --depth=1, avoiding
     full history clones of huge submodules (third_party/*)
   - Combined with --jobs=$(nproc) for parallel submodule clone

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

chmod -R 777 ~/.cache recursively walks entire ccache directory tree
(hundreds of thousands of cached object files), taking ~7 minutes on
self-hosted runners with persistent home directories.

Replace with non-recursive chmod on directories only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
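
The commit does not show the replacement command, so the following is only one possible reading of "non-recursive chmod on directories only": change the mode of the cache root and its immediate subdirectories, and never descend into the object files below them. The names `chmod_dirs_shallow` and `demo` are hypothetical.

```python
import os
import stat
import tempfile


def chmod_dirs_shallow(root, mode=0o777):
    """chmod `root` and its direct subdirectories; leave files untouched."""
    os.chmod(root, mode)
    changed = [root]
    with os.scandir(root) as entries:
        for entry in entries:
            # One directory listing only: no walk over cached object files.
            if entry.is_dir(follow_symlinks=False):
                os.chmod(entry.path, mode)
                changed.append(entry.path)
    return changed


def demo():
    """Show that a nested file keeps its mode while directories change."""
    with tempfile.TemporaryDirectory() as tmp:
        sub = os.path.join(tmp, "ccache")
        os.mkdir(sub)
        obj = os.path.join(sub, "obj.o")
        open(obj, "w").close()
        os.chmod(obj, 0o600)
        changed = chmod_dirs_shallow(tmp, 0o777)
        names = sorted(os.path.basename(path) for path in changed[1:])
        return (
            names,
            stat.S_IMODE(os.stat(sub).st_mode),
            stat.S_IMODE(os.stat(obj).st_mode),
        )
```

The cost is proportional to the number of entries in the top-level directory rather than to the size of the whole cache tree.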

Post Cache ccache step uploads the entire ccache directory (~/.cache/ccache)
back to GitHub Cache, taking ~8min for a full 10G cache over limited uplink.

Changes:
- Reduce ccache max size from 10G to 5G (halves upload volume)
- Add CCACHE_COMPRESSLEVEL=6 (zstd level 6, was default 1)
- Add pre-summary ccache cleanup step (ccache -c) to prune old entries
  before the post-job cache upload kicks in
- Report ccache disk usage after cleanup for visibility

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

fetch-depth: 0 triggers a full clone (all branches, all tags, all history).
PyTorch has thousands of branches and a very long commit history, so git
fetch alone takes ~3min. fetch-depth: 1 fetches only the single target SHA,
reducing checkout to ~15s.

Also apply the same submodule optimization (depth=1, parallel) to the
torch_npu checkout.

Expected improvement:
- PyTorch checkout: ~3min → ~15s
- torch_npu checkout: ~30s → ~10s

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

pip wheel + pip install puts torch in site-packages, but Python in the
pytorch/ source dir imports the local torch/ directory (which contains
source-only _C folder, not compiled .so). This fails with:
"Failed to load PyTorch C extensions"

Fix: cd /tmp before import, then cd back.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
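
The fix above works because `python -c` puts the current directory first on the import path, so a local `torch/` package shadows the installed wheel. A minimal sketch of the same idea, assuming the check is run from Python; `import_from_neutral_dir` is a hypothetical helper, and the system temporary directory stands in for `/tmp`.

```python
import subprocess
import sys
import tempfile


def import_from_neutral_dir(module, cwd=None):
    """Import `module` in a child interpreter started outside the source tree.

    Returns (exit code, path the module was loaded from).
    """
    cwd = cwd or tempfile.gettempdir()
    code = f"import {module}; print({module}.__file__)"
    proc = subprocess.run(
        [sys.executable, "-c", code],
        cwd=cwd,  # the child's working directory, not the caller's
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout.strip()
```

Because only the child process changes directory, there is no need to `cd` back afterwards, and the printed path shows whether the import resolved to site-packages or to a source checkout.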

Cache restore (ccache 5.5GB) took ~7min via network, while full recompile
takes <5min. Cache save/upload also cost another ~8min post-job.

Remove: pip cache, ccache cache, ccache install, ccache configuration in
both build steps, ccache cleanup, and ccache summary output.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
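
The trade-off behind this removal is simple arithmetic, sketched below with the figures quoted in the commit (restore ~7 min, save ~8 min, full recompile under 5 min). The warm-build time is not given, so the default of zero is a best-case assumption in favour of caching; `cache_net_saving` is a hypothetical helper.

```python
def cache_net_saving(restore_min, save_min, cold_build_min, warm_build_min=0.0):
    """Minutes saved per run by caching; a negative value means caching is slower."""
    time_saved_on_build = cold_build_min - warm_build_min
    transfer_overhead = restore_min + save_min
    return time_saved_on_build - transfer_overhead
```

With the commit's numbers, `cache_net_saving(7, 8, 5)` is negative even if a warm build took no time at all, which is consistent with the decision to drop the cache.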

Include the dynamically generated AOTI C shim header in the build
artifact upload for debugging AOTInductor compatibility issues.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Now that c_shim_npu.h is kept in sync with the repo, the CI runs in
validation mode — it regenerates the header and fails if it differs
from the checked-in version, instead of silently overwriting it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
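
Validation mode as described above amounts to a drift check between the checked-in header and a freshly generated one. The sketch below illustrates that comparison only; it is not the torch_npu codegen, and `header_drift` and the `checked-in/` and `generated/` labels are hypothetical.

```python
import difflib


def header_drift(checked_in, regenerated, name="c_shim_npu.h"):
    """Return unified-diff lines; an empty list means the header is in sync."""
    return list(
        difflib.unified_diff(
            checked_in.splitlines(),
            regenerated.splitlines(),
            fromfile=f"checked-in/{name}",
            tofile=f"generated/{name}",
            lineterm="",
        )
    )
```

A CI step would fail when the returned list is non-empty and print it, so the log shows exactly which declarations changed instead of the file being overwritten silently.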

Update the checked-in AOTI C shim header to match the version generated
by the current PyTorch nightly's torchgen fallback ops list.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Sync with latest upstream Ascend/pytorch master.
