build: require GCC>=13, CUDA>=12 (12 & 13), cuTENSOR>=2.0 with early actionable errors; harden CUDA finders #950
- Require GCC >= 13 for GNU host compilers (C++20), failing early with a message pointing to -DCMAKE_CXX_COMPILER. Clang/AppleClang unaffected.
- Resolve CUDAToolkit before enable_language() and require CUDA >= 12.0 (device-side C++20 floor), explicitly supporting 12.x and 13.x: < 12 is a FATAL_ERROR, > 13 warns.
- Emit an actionable error when no usable nvcc is found, explaining that enable_language(CUDA) only consults CMAKE_CUDA_COMPILER / CUDACXX / PATH (not the located toolkit libraries). Skipped when a compiler is pinned.
- Choose the default CUDA architecture list per toolkit version: CUDA 13.0 removed offline support for compute capability < 7.5, so Volta sm_70 is only emitted for CUDA 12.x.
- Report CUDA via the modern CUDAToolkit_* variables (+ nvcc path); the legacy CUDA_VERSION_STRING / CUDA_TOOLKIT_ROOT_DIR are unset by find_package(CUDAToolkit) and printed blank (#949).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
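The "no usable nvcc" error described in this commit could look roughly like the following. This is a minimal sketch rather than the PR's actual code: it assumes CMake's stock CheckLanguage module, and the project name and message wording are invented for illustration.

```cmake
cmake_minimum_required(VERSION 3.20)
project(nvcc_probe_sketch LANGUAGES CXX)  # hypothetical project name

include(CheckLanguage)

# check_language(CUDA) probes for a working CUDA compiler without aborting the
# configure step. On success CMAKE_CUDA_COMPILER holds the compiler path;
# otherwise it is set to NOTFOUND.
check_language(CUDA)

if(CMAKE_CUDA_COMPILER)
  enable_language(CUDA)
else()
  # Illustrative wording only; the real message in the PR may differ.
  message(FATAL_ERROR
    "No usable nvcc was found. enable_language(CUDA) only consults "
    "CMAKE_CUDA_COMPILER, the CUDACXX environment variable, and PATH.\n"
    "Fix: pass -DCMAKE_CUDA_COMPILER=/path/to/nvcc, export CUDACXX, "
    "or put the toolkit's bin directory on PATH.")
endif()
```

Probing first keeps the failure message under the project's control; calling enable_language(CUDA) directly would stop with CMake's own, less specific error.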
- FindCUTENSOR: remove dead lib/10.2 and lib/11 branches; derive the library subdir from the CUDA major version (lib/12, lib/13) instead of hardcoding 12; require cuTENSOR >= 2.0 by reading CUTENSOR_MAJOR/MINOR from the headers.
- Derive CUTENSOR_FOUND / CUQUANTUM_FOUND from the actual find_library results via find_package_handle_standard_args instead of setting them TRUE unconditionally (which let a NOTFOUND silently pass the caller's REQUIRED check and link empty / "-NOTFOUND").
- FindCUQUANTUM: build CUQUANTUM_LIBRARIES conditionally (parity with FindCUTENSOR) so a missing lib does not appear as "...-NOTFOUND" on the link line.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
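To illustrate the finder fix described in this commit, here is a hedged sketch of a FindCUQUANTUM-style module. The variable names CUSTATEVEC_LIB, CUTENSORNET_LIB and CUQUANTUM_INCLUDE_DIR are assumptions chosen for the example, not necessarily the names used in Cytnx.

```cmake
# Sketch of a FindCUQUANTUM.cmake body. CUQUANTUM_ROOT is the user-supplied
# install prefix; the result variable names below are hypothetical.
find_path(CUQUANTUM_INCLUDE_DIR custatevec.h
  HINTS "${CUQUANTUM_ROOT}" PATH_SUFFIXES include)
find_library(CUSTATEVEC_LIB custatevec
  HINTS "${CUQUANTUM_ROOT}" PATH_SUFFIXES lib lib64)
find_library(CUTENSORNET_LIB cutensornet
  HINTS "${CUQUANTUM_ROOT}" PATH_SUFFIXES lib lib64)

# Append only the libraries that were actually located, so a value such as
# "CUSTATEVEC_LIB-NOTFOUND" can never reach the link line.
set(CUQUANTUM_LIBRARIES "")
if(CUSTATEVEC_LIB)
  list(APPEND CUQUANTUM_LIBRARIES "${CUSTATEVEC_LIB}")
endif()
if(CUTENSORNET_LIB)
  list(APPEND CUQUANTUM_LIBRARIES "${CUTENSORNET_LIB}")
endif()

# Sets CUQUANTUM_FOUND from the real search results and, when the caller used
# REQUIRED, stops with an error that names the missing variable.
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(CUQUANTUM
  REQUIRED_VARS CUSTATEVEC_LIB CUTENSORNET_LIB CUQUANTUM_INCLUDE_DIR)
```

With this shape, find_package(CUQUANTUM REQUIRED) fails at configure time when a library is missing instead of passing and producing a broken link command later.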
…for tarballs
- Update the CUDA dependency list: CUDA toolkit >= 12.0 (12.x/13.x) and cuTENSOR >= 2.0.
- Add a prominent note that tarball installs of cuTENSOR/cuQuantum need both CUTENSOR_ROOT/CUQUANTUM_ROOT (build) and LD_LIBRARY_PATH (runtime), since tarball libraries are not registered with ldconfig (#948 workaround). Use $CUTENSOR_ROOT/lib (cuTENSOR 2.x layout), noting lib/<cuda-major> as legacy.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Code Review
This pull request updates the CMake configuration and documentation to enforce and support newer compiler and library requirements, specifically GCC >= 13, CUDA >= 12.0, and cuTENSOR >= 2.0. It also modernizes CUDA variable usage and improves package handling for CUQUANTUM and CUTENSOR. Feedback on these changes suggests several CMake improvements, such as avoiding variable dereferencing inside if() conditions, using CUDAToolkit_VERSION with standard version comparison operators for more robust checks, and registering the cuTENSOR version with find_package_handle_standard_args. Additionally, the reviewer noted a discrepancy in the documentation regarding the flat lib/ layout of cuTENSOR 2.x, which is not yet supported by the CMake script.
# included for CUDA 12.x.
if(NOT CMAKE_CUDA_ARCHITECTURES AND NOT DEFINED ENV{CUDAARCHS})
  set(CMAKE_CUDA_ARCHITECTURES 70-real 75-real 80-real 86-real 89-real 90-real 90-virtual)
  if(CUDAToolkit_VERSION_MAJOR GREATER_EQUAL 13)
Fixed in 10b1536f — if(CUDAToolkit_VERSION VERSION_GREATER_EQUAL 13.0).
Posted by Claude Code on behalf of @pcchen
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: de94f3f8cc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
# toolkit can actually compile. find_package(CUDAToolkit) does not require the
# CUDA language to be enabled; the later find_package in CytnxBKNDCMakeLists
# re-uses this cached result.
find_package(CUDAToolkit QUIET)
Honor pinned CUDA compilers when finding the toolkit
This probes FindCUDAToolkit before the CUDA language is enabled, but cmake --help-module FindCUDAToolkit says the compiler directory is searched only after CUDA has been enabled; at this point it searches CUDAToolkit_ROOT/CUDA_PATH/PATH/defaults instead of CMAKE_CUDA_COMPILER or CUDACXX. A user with nvcc only under a non-default path who follows the new error text (-DCMAKE_CUDA_COMPILER=/opt/cuda-13/bin/nvcc or CUDACXX=...) will still hit this fatal path, and a user with another CUDA on PATH can have the version check/link libraries come from the wrong toolkit before enable_language(CUDA) honors the pinned compiler.
Confirmed — find_package(CUDAToolkit) runs before enable_language(CUDA), so it consults CUDAToolkit_ROOT/CUDA_PATH/PATH rather than CMAKE_CUDA_COMPILER/CUDACXX, and a pinned-but-not-on-PATH nvcc (or a stray CUDA on PATH) can drive the early check from the wrong toolkit. The intended fix is to derive CUDAToolkit_ROOT from a pinned CMAKE_CUDA_COMPILER/CUDACXX before the probe. Deferring this to a follow-up PR since it needs a Linux box with multiple CUDA toolkits to verify; tracked there.
Posted by Claude Code on behalf of @pcchen
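The deferred fix mentioned above (deriving CUDAToolkit_ROOT from a pinned compiler before the probe) might be sketched as follows. This is an untested illustration under stated assumptions: the helper variable names are hypothetical, and it assumes the usual <root>/bin/nvcc layout, which does not hold for every distribution package.

```cmake
# Prefer an explicitly pinned compiler; -DCMAKE_CUDA_COMPILER takes precedence
# over the CUDACXX environment variable. _pinned_nvcc is a hypothetical helper.
set(_pinned_nvcc "")
if(DEFINED CMAKE_CUDA_COMPILER)
  set(_pinned_nvcc "${CMAKE_CUDA_COMPILER}")
elseif(DEFINED ENV{CUDACXX})
  set(_pinned_nvcc "$ENV{CUDACXX}")
endif()

# Only act on absolute paths, and never override a root the user already set.
if(_pinned_nvcc AND IS_ABSOLUTE "${_pinned_nvcc}" AND NOT DEFINED CUDAToolkit_ROOT)
  # Resolve symlinks, then strip "bin/nvcc":
  # /opt/cuda-13/bin/nvcc -> /opt/cuda-13 (assumed layout).
  get_filename_component(_nvcc_real "${_pinned_nvcc}" REALPATH)
  get_filename_component(_nvcc_bin_dir "${_nvcc_real}" DIRECTORY)
  get_filename_component(_cuda_root "${_nvcc_bin_dir}" DIRECTORY)
  set(CUDAToolkit_ROOT "${_cuda_root}")
endif()

# The early probe now looks at the pinned toolkit first.
find_package(CUDAToolkit QUIET)
```

As noted in the reply, this would need checking on a machine with several CUDA toolkits installed before it could be relied on.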
If we defer this issue to a follow-up PR, then this PR fixes the problem in the wrong way. With this change, setting CMAKE_CUDA_COMPILER to a path other than the one found by find_package(CUDAToolkit) can leave nvcc and the CUDA toolkit at different versions, and the build may fail. The suggested pattern is to not find the CUDA toolkit before enabling CUDA:
# Record user intent before CUDA is enabled.
set(_USER_SET_CUDA_ARCHITECTURES FALSE)
if(DEFINED CMAKE_CUDA_ARCHITECTURES)
  set(_USER_SET_CUDA_ARCHITECTURES TRUE)
endif()

enable_language(CUDA)
find_package(CUDAToolkit REQUIRED)

if(NOT _USER_SET_CUDA_ARCHITECTURES)
  if(CUDAToolkit_VERSION VERSION_GREATER_EQUAL 13.0)
    set(CMAKE_CUDA_ARCHITECTURES
        75-real 80-real 86-real 89-real 90-real 90-virtual)
  else()
    set(CMAKE_CUDA_ARCHITECTURES
        70-real 75-real 80-real 86-real 89-real 90-real 90-virtual)
  endif()
endif()
CMAKE_CUDA_ARCHITECTURES all-major cannot be used here because it would include 50 and 60 for CUDA 12.x, and neither is supported by the cuTENSOR and cuQuantum versions we require.
.. code-block:: shell

    $export LD_LIBRARY_PATH=$CUTENSOR_ROOT/lib:$CUQUANTUM_ROOT/lib:$LD_LIBRARY_PATH
Match cuTENSOR runtime path to the finder
The documented runtime workaround does not match the library directory that this commit's finder accepts: FindCUTENSOR.cmake now constructs ${CUTENSOR_ROOT}/lib/${CUDAToolkit_VERSION_MAJOR} for CUDA 12/13. For a tarball install that is found by the build, exporting only $CUTENSOR_ROOT/lib leaves the actual libcutensor.so directory out of LD_LIBRARY_PATH, so import cytnx can still fail with the loader error this note is meant to prevent; document the versioned subdirectory or reuse the detected library dir.
Addressed in 10b1536f, together with the finder change. Instead of documenting a versioned-subdir caveat, FindCUTENSOR now searches the flat lib/ directly and derives CUTENSOR_LIBRARY_DIRS from the located library, so for a 2.x tarball the documented $CUTENSOR_ROOT/lib matches the directory the build actually used.
Posted by Claude Code on behalf of @pcchen
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #950      +/-   ##
==========================================
+ Coverage   29.87%   29.99%   +0.12%
==========================================
  Files         240      238       -2
  Lines       35425    35421       -4
  Branches    14729    14729
==========================================
+ Hits        10584    10626      +42
+ Misses      17593    17544      -49
- Partials     7248     7251       +3
…installed
Add a troubleshooting subsection covering the two things to pin when multiple CUDA toolkits are present: the build compiler (-DCMAKE_CUDA_COMPILER / CUDACXX / PATH, with the pip/scikit-build form) and the runtime libraries (LD_LIBRARY_PATH vs the ldconfig/apt copy). Notes that the configure output reports the resolved toolkit and that a stale CMakeCache.txt must be wiped.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Added a docs section ( A new "Choosing between CUDA 12 and CUDA 13 when both are installed" subsection under Build troubleshooting in
It ties into the CUDA-version reporting added in this PR and the tarball
Posted by Claude Code on behalf of @pektiong
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fe7b181bf1
if(${CUDAToolkit_VERSION_MAJOR} GREATER_EQUAL 12)
  set(CUTNLIB_DIR "${CUTNLIB_DIR}${CUDAToolkit_VERSION_MAJOR}")
Search the cuTENSOR 2.x flat lib directory
When CUTENSOR_ROOT points at a cuTENSOR 2.x install (the version this module now requires), the libraries live directly under $CUTENSOR_ROOT/lib as documented in docs/source/adv_install.rst, but this appends the CUDA major and the later find_library(... NO_DEFAULT_PATH) only checks $CUTENSOR_ROOT/lib/12 or lib/13. Fresh evidence beyond the prior runtime-path note is that a fake 2.7 header with lib/libcutensor.so leaves CUTENSOR_LIB as NOTFOUND, so USE_CUTENSOR=ON cannot configure for the supported 2.x layout unless users add a nonstandard versioned symlink.
Good catch — and this outweighs the "out of scope" note in the description: once the module hard-requires cuTENSOR >= 2.0, it has to locate the 2.x flat lib/. Fixed in 10b1536f by giving find_library both suffixes (lib and lib/${CUDAToolkit_VERSION_MAJOR}), so the 2.x flat layout and the legacy versioned layout both resolve for CUDA 12/13 without a nonstandard symlink. CUTENSOR_LIBRARY_DIRS is now taken from the directory of the located library. The apt multiarch path remains in #946.
Posted by Claude Code on behalf of @pcchen
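The two-layout search described in this reply could be sketched as below. It is an illustration, not the code from 10b1536f. It assumes the CUTENSOR_MAJOR/CUTENSOR_MINOR macros are defined in cutensor.h (the header name may need adjusting), and the variable names CUTENSOR_LIB and CUTENSOR_INCLUDE_DIR are taken as given.

```cmake
find_path(CUTENSOR_INCLUDE_DIR cutensor.h
  HINTS "${CUTENSOR_ROOT}" PATH_SUFFIXES include NO_DEFAULT_PATH)

# Try the cuTENSOR 2.x flat layout (lib/) first, then the legacy per-CUDA
# layout (lib/12, lib/13). CUDAToolkit_VERSION_MAJOR is assumed to be set by
# an earlier find_package(CUDAToolkit).
find_library(CUTENSOR_LIB cutensor
  HINTS "${CUTENSOR_ROOT}"
  PATH_SUFFIXES lib "lib/${CUDAToolkit_VERSION_MAJOR}"
  NO_DEFAULT_PATH)

# Report the directory that was actually used, so the documented runtime path
# can match the build-time one.
if(CUTENSOR_LIB)
  get_filename_component(CUTENSOR_LIBRARY_DIRS "${CUTENSOR_LIB}" DIRECTORY)
endif()

# Read the version from the header (assumed location of the macros).
if(CUTENSOR_INCLUDE_DIR AND EXISTS "${CUTENSOR_INCLUDE_DIR}/cutensor.h")
  file(STRINGS "${CUTENSOR_INCLUDE_DIR}/cutensor.h" _cutensor_ver_lines
       REGEX "^#define CUTENSOR_(MAJOR|MINOR) +[0-9]+")
  foreach(_line IN LISTS _cutensor_ver_lines)
    if(_line MATCHES "CUTENSOR_(MAJOR|MINOR) +([0-9]+)")
      set(CUTENSOR_VERSION_${CMAKE_MATCH_1} "${CMAKE_MATCH_2}")
    endif()
  endforeach()
  set(CUTENSOR_VERSION "${CUTENSOR_VERSION_MAJOR}.${CUTENSOR_VERSION_MINOR}")
endif()

# VERSION_VAR lets callers write find_package(CUTENSOR 2.0 REQUIRED) and have
# the version floor enforced by CMake itself.
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(CUTENSOR
  REQUIRED_VARS CUTENSOR_LIB CUTENSOR_INCLUDE_DIR
  VERSION_VAR CUTENSOR_VERSION)
```

Listing lib before the versioned suffix means a 2.x tarball resolves without a symlink, while a legacy layout still falls through to lib/<cuda-major>.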
# 12.0 floor comes from device-side C++20 (nvcc gained -std=c++20 in CUDA
# 12.0; 11.x tops out at C++17), matching CMAKE_CUDA_STANDARD 20. Fail early
# with an actionable message instead of a cryptic later C++20/arch error.
if(CUDAToolkit_VERSION_MAJOR LESS 12)
Gate CUDA 12.x against the required GCC version
For GNU CUDA builds with CUDA 12.0-12.3 and GCC 13 selected by the new GNU compiler gate, this accepts the toolkit and then enable_language(CUDA) can fail because those CUDA releases do not support GCC 13 as an nvcc host compiler (CUDA 12.3 lists GCC 6.x-12.2, while CUDA 12.4 lists 6.x-13.2). Either raise the GNU/CUDA floor to 12.4 or reject the unsupported host-compiler combination so the advertised CUDA >=12.0 path does not pass this check and fail later.
Valid: GCC 13 + CUDA 12.0-12.3 passes both gates, but nvcc caps host support at GCC 12.x until CUDA 12.4. Deferring the combined gate to a follow-up PR, because it is bound up with revisiting the GCC-13 floor itself — that floor is driven solely by two std::format call sites (pybind/symmetry_py.cpp, pybind/unitensor_py.cpp); switching them to {fmt} or plain formatting would lower the floor to GCC 12 and change this calculus. Tracked for the follow-up.
Posted by Claude Code on behalf of @pcchen
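For reference, the combined gate deferred above might take roughly this shape. It is a sketch only: the 12.4 threshold comes from the review comment rather than from independent testing, the message wording is invented, and it assumes CUDAToolkit_VERSION has already been populated by find_package(CUDAToolkit).

```cmake
# Hypothetical combined host-compiler / toolkit gate.
# Per the review: CUDA 12.0-12.3 accept GCC only up to 12.x as the nvcc host
# compiler, while CUDA 12.4 and later accept GCC 13.
if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU"
   AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER_EQUAL 13
   AND CUDAToolkit_VERSION VERSION_LESS 12.4)
  message(FATAL_ERROR
    "CUDA ${CUDAToolkit_VERSION} does not support GCC "
    "${CMAKE_CXX_COMPILER_VERSION} as an nvcc host compiler. "
    "Use CUDA >= 12.4 with GCC 13.")
endif()
```

If the GCC floor is later lowered to 12, as the reply suggests it could be, the first version condition would need to be revisited together with this check.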
…rsion checks
Addresses inline review feedback on PR #950:
- FindCUTENSOR: search both the cuTENSOR 2.x flat lib/ and the legacy lib/<cuda-major> layouts (find_library PATH_SUFFIXES), so the required 2.x layout actually resolves; derive CUTENSOR_LIBRARY_DIRS from the located library; reference CUDAToolkit_VERSION_MAJOR by name (no in-if() dereference); register VERSION_VAR CUTENSOR_VERSION with find_package_handle_standard_args.
- CMakeLists: use CUDAToolkit_VERSION with VERSION_LESS/VERSION_GREATER_EQUAL instead of comparing the major component as an integer.
- docs(adv_install): note FindCUTENSOR searches both layouts so the build-time and runtime cuTENSOR paths agree.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
For cuTENSOR 2.x tarballs the libraries sit directly under ``lib/``, so
the path above is sufficient at both build and runtime. Older cuTENSOR
1.x tarballs instead use a per-CUDA subdirectory (``lib/<cuda-major>``,
e.g. ``lib/12``); on that legacy layout point ``LD_LIBRARY_PATH`` at the
subdirectory. Cytnx's ``FindCUTENSOR`` searches both layouts at build
time, so the runtime path matches whichever one was found. If
``LD_LIBRARY_PATH`` is not set up, importing/running Cytnx fails with
``error while loading shared libraries: libcutensor.so... cannot open
shared object file``, or silently binds to a mismatched system copy of the
library if one is present. Add these ``export`` lines to your shell profile
(e.g. ``~/.bashrc``) to make them persistent.
Not for all 2.x. cuTENSOR only started shipping separate packages per CUDA version in 2.3. Also, we don't support cuTENSOR < 2.0 or cuQuantum < 24.0. See #447.
I would not recommend that users export LD_LIBRARY_PATH in ~/.bashrc. That pollutes the environment globally just to run a locally built Cytnx. In fact, keeping multiple versions of the same library without an isolated environment is an easy way to get into trouble. We can explain how LD_LIBRARY_PATH is used, as cupy and jax do, but we should also consider pointing users and developers who currently need LD_LIBRARY_PATH to a conda environment.
.. code-block:: shell

    # Use CUDA 13:
    $cmake -S . -B build -DUSE_CUDA=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda-13/bin/nvcc

    # ...or use CUDA 12:
    $cmake -S . -B build -DUSE_CUDA=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12/bin/nvcc

Equivalent alternatives are ``export CUDACXX=/usr/local/cuda-13/bin/nvcc`` or
putting the desired ``bin`` directory first on ``PATH``
(``export PATH=/usr/local/cuda-13/bin:$PATH``). For the ``pip`` build, pass it
through scikit-build-core:

.. code-block:: shell

    $pip install . --config-settings=cmake.args="-DUSE_CUDA=ON;-DCMAKE_CUDA_COMPILER=/usr/local/cuda-13/bin/nvcc"
**2. Runtime -- match the shared libraries.** Selecting the compiler does not
control which CUDA runtime libraries are loaded at run time. The dynamic loader
resolves ``libcudart.so.<major>`` (and cuTENSOR/cuQuantum, etc.) via
``LD_LIBRARY_PATH``, then the ``ldconfig`` cache. If a different major version is
registered with ``ldconfig`` (commonly the ``apt`` copy under
``/usr/lib/x86_64-linux-gnu``), it can be loaded instead of the toolkit you
built against, causing version-skew crashes. To force the matching runtime, put
its library directory first:
This content seems to duplicate the cuTENSOR and cuQuantum section.
if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU" AND CMAKE_CXX_COMPILER_VERSION VERSION_LESS 13)
  message(FATAL_ERROR
    "GCC >= 13 is required (C++20 support), but found GCC "
    "${CMAKE_CXX_COMPILER_VERSION} at ${CMAKE_CXX_COMPILER}.\n"
    "Install GCC 13+ and point CMake at it, e.g. "
    "-DCMAKE_CXX_COMPILER=g++-13 (and -DCMAKE_C_COMPILER=gcc-13), "
    "or set the CXX/CC environment variables.")
endif()
Do we really need this guard? We already set CMAKE_CXX_STANDARD_REQUIRED ON to guard the compiler version.
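One point relevant to this question: CMAKE_CXX_STANDARD_REQUIRED only verifies that the compiler accepts the C++20 flag, not that the standard library provides every C++20 feature. Since the GCC 13 floor is attributed elsewhere in this thread to std::format, a feature probe is one possible alternative to a version gate. The following is a sketch under assumptions: CYTNX_HAVE_STD_FORMAT is a hypothetical variable name, and it relies on CMake passing CMAKE_CXX_STANDARD through to the test compile (policy CMP0067).

```cmake
include(CheckCXXSourceCompiles)

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Compile a tiny program that uses std::format. The result is cached in
# CYTNX_HAVE_STD_FORMAT (hypothetical name).
check_cxx_source_compiles("
#include <format>
#include <string>
int main() {
  std::string s = std::format(\"{}\", 42);
  return s.empty() ? 1 : 0;
}
" CYTNX_HAVE_STD_FORMAT)

if(NOT CYTNX_HAVE_STD_FORMAT)
  message(FATAL_ERROR
    "The selected C++ toolchain (${CMAKE_CXX_COMPILER_ID} "
    "${CMAKE_CXX_COMPILER_VERSION}) does not provide std::format. "
    "With GCC this requires version 13 or newer.")
endif()
```

A probe like this tests the capability directly, so it also covers non-GNU compilers paired with an older standard library, at the cost of one extra compile during configuration.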
elseif(CUDAToolkit_VERSION VERSION_GREATER_EQUAL 14.0)
  message(WARNING
    "CUDA Toolkit ${CUDAToolkit_VERSION} is newer than the versions Cytnx is "
    "tested against (12.x and 13.x). The build will proceed; please report "
    "any issues.")
endif()
Packages that support CUDA have many possible configurations, and it is hard to test all of them, as discussed in #583 and #584. Some packages do not explicitly set a range of supported system dependencies; they just list the tested configurations in their documentation. This reduces the package's responsibility, makes maintenance easier, and leaves users who build from source responsible for solving configuration problems.
We may support and build with CUDA 14 in the future, so a maximum-version guard is not needed.
Summary
Makes Cytnx's CUDA build requirements explicit and self-documenting, so
unsupported toolchains fail early with actionable messages instead of cryptic
downstream errors. Also fixes correctness bugs in the cuTENSOR/cuQuantum
finders and stale CUDA configure diagnostics.
This addresses the configure-/build-time portions of #946 and #949, and the
documentation half of #948. (The deeper #948 rpath change and the #946
candidate-suffix layout search are intentionally left as follow-ups -- see
"Out of scope".)
Changes
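Compiler / toolkit requirements (new, fail-early)
- GCC >= 13 is required for GNU host compilers (C++20), failing early with a message pointing at -DCMAKE_CXX_COMPILER. Clang/AppleClang are unaffected.
- The CUDA toolkit is now resolved (find_package(CUDAToolkit)) before enable_language(CUDA) so the version is known early enough to gate and to drive architecture selection.
- CUDA >= 12.0 is required, with 12.x and 13.x explicitly supported: < 12 is a FATAL_ERROR; > 13 warns (untested).
- The default CUDA architecture list is chosen per toolkit version: CUDA 13.0 removed offline compilation for compute capability < 7.5, so Volta sm_70 is now emitted only for CUDA 12.x.
- cuTENSOR >= 2.0 is required, read from CUTENSOR_MAJOR/MINOR in the cuTENSOR headers; the 1.x API is rejected.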
Better errors / diagnostics
- An actionable error is emitted when no usable nvcc is found for enable_language(CUDA), explaining that it only consults CMAKE_CUDA_COMPILER / CUDACXX / PATH (not the located toolkit libraries) and listing the fixes. Skipped when the user pins a compiler explicitly.
- CUDA is now reported via the modern CUDAToolkit_* variables (CUDAToolkit_VERSION, CUDAToolkit_BIN_DIR) plus CMAKE_CUDA_COMPILER. The legacy CUDA_VERSION_STRING / CUDA_TOOLKIT_ROOT_DIR are not set by find_package(CUDAToolkit) and were printing blank.
Finder correctness (#946)
- FindCUTENSOR: removed the dead lib/10.2 and lib/11 branches; the library subdir now tracks the CUDA major version (lib/12, lib/13) instead of a hardcoded lib/12.
- CUTENSOR_FOUND / CUQUANTUM_FOUND are now derived from the actual find_library results via find_package_handle_standard_args, instead of being set TRUE unconditionally (which let a NOTFOUND silently pass the caller's REQUIRED check and link empty / -NOTFOUND).
- FindCUQUANTUM: CUQUANTUM_LIBRARIES is built conditionally (parity with FindCUTENSOR).
Docs
- adv_install.rst: dependency list updated (CUDA >= 12.0; cuTENSOR >= 2.0), and a prominent note that tarball installs of cuTENSOR/cuQuantum need both CUTENSOR_ROOT/CUQUANTUM_ROOT (build) and LD_LIBRARY_PATH (runtime), since tarball libs aren't registered with ldconfig (#948, "[build] Installed CUDA extension is not rpath-pinned to the build toolkit; runtime loads system/apt CUDA runtime", workaround).
Related issues
- … subdir, honest *_FOUND; the 2.x flat lib/ + apt multiarch search remains).
- … deferred).
Testing
- cmake -S . -B build -DUSE_CUDA=OFF configures and generates cleanly (AppleClang; GCC check correctly skipped).
- … 1.7 -> fail).
- … not executed (no CUDA/cuTENSOR/cuQuantum environment available); they are validated by inspection. Reviewers with a CUDA box, please sanity-check a -DUSE_CUDA=ON (and -DUSE_CUTENSOR=ON / -DUSE_CUQUANTUM=ON) configure.
Out of scope (follow-ups)
- … (INSTALL_RPATH_USE_LINK_PATH / --disable-new-dtags) -- changes install/runtime behavior, needs a Linux+CUDA test.
- … lib/ and apt multiarch layouts.
- CytnxBKNDCMakeLists.cmake library lines (cusolver/curand/cublas/cudart) still use legacy CUDA_*_LIBRARY vars feeding linkflags.tmp.
🤖 Generated with Claude Code
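For readers curious about the rpath follow-up listed under "Out of scope", the two mechanisms named there could be combined roughly as follows. This is a hedged sketch and not part of this PR: "cytnx" stands in for whatever the real extension target is called, and the behavior has not been verified on a Linux+CUDA machine.

```cmake
# "cytnx" is a placeholder target name (assumption).
set_target_properties(cytnx PROPERTIES
  # Append the directories of linked libraries that live outside the build
  # tree (for example the CUDA toolkit's library directory) to the RPATH of
  # the installed binary.
  INSTALL_RPATH_USE_LINK_PATH TRUE)

if(CMAKE_SYSTEM_NAME STREQUAL "Linux")
  # Ask the linker to emit DT_RPATH instead of DT_RUNPATH. The dynamic loader
  # consults DT_RPATH before LD_LIBRARY_PATH, whereas DT_RUNPATH is consulted
  # after it.
  target_link_options(cytnx PRIVATE "LINKER:--disable-new-dtags")
endif()
```

The trade-off is that a DT_RPATH entry makes it harder for users to override the CUDA libraries at run time, which is one reason such a change warrants its own review.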