fix(tests): dynamic tolerance for cuDNN TF32 precision on Ampere+ GPUs #771
Open
Zhaoxian-Wu wants to merge 1 commit into
c55b024 to
76a8e81
Compare
Collaborator
Thanks for your work, @Zhaoxian-Wu! Alongside this PR, please sync your other PRs with master so the Actions trigger again. They should pass now that I have fixed some of the lint errors that came up in the past. Also, please run the linting tool to address these:
tests/helpers/testcases.py:56:53: E261 at least two spaces before inline comment
tests/helpers/testcases.py:58:55: E261 at least two spaces before inline comment
76a8e81 to
0078e37
Compare
Author
Got it, I forgot to run pycodestyle. This commit should be fine. Could you please trigger the tests again to confirm? @PabloCarmona
PabloCarmona
approved these changes
Jun 19, 2026
PabloCarmona
left a comment
Collaborator
Looks OK to me. Please update the rest of the PRs with the tests accordingly so we can finally start merging them. I suppose this one doesn't need to be merged once the other PRs have their test cases updated.
cuDNN defaults to TF32 Tensor Cores on Ampere+ GPUs (sm>=80), causing ~1e-3 divergence vs the RPU backend's FP32 CUBLAS path. The existing hard-coded tolerances (decimal=4/6) fail on H100 (RNN) and Blackwell (RNN + Conv3d).

Add hardware-adaptive probes that measure the actual cuDNN-vs-non-cuDNN divergence at test session start, then derive tolerances from the measured value. CPU tests remain at decimal=6; CUDA tests relax only as much as the current GPU requires.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Zhaoxian Wu <wuzhaoxian97@gmail.com>
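
The probe-then-derive idea in this commit message can be illustrated with a minimal CPU-only sketch. It is not the code in this PR: the function names (truncate_to_tf32, measure_divergence, decimal_from_divergence), the safety factor, and the NumPy stand-in for the cuDNN path are all assumptions made for illustration. TF32 is emulated by truncating float32 inputs to 10 mantissa bits, whereas real Tensor Cores round; the only library behaviour relied on is that numpy.testing.assert_array_almost_equal passes when abs(desired - actual) < 1.5 * 10**(-decimal).

```python
import math

import numpy as np


def truncate_to_tf32(x):
    """Emulate TF32 inputs by zeroing the low 13 mantissa bits of float32 values.

    Assumption: this is a CPU stand-in; real hardware rounds instead of truncating.
    """
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFFE000)).view(np.float32)


def measure_divergence(size=64, seed=0):
    """Hypothetical probe: max abs difference between an FP32 matmul and a
    matmul whose inputs were reduced to TF32-like precision."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((size, size)).astype(np.float32)
    b = rng.standard_normal((size, size)).astype(np.float32)
    reference = a @ b
    reduced = truncate_to_tf32(a) @ truncate_to_tf32(b)
    return float(np.max(np.abs(reference - reduced)))


def decimal_from_divergence(max_abs_diff, safety_factor=2.0, max_decimal=6):
    """Pick the largest `decimal` (capped at max_decimal) for which
    safety_factor * max_abs_diff stays below 1.5 * 10**(-decimal)."""
    if max_abs_diff <= 0.0:
        return max_decimal  # no divergence measured: keep the strict default
    bound = safety_factor * max_abs_diff
    decimal = math.floor(-math.log10(bound / 1.5))
    return max(0, min(max_decimal, decimal))


if __name__ == "__main__":
    divergence = measure_divergence()
    decimal = decimal_from_divergence(divergence)
    print(f"measured divergence: {divergence:.3e}, derived decimal: {decimal}")
```

In a real test session the probe would presumably compare a cuDNN-backed layer against a non-cuDNN reference on the current GPU and cache the result once; the derived value would then be passed as the decimal argument of assert_array_almost_equal for CUDA tests, while CPU tests keep decimal=6.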
0078e37 to
dc10cef
Compare
Author
Sure, I've updated
Summary
This PR addresses issue #766, the CUDA test failures on Ampere+ GPUs caused by the precision mismatch between cuDNN's default TF32 Tensor Core path and the RPU backend's FP32 CUBLAS path.
Instead of globally disabling TF32, this follows the hardware-adaptive direction discussed in the issue:
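- Measure the actual cuDNN-vs-non-cuDNN divergence at test session start
- Derive the assert_array_almost_equal decimal tolerances from the measured divergence
- Keep CPU tests at decimal=6

Testing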
The affected CUDA tests now pass on my setup, including the previously failing RNN/LSTM and Conv3d cases on TF32-capable GPUs.
Hi @PabloCarmona, with this PR the full test suite passes locally for me. If the tests pass in CI as well, I will update the code in my other PRs accordingly. Thanks for your effort in maintaining the library.