fix(tests): dynamic tolerance for cuDNN TF32 precision on Ampere+ GPUs #771

Open
Zhaoxian-Wu wants to merge 1 commit into IBM:master from Zhaoxian-Wu:fix/test-numerical-precision

Conversation

@Zhaoxian-Wu

Summary

This PR addresses issue #766: CUDA test failures on Ampere+ GPUs caused by the precision mismatch between cuDNN's default TF32 Tensor Core path and the RPU backend's FP32 cuBLAS path.

Instead of globally disabling TF32, this follows the hardware-adaptive direction discussed in the issue:

  • add cached CUDA probes for Conv3d and RNN numerical divergence
  • derive assert_array_almost_equal decimal tolerances from the measured divergence
  • keep CPU tests at the existing strict decimal=6
  • relax CUDA tolerances only when the current GPU/cuDNN behavior requires it
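
One way to turn a measured divergence into a `decimal` argument is sketched below. This is a minimal illustration, not the code in this PR: the names `decimal_from_divergence`, `decimal_for`, `safety`, and `loosest` are hypothetical. It relies only on the documented pass criterion of `assert_array_almost_equal`, which is `abs(desired - actual) < 1.5 * 10**(-decimal)`.

```python
import math

STRICT_DECIMAL = 6  # the existing CPU tolerance mentioned in this PR


def decimal_from_divergence(max_abs_diff, strict=STRICT_DECIMAL,
                            safety=2.0, loosest=1):
    """Pick the largest decimal <= strict that tolerates the divergence.

    assert_array_almost_equal passes when
        abs(desired - actual) < 1.5 * 10**(-decimal),
    so we need safety * max_abs_diff < 1.5 * 10**(-decimal).
    `safety` and `loosest` are illustrative knobs, not values from the PR.
    """
    if max_abs_diff <= 0.0:
        return strict
    budget = safety * max_abs_diff
    decimal = math.floor(-math.log10(budget / 1.5))
    # Never stricter than `strict`, never looser than `loosest`.
    return max(loosest, min(strict, decimal))


def decimal_for(use_cuda, measured_divergence):
    """CPU keeps the strict tolerance; CUDA relaxes only as far as measured."""
    if not use_cuda:
        return STRICT_DECIMAL
    return decimal_from_divergence(measured_divergence)


# A divergence of about 1e-3 (typical for TF32) relaxes CUDA tests to decimal=2.
cuda_decimal = decimal_for(True, 1e-3)   # 2
cpu_decimal = decimal_for(False, 1e-3)   # 6
```

Capping the result at `loosest` means a genuinely broken kernel with a large divergence still fails the test instead of silently widening the tolerance.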

Testing

The affected CUDA tests now pass on my setup, including the previously failing RNN/LSTM and Conv3d cases on TF32-capable GPUs.


Hi @PabloCarmona, with this PR the full test suite passes locally for me. If the CI tests on GitHub pass as well, I will update the code in my other PRs accordingly. Thanks for your effort in maintaining the library.

@Zhaoxian-Wu Zhaoxian-Wu force-pushed the fix/test-numerical-precision branch from c55b024 to 76a8e81 Compare June 8, 2026 23:34
@PabloCarmona

PabloCarmona commented Jun 9, 2026

Collaborator

Thanks @Zhaoxian-Wu for your work! Alongside this one, please look at your other PRs and sync them with master so that the actions trigger again. They should work now, since I added a fix for some of the lint errors that were arising in the past.

Also run the linting tool to address this:

tests/helpers/testcases.py:56:53: E261 at least two spaces before inline comment
tests/helpers/testcases.py:58:55: E261 at least two spaces before inline comment

@Zhaoxian-Wu Zhaoxian-Wu force-pushed the fix/test-numerical-precision branch from 76a8e81 to 0078e37 Compare June 10, 2026 18:18
@Zhaoxian-Wu

Author

Got it. I forgot to run pycodestyle. This commit should be okay. Could you please trigger the tests again to check whether it passes? @PabloCarmona

@PabloCarmona PabloCarmona left a comment

Collaborator

Seems OK to me. Please update the rest of the PRs with the tests accordingly so we can finally start merging them. I suppose this one doesn't need to be merged once the other PRs have their test cases updated.

cuDNN defaults to TF32 Tensor Cores on Ampere+ GPUs (sm>=80), causing
~1e-3 divergence vs the RPU backend's FP32 cuBLAS path. The existing
hard-coded tolerances (decimal=4/6) fail on H100 (RNN) and Blackwell
(RNN + Conv3d).

Add hardware-adaptive probes that measure the actual cuDNN-vs-non-cuDNN
divergence at test session start, then derive tolerances from the
measured value. CPU tests remain at decimal=6; CUDA tests relax only as
much as the current GPU requires.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Zhaoxian Wu <wuzhaoxian97@gmail.com>
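
The cached probe described in the commit message above could look roughly like the following. This is a hedged sketch under stated assumptions, not the implementation in this PR: the function name `probe_conv3d_divergence`, the layer sizes, and the choice to compare cuDNN against PyTorch's non-cuDNN path via `torch.backends.cudnn.flags` are all illustrative. It returns `0.0` when PyTorch or CUDA is unavailable, so CPU-only runs keep the strict tolerance.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def probe_conv3d_divergence() -> float:
    """Max |cuDNN - non-cuDNN| over one Conv3d forward pass.

    Cached so the measurement runs once per test session.
    Returns 0.0 if PyTorch or a CUDA device is not available.
    """
    try:
        import torch  # imported lazily so CPU-only environments still work
    except ImportError:
        return 0.0
    if not torch.cuda.is_available():
        return 0.0

    torch.manual_seed(0)
    # Small hypothetical layer and input; the PR's probe may use other shapes.
    conv = torch.nn.Conv3d(2, 3, kernel_size=3).cuda()
    x = torch.randn(2, 2, 8, 8, 8, device="cuda")

    with torch.no_grad():
        # cuDNN path (TF32 is allowed by default on Ampere+ GPUs).
        with torch.backends.cudnn.flags(enabled=True):
            y_cudnn = conv(x)
        # Same weights and input with cuDNN disabled.
        with torch.backends.cudnn.flags(enabled=False):
            y_ref = conv(x)

    return float((y_cudnn - y_ref).abs().max().item())


divergence = probe_conv3d_divergence()
```

An analogous probe for RNN/LSTM layers would follow the same pattern; the measured value can then be mapped to a `decimal` tolerance for the CUDA test cases.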
@Zhaoxian-Wu Zhaoxian-Wu force-pushed the fix/test-numerical-precision branch from 0078e37 to dc10cef Compare June 20, 2026 15:32
@Zhaoxian-Wu

Author

Sure, I've updated them.
