Optimize strided permutation fast paths#133

Merged
shinaoka merged 1 commit into main from scale-transpose-fast-path
Jul 1, 2026

@shinaoka shinaoka commented Jul 1, 2026

Member

Summary

  • add tiled transpose-scale fast paths for f64 and identity-copy scalar types, including zero-scale handling
  • add HPTT-style grouped high-rank permutation planning and parallel execution with a serial fallback when Rayon has one worker
  • add tile-level parallel execution for rootless 2D transpose plans, fixing the 3D [2,0,1] parallel scaling bottleneck
  • tighten strided-perm public API and move benchmark coverage out of the crate README into the benchmark suite
  • add repository rules adapted from tenferro-rs, including the nthreads=1 serial-kernel rule
  • link the top README to the benchmark suite and invite bottleneck/failure-case reports
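
The tiled transpose-scale path, its zero-scale case, and the single-worker serial fallback described above can be illustrated with a small sketch. This is not the code from the PR: the function names, the `TILE` size, and the band-splitting scheme are hypothetical, `std::thread::scope` stands in for Rayon to keep the example dependency-free, and "zero-scale handling" is assumed here to mean that a scale of zero writes zeros without reading the source, so NaN or Inf inputs do not propagate.

```rust
/// Hypothetical tile edge; the PR does not state the tile size it uses.
const TILE: usize = 32;

/// Fills `dst_band`, which holds destination rows `j0..` (each of length
/// `rows`), with `alpha * src[i * cols + j]` written at `dst[j * rows + i]`.
/// Iterating tile by tile keeps both the reads and the writes cache-local.
fn transpose_scale_band(
    alpha: f64,
    src: &[f64],
    rows: usize,
    cols: usize,
    dst_band: &mut [f64],
    j0: usize,
) {
    let band_rows = dst_band.len() / rows;
    for jt in (0..band_rows).step_by(TILE) {
        for it in (0..rows).step_by(TILE) {
            for j in jt..(jt + TILE).min(band_rows) {
                for i in it..(it + TILE).min(rows) {
                    dst_band[j * rows + i] = alpha * src[i * cols + (j0 + j)];
                }
            }
        }
    }
}

/// Computes dst = alpha * transpose(src) for a row-major `rows x cols` source.
pub fn transpose_scale(
    alpha: f64,
    src: &[f64],
    rows: usize,
    cols: usize,
    dst: &mut [f64],
    nthreads: usize,
) {
    assert_eq!(src.len(), rows * cols);
    assert_eq!(dst.len(), rows * cols);
    if rows == 0 || cols == 0 {
        return;
    }
    // Assumed zero-scale semantics: never read `src`, so NaN/Inf cannot leak.
    if alpha == 0.0 {
        dst.fill(0.0);
        return;
    }
    // With one worker, call the serial kernel directly and spawn nothing.
    if nthreads <= 1 {
        transpose_scale_band(alpha, src, rows, cols, dst, 0);
        return;
    }
    // Otherwise give each thread a band made of whole tile rows of `dst`.
    let tiles = (cols + TILE - 1) / TILE;
    let tiles_per_band = (tiles + nthreads - 1) / nthreads;
    let band_rows = tiles_per_band * TILE;
    std::thread::scope(|s| {
        for (b, band) in dst.chunks_mut(band_rows * rows).enumerate() {
            s.spawn(move || {
                transpose_scale_band(alpha, src, rows, cols, band, b * band_rows)
            });
        }
    });
}
```

Calling `transpose_scale(2.0, &src, 2, 3, &mut dst, 1)` on a 2x3 row-major `src` takes the serial branch, while passing a larger `nthreads` distributes disjoint bands of tiles across threads. Splitting on tile boundaries means no two threads ever write to the same destination tile.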

Verification

  • cargo fmt --check && cargo test -p strided-perm --features parallel && cargo test
  • cargo llvm-cov --workspace --json --output-path coverage.json && python3 scripts/check-coverage.py coverage.json

Benchmarks

M5 MacBook Pro numbers are recorded in tensor4all/strided-rs-benchmark-suite on branch scale-transpose-bench. On 4 threads, the 3D 256^3 [2,0,1] parallel path dropped from about 15.8 ms to 4.792 ms, roughly a 3.3x speedup.

@shinaoka shinaoka force-pushed the scale-transpose-fast-path branch from 5cd0287 to cba782d, July 1, 2026 23:13
@shinaoka shinaoka marked this pull request as ready for review July 1, 2026 23:14
@shinaoka shinaoka force-pushed the scale-transpose-fast-path branch from cba782d to 91a0aca, July 1, 2026 23:17
@shinaoka shinaoka merged commit 7cdc813 into main Jul 1, 2026
5 checks passed
@shinaoka shinaoka deleted the scale-transpose-fast-path branch July 1, 2026 23:19

1 participant