Optimize strided permutation fast paths by shinaoka · Pull Request #133 · tensor4all/strided-rs

shinaoka · 2026-07-01T22:48:01Z

Summary

- Add tiled transpose-scale fast paths for f64 and identity-copy scalar types, including zero-scale handling.
- Add HPTT-style grouped high-rank permutation planning and parallel execution, with a serial fallback when Rayon has one worker.
- Add tile-level parallel execution for rootless 2D transpose plans, fixing the 3D [2,0,1] parallel scaling bottleneck.
- Tighten the strided-perm public API and move benchmark coverage out of the crate README into the benchmark suite.
- Add repository rules adapted from tenferro-rs, including the nthreads=1 serial-kernel rule.
- Link the top README to the benchmark suite and invite bottleneck/failure-case reports.
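For readers unfamiliar with the technique, the following is a minimal sketch of a tiled transpose-scale kernel with a serial fallback. It is not the code from this PR and not the strided-perm API. The names `transpose_scale`, `transpose_scale_band`, and `TILE`, the tile size of 8, and the dense row-major layout are all hypothetical. It uses `std::thread::scope` instead of Rayon so that it stays dependency-free, and it assumes "zero-scale handling" means writing zeros without reading the source, which may differ from what the PR does.

```rust
use std::thread;

/// Hypothetical tile edge length; a real kernel would tune this per scalar type.
const TILE: usize = 8;

/// Serial tiled kernel (illustrative only). `band` holds the transposed output
/// for source columns `j_start .. j_start + band.len() / rows`.
/// Precondition: `rows > 0`.
fn transpose_scale_band(
    src: &[f64],
    band: &mut [f64],
    rows: usize,
    cols: usize,
    j_start: usize,
    alpha: f64,
) {
    let band_cols = band.len() / rows;
    for i0 in (0..rows).step_by(TILE) {
        let i1 = (i0 + TILE).min(rows);
        for jj0 in (0..band_cols).step_by(TILE) {
            let jj1 = (jj0 + TILE).min(band_cols);
            // Both the reads and the writes stay inside one TILE x TILE block.
            for i in i0..i1 {
                for jj in jj0..jj1 {
                    band[jj * rows + i] = alpha * src[i * cols + j_start + jj];
                }
            }
        }
    }
}

/// Computes dst[j * rows + i] = alpha * src[i * cols + j] for a dense
/// row-major `rows x cols` source (illustrative only).
pub fn transpose_scale(
    src: &[f64],
    dst: &mut [f64],
    rows: usize,
    cols: usize,
    alpha: f64,
    nthreads: usize,
) {
    assert_eq!(src.len(), rows * cols);
    assert_eq!(dst.len(), rows * cols);
    if dst.is_empty() {
        return;
    }
    if alpha == 0.0 {
        // Assumed zero-scale semantics: do not read `src`, so NaN or infinity
        // in the source cannot leak into the result.
        dst.fill(0.0);
        return;
    }
    if nthreads <= 1 {
        // Serial fallback: call the kernel directly, with no thread dispatch.
        transpose_scale_band(src, dst, rows, cols, 0, alpha);
        return;
    }
    // Give each thread a contiguous band of whole tiles of output rows.
    let tiles = (cols + TILE - 1) / TILE;
    let tiles_per_thread = (tiles + nthreads - 1) / nthreads;
    let band_cols = tiles_per_thread * TILE;
    thread::scope(|s| {
        for (b, band) in dst.chunks_mut(band_cols * rows).enumerate() {
            s.spawn(move || {
                transpose_scale_band(src, band, rows, cols, b * band_cols, alpha)
            });
        }
    });
}
```

Calling `transpose_scale(&src, &mut dst, rows, cols, 2.0, 1)` takes the serial path, while a larger `nthreads` splits the output into disjoint bands via `chunks_mut`, so no two threads write the same element. Spawning threads per call is only for illustration; a thread pool such as Rayon's avoids that overhead.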

Verification

cargo fmt --check && cargo test -p strided-perm --features parallel && cargo test
cargo llvm-cov --workspace --json --output-path coverage.json && python3 scripts/check-coverage.py coverage.json

Benchmarks

M5 MacBook Pro numbers are recorded in tensor4all/strided-rs-benchmark-suite on branch scale-transpose-bench. The 3D 256^3 [2,0,1] parallel path improved from about 15.8 ms to 4.792 ms on 4 threads.

shinaoka mentioned this pull request Jul 1, 2026

Add strided permutation benchmark coverage tensor4all/strided-rs-benchmark-suite#29

Merged

shinaoka force-pushed the scale-transpose-fast-path branch from 5cd0287 to cba782d Compare July 1, 2026 23:13

shinaoka marked this pull request as ready for review July 1, 2026 23:14

Optimize strided permutation fast paths

91a0aca

shinaoka force-pushed the scale-transpose-fast-path branch from cba782d to 91a0aca Compare July 1, 2026 23:17

shinaoka merged commit 7cdc813 into main Jul 1, 2026
5 checks passed

shinaoka deleted the scale-transpose-fast-path branch July 1, 2026 23:19
