
Expand strided einsum CPU kernels #130

Merged: shinaoka merged 3 commits into main from codex/batched-outer-product-kernel on Jun 6, 2026


shinaoka (Member) commented Jun 6, 2026

Summary

  • move broadcast multiply / batched outer product / non-conjugated dot-general execution into reusable strided-rs kernels
  • add strided-einsum2::DotGeneralConfig with borrowed axis slices and BLAS provider feature aliases
  • add PyTorch comparison benches and kernel-writing notes for future SIMD/kernel work
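
For readers unfamiliar with the kernel shapes named above, here is a minimal sketch of a stride-driven batched outer product, `c[b, i, j] = a[b, i] * b[b, j]`. It is not the strided-rs implementation: the function name `batched_outer_product`, its signature, and the use of plain slices with element strides are assumptions made for illustration. A stride of 0 broadcasts an operand along that axis, which is how the same loop also covers the broadcast-multiply case.

```rust
/// Hypothetical reference kernel (not the strided-rs API).
/// Computes c[bt, i, j] = a[bt, i] * b[bt, j] using only shapes and
/// element strides; a stride of 0 broadcasts the operand along that axis.
fn batched_outer_product(
    a: &[f64],
    a_strides: [usize; 2], // strides of `a` for (batch, i)
    b: &[f64],
    b_strides: [usize; 2], // strides of `b` for (batch, j)
    c: &mut [f64],
    c_strides: [usize; 3], // strides of `c` for (batch, i, j)
    shape: [usize; 3],     // extents (nb, ni, nj)
) {
    let [nb, ni, nj] = shape;
    for bt in 0..nb {
        for i in 0..ni {
            // `a` does not depend on j, so load it once per (bt, i).
            let av = a[bt * a_strides[0] + i * a_strides[1]];
            for j in 0..nj {
                let bv = b[bt * b_strides[0] + j * b_strides[1]];
                let out = bt * c_strides[0] + i * c_strides[1] + j * c_strides[2];
                c[out] = av * bv;
            }
        }
    }
}
```

Passing `b_strides = [0, 1]` reuses one vector `b` for every batch entry without copying it. A production kernel would presumably reorder loops and vectorize based on which strides are unit, which this sketch does not attempt.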

Why

tenferro-rs now delegates CPU binary einsum kernels to strided-rs, so these paths need to live behind general stride/layout APIs rather than benchmark-specific tenferro shortcuts.

Notes

  • The implementation is stride/layout driven; it does not branch on benchmark names or fixed tensor shapes.
  • docs/superpowers/... local planning notes were intentionally not committed.
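
As an illustration of what "stride/layout driven" can mean in practice, the sketch below picks a code path from shape and strides alone. The `Layout` enum and `classify_2d` function are hypothetical names introduced here and are not taken from strided-rs; the actual dispatch logic in this PR may look quite different.

```rust
/// Hypothetical layout tag used to choose a kernel path.
#[derive(Debug, PartialEq)]
enum Layout {
    RowMajor,
    ColMajor,
    General,
}

/// Classify a 2-D view from its shape and element strides only.
/// No tensor names or hard-coded benchmark shapes are consulted.
fn classify_2d(shape: [usize; 2], strides: [isize; 2]) -> Layout {
    let [rows, cols] = shape;
    let [row_stride, col_stride] = strides;
    if col_stride == 1 && row_stride == cols as isize {
        // Columns are adjacent in memory; rows are packed back to back.
        Layout::RowMajor
    } else if row_stride == 1 && col_stride == rows as isize {
        // Rows are adjacent in memory; columns are packed back to back.
        Layout::ColMajor
    } else {
        // Padded, transposed-with-gaps, broadcast, or reversed views.
        Layout::General
    }
}
```

A caller could route `RowMajor` and `ColMajor` views to a contiguous fast path and everything else to a generic strided loop. Views with extent-1 axes are ambiguous under this simple test, and a real classifier would likely need to handle them explicitly.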

Verification

  • cargo fmt --check
  • cargo test -p strided-kernel --features parallel
  • cargo test -p strided-einsum2 --features parallel,blas-accelerate --no-default-features
  • git diff --check

shinaoka merged commit 6585675 into main on Jun 6, 2026
5 checks passed
shinaoka deleted the codex/batched-outer-product-kernel branch on June 6, 2026 14:10