Add backend-neutral raw strided bgemm API #134

Merged
shinaoka merged 4 commits into tensor4all:main from Ryo-wtnb11:prepared-raw-bgemm on Jul 2, 2026

Ryo-wtnb11 commented Jul 2, 2026

Summary

This PR adds a backend-neutral raw borrowed-layout entry point for strided
batched GEMM.

The existing StridedView / StridedViewMut APIs are unchanged. The new API
targets prepared replay paths that already hold validated
dims / strides / offset descriptors and should not rebuild owned
view metadata for every small GEMM.

Motivation

Tensor contraction engines often split execution into two phases:

  1. Build or compile a structural plan once.
  2. Replay the same strided subblock layouts many times.

In the replay phase, the caller already has borrowed layout metadata:

```rust
let dims: &[usize] = &[m, k];
let strides: &[isize] = &[k as isize, 1];
let offset: isize = 0;
```

Constructing a new StridedView for every replay allocates owned dynamic-rank
metadata. That is fine for the general view API, but it is avoidable overhead
for compiled replay. This PR adds a raw borrowed-layout API for that case.
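A minimal sketch of the two phases, using hypothetical types rather than the strided-view API: `PreparedOperand` owns its dims and strides and is built once at plan time, while `BorrowedLayout` only borrows them, so each replay creates a layout without allocating. `sum_2d` is a hypothetical stand-in for the per-replay kernel.

```rust
/// Owned layout metadata, built once while compiling a plan (hypothetical type).
struct PreparedOperand {
    dims: Vec<usize>,
    strides: Vec<isize>,
    offset: isize,
}

/// Borrowed view of the same metadata; creating one does not allocate (hypothetical type).
#[derive(Clone, Copy)]
struct BorrowedLayout<'a> {
    dims: &'a [usize],
    strides: &'a [isize],
    offset: isize,
}

impl PreparedOperand {
    /// Plan phase: compute row-major strides and allocate the metadata once.
    fn row_major(dims: &[usize]) -> Self {
        let mut strides = vec![0isize; dims.len()];
        let mut acc = 1isize;
        // Walk axes from last to first so the last axis gets stride 1.
        for (s, &d) in strides.iter_mut().zip(dims).rev() {
            *s = acc;
            acc *= d as isize;
        }
        PreparedOperand { dims: dims.to_vec(), strides, offset: 0 }
    }

    /// Replay phase: hand out a borrowed layout with no allocation.
    fn layout(&self) -> BorrowedLayout<'_> {
        BorrowedLayout { dims: &self.dims, strides: &self.strides, offset: self.offset }
    }
}

/// Stand-in kernel: sum every element reachable through a 2-D borrowed layout.
fn sum_2d(data: &[f64], l: BorrowedLayout<'_>) -> f64 {
    let mut total = 0.0;
    for i in 0..l.dims[0] {
        for j in 0..l.dims[1] {
            let idx = l.offset + i as isize * l.strides[0] + j as isize * l.strides[1];
            total += data[idx as usize];
        }
    }
    total
}
```

Calling `sum_2d(&data, plan.layout())` in a replay loop touches the plan's `Vec`s only by reference; the allocation happens once, in `row_major`.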

API shape

Raw layout types live in strided-view, not in a concrete backend module:

```rust
use strided_view::{RawStridedMut, RawStridedRef};
```

The backend-neutral GEMM entry point is re-exported from strided-einsum2:

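```rust
use strided_einsum2::{bgemm_raw_strided_into, RawStridedMut, RawStridedRef};

// Layout descriptors are borrowed, so bind them to locals that outlive the views
// (a temporary `&[m, k]` passed inline would be dropped at the end of the statement).
let (a_dims, a_strides) = ([m, k], [k as isize, 1]);
let (b_dims, b_strides) = ([k, n], [n as isize, 1]);
let (c_dims, c_strides) = ([m, n], [n as isize, 1]);

// Checked constructors validate each descriptor against its backing slice.
let a = RawStridedRef::new(a_data, &a_dims, &a_strides, 0)?;
let b = RawStridedRef::new(b_data, &b_dims, &b_strides, 0)?;
let c = RawStridedMut::new(c_data, &c_dims, &c_strides, 0)?;

// Plain 2-D GEMM: A = [lo, sum], B = [sum, ro], C = [lo, ro], no batch dims.
bgemm_raw_strided_into(
    c,
    a,
    b,
    0,     // n_batch
    1,     // n_lo
    1,     // n_ro
    1,     // n_sum
    1.0,   // alpha
    0.0,   // beta
    false, // conj_a
    false, // conj_b
)?;
```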
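As a hedged reference for what the call above computes, here is a naive loop for the same non-batched case (n_batch = 0, n_lo = n_ro = n_sum = 1) over raw (data, dims, strides, offset) descriptors, with arguments in the same C, A, B order. `gemm_ref_2d` and `at` are hypothetical helpers, not part of strided-einsum2. The sketch is f64-only, so conj_a / conj_b would be no-ops and are omitted, and it assumes the BLAS-style convention that beta == 0 overwrites C without reading it.

```rust
/// Linear index of element (i, j) in a 2-D strided layout.
fn at(strides: &[isize], offset: isize, i: usize, j: usize) -> usize {
    (offset + i as isize * strides[0] + j as isize * strides[1]) as usize
}

/// Naive reference for C[m,n] = alpha * A[m,k] * B[k,n] + beta * C[m,n].
/// Hypothetical helper for the n_batch = 0, n_lo = n_sum = n_ro = 1 case only.
fn gemm_ref_2d(
    c: &mut [f64], c_dims: &[usize], c_strides: &[isize], c_off: isize,
    a: &[f64], a_dims: &[usize], a_strides: &[isize], a_off: isize,
    b: &[f64], b_dims: &[usize], b_strides: &[isize], b_off: isize,
    alpha: f64, beta: f64,
) -> Result<(), String> {
    let (m, k, n) = (a_dims[0], a_dims[1], b_dims[1]);
    // Matching dimension groups must agree: A = [lo, sum], B = [sum, ro], C = [lo, ro].
    if b_dims[0] != k || c_dims[0] != m || c_dims[1] != n {
        return Err(format!("shape mismatch: A{:?} B{:?} C{:?}", a_dims, b_dims, c_dims));
    }
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0;
            // When k == 0 this loop is empty, so C becomes beta * C.
            for p in 0..k {
                acc += a[at(a_strides, a_off, i, p)] * b[at(b_strides, b_off, p, j)];
            }
            let ci = at(c_strides, c_off, i, j);
            // Assumed BLAS-style convention: beta == 0 overwrites C without reading it.
            c[ci] = if beta == 0.0 { alpha * acc } else { alpha * acc + beta * c[ci] };
        }
    }
    Ok(())
}
```

A loop like this is mainly useful as a test oracle: it can exercise shape-mismatch reporting, a column-major (non-row-major) output, and the zero-size contraction where C becomes beta * C.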

For compiled plans that have already validated bounds and ranks, the unchecked constructors and entry point skip that validation:

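```rust
// SAFETY: the compiled plan validated these descriptors against their backing
// slices when it was built, so no element access can go out of bounds.
let a = unsafe {
    RawStridedRef::new_unchecked(a_data, prepared_a_dims, prepared_a_strides, prepared_a_offset)
};
let b = unsafe {
    RawStridedRef::new_unchecked(b_data, prepared_b_dims, prepared_b_strides, prepared_b_offset)
};
let c = unsafe {
    RawStridedMut::new_unchecked(c_data, prepared_c_dims, prepared_c_strides, prepared_c_offset)
};

// SAFETY: rank partitions, matching extents, and C/A/B aliasing were also
// checked at plan time.
unsafe {
    bgemm_raw_strided_into_unchecked(
        c, a, b, n_batch, n_lo, n_ro, n_sum, alpha, beta, conj_a, conj_b,
    )?;
}
```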

The unchecked path requires the caller to guarantee that:

  • every reachable element is inside the backing slice,
  • the ranks of A, B, and C partition as [lo, sum, batch], [sum, ro, batch],
    and [lo, ro, batch] respectively, with group sizes n_lo, n_sum, n_ro, and n_batch,
  • matching dimension groups have identical extents,
  • C does not alias A or B in a way that violates mutable access.
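The checked constructors have to establish the same conditions the unchecked path leaves to the caller. A minimal sketch of two such checks, assuming dims are ordered exactly as the groups are listed above: `reachable_range` and `validate` bound the linear offsets reachable through (dims, strides, offset), including negative strides, and `check_partitions` splits each rank into its groups and compares matching extents. All three functions are hypothetical illustrations, not the crate's code.

```rust
/// Smallest and largest linear offsets reachable through (dims, strides, offset).
/// Returns None when any extent is zero (no element is reachable).
fn reachable_range(dims: &[usize], strides: &[isize], offset: isize) -> Option<(isize, isize)> {
    if dims.iter().any(|&d| d == 0) {
        return None;
    }
    let (mut lo, mut hi) = (offset, offset);
    for (&d, &s) in dims.iter().zip(strides) {
        // Contribution of the last index along this axis; negative strides lower `lo`.
        let span = (d as isize - 1) * s;
        if span < 0 { lo += span } else { hi += span }
    }
    Some((lo, hi))
}

/// Checked-constructor style bounds validation (hypothetical helper).
fn validate(len: usize, dims: &[usize], strides: &[isize], offset: isize) -> Result<(), String> {
    if dims.len() != strides.len() {
        return Err(format!("rank mismatch: {} dims vs {} strides", dims.len(), strides.len()));
    }
    match reachable_range(dims, strides, offset) {
        None => Ok(()),
        Some((lo, hi)) if lo >= 0 && (hi as usize) < len => Ok(()),
        Some((lo, hi)) => Err(format!("offsets {lo}..={hi} outside slice of length {len}")),
    }
}

/// Assumes A = [lo, sum, batch], B = [sum, ro, batch], C = [lo, ro, batch] in that order.
fn check_partitions(
    a: &[usize], b: &[usize], c: &[usize],
    n_lo: usize, n_ro: usize, n_sum: usize, n_batch: usize,
) -> Result<(), String> {
    if a.len() != n_lo + n_sum + n_batch
        || b.len() != n_sum + n_ro + n_batch
        || c.len() != n_lo + n_ro + n_batch
    {
        return Err("rank does not match n_lo/n_sum/n_ro/n_batch".into());
    }
    let (a_lo, a_rest) = a.split_at(n_lo);
    let (a_sum, a_batch) = a_rest.split_at(n_sum);
    let (b_sum, b_rest) = b.split_at(n_sum);
    let (b_ro, b_batch) = b_rest.split_at(n_ro);
    let (c_lo, c_rest) = c.split_at(n_lo);
    let (c_ro, c_batch) = c_rest.split_at(n_ro);
    if a_lo != c_lo || a_sum != b_sum || b_ro != c_ro || a_batch != b_batch || a_batch != c_batch {
        return Err("matching dimension groups have different extents".into());
    }
    Ok(())
}
```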

Backend behavior

The raw API is not faer-specific. It lowers through the same backend-neutral
prepare path:

```text
RawStridedRef/Mut
  -> prepare_input_raw / prepare_output_raw
  -> Backend::bgemm_contiguous_into
```

That means faer, BLAS, and future backends share the same raw metadata
boundary; only the final GEMM call is backend-specific.
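A rough sketch of why that boundary is shareable, under stated assumptions: the hypothetical `Backend` trait below stands in for `Backend::bgemm_contiguous_into` (whose real signature the PR does not show), and `prepare_input` / `bgemm_via` are illustrative 2-D stand-ins for the prepare and writeback steps. Everything before and after `B::gemm_contiguous` is backend-neutral; only that call changes per backend.

```rust
use std::borrow::Cow;

/// The only backend-specific step: GEMM over contiguous row-major buffers (hypothetical trait).
trait Backend {
    fn gemm_contiguous(c: &mut [f64], a: &[f64], b: &[f64], m: usize, k: usize, n: usize, alpha: f64, beta: f64);
}

/// Naive backend; a faer- or BLAS-backed type would implement the same trait.
struct Naive;
impl Backend for Naive {
    fn gemm_contiguous(c: &mut [f64], a: &[f64], b: &[f64], m: usize, k: usize, n: usize, alpha: f64, beta: f64) {
        for i in 0..m {
            for j in 0..n {
                let dot: f64 = (0..k).map(|p| a[i * k + p] * b[p * n + j]).sum();
                let prev = if beta == 0.0 { 0.0 } else { beta * c[i * n + j] };
                c[i * n + j] = alpha * dot + prev;
            }
        }
    }
}

/// Backend-neutral preparation: borrow when already row-major, otherwise pack a copy.
fn prepare_input<'a>(data: &'a [f64], dims: [usize; 2], strides: [isize; 2], offset: isize) -> Cow<'a, [f64]> {
    let [r, c] = dims;
    if strides == [c as isize, 1] {
        let start = offset as usize;
        return Cow::Borrowed(&data[start..start + r * c]);
    }
    let mut packed = Vec::with_capacity(r * c);
    for i in 0..r {
        for j in 0..c {
            packed.push(data[(offset + i as isize * strides[0] + j as isize * strides[1]) as usize]);
        }
    }
    Cow::Owned(packed)
}

/// Shared driver: prepare inputs and output, call the backend, write C back through its strides.
fn bgemm_via<B: Backend>(
    c: &mut [f64], c_strides: [isize; 2], c_off: isize,
    a: &[f64], a_dims: [usize; 2], a_strides: [isize; 2], a_off: isize,
    b: &[f64], b_strides: [isize; 2], b_off: isize, n: usize,
    alpha: f64, beta: f64,
) {
    let (m, k) = (a_dims[0], a_dims[1]);
    let a_buf = prepare_input(a, a_dims, a_strides, a_off);
    let b_buf = prepare_input(b, [k, n], b_strides, b_off);
    // Gather the current C into a contiguous buffer so beta != 0 can read it.
    let mut c_buf = prepare_input(c, [m, n], c_strides, c_off).into_owned();
    B::gemm_contiguous(&mut c_buf, &a_buf, &b_buf, m, k, n, alpha, beta);
    // Non-contiguous writeback into the caller's strided C.
    for i in 0..m {
        for j in 0..n {
            c[(c_off + i as isize * c_strides[0] + j as isize * c_strides[1]) as usize] = c_buf[i * n + j];
        }
    }
}
```

In this sketch the packing, the borrow-when-already-row-major shortcut, and the strided writeback are written once and reused by any type implementing `Backend`.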

Compatibility

Existing bgemm_strided_into callers continue to use StridedView /
StridedViewMut. The faer module keeps its compatibility wrapper, but external
callers should use the backend-neutral strided_einsum2::bgemm_raw_strided_into
API for prepared replay.

Tests

Validated with:

```bash
cargo test -p strided-einsum2 --no-default-features --quiet
cargo test -p strided-view -p strided-einsum2 --features faer --quiet
```

Coverage includes:

  • raw GEMM through the active backend for f64
  • raw GEMM through the active backend for f32
  • Complex64 raw GEMM with conjugation
  • explicit backend dispatch
  • checked shape/rank mismatch reporting
  • zero-size contraction with beta == 0 and beta != 0
  • non-contiguous output writeback
  • existing view-based compatibility behavior

Ryo-wtnb11 changed the title from "Add raw strided faer bgemm API" to "Add backend-neutral raw strided bgemm API" on Jul 2, 2026.
shinaoka merged commit 4ea9acb into tensor4all:main on Jul 2, 2026.
5 checks passed