arena: handle mismatched null inner cells in trivial binary ToT ops #556

Merged
evaleev merged 1 commit into master from evaleev/fix/arena-tot-binary-null-inner
May 25, 2026

@evaleev evaleev commented May 25, 2026

arena_trivial_binary sized the result by the left operand only and read the right operand unconditionally. A ToT cell present in left but null in right therefore read a null slab (segfault), and a cell present in right but null in left was silently dropped. Two ToT arrays with the same outer shape can differ in which inner cells are populated within an outer tile (e.g. occ_tile_size>1 in CSV-CC aggregates several pairs, some screened to null), which exercises exactly this case. It surfaced as a segfault in jacobi_update/Sum-accumulation during PNO-CCSD with clustered occupied tiles.

Fix: size the result by the union of left/right cell presence and combine a lone cell against an implicit zero slab — correct for the linear ops (add: l+0/0+r; subt: l-0/0-r) and numerically correct for mult (l*0=0). The same kernel backs both Tensor<ArenaTensor> and Tensor<Tensor<double>>, so both inner types are fixed.
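
For readers unfamiliar with the kernel, the union-of-presence rule described above can be sketched roughly as follows. This is an illustration only, not the actual arena_trivial_binary code: `Slab`, `Cell`, `OuterTile` and `trivial_binary_sketch` are hypothetical stand-ins, with `std::optional` modelling a null inner cell and a plain `std::vector<double>` modelling a slab.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not the real arena types:
// a Slab holds one inner cell's data, and std::nullopt models a null cell.
using Slab = std::vector<double>;
using Cell = std::optional<Slab>;
using OuterTile = std::vector<Cell>;

// Combine two outer tiles cell by cell. A result cell is present whenever
// the left OR the right cell is present; a missing operand is read as zeros.
template <typename Op>
OuterTile trivial_binary_sketch(const OuterTile& left, const OuterTile& right,
                                Op op) {
  assert(left.size() == right.size());  // same outer shape is a precondition
  OuterTile result(left.size());        // every cell starts out null
  for (std::size_t i = 0; i < left.size(); ++i) {
    const bool has_l = left[i].has_value();
    const bool has_r = right[i].has_value();
    if (!has_l && !has_r) continue;  // both null: result cell stays null
    if (has_l && has_r) assert(left[i]->size() == right[i]->size());
    // Size from whichever operand exists, never from left alone.
    const std::size_t n = has_l ? left[i]->size() : right[i]->size();
    Slab out(n);
    for (std::size_t k = 0; k < n; ++k) {
      const double l = has_l ? (*left[i])[k] : 0.0;   // implicit zero slab
      const double r = has_r ? (*right[i])[k] : 0.0;  // implicit zero slab
      out[k] = op(l, r);
    }
    result[i] = std::move(out);
  }
  return result;
}
```

Calling it with `[](double a, double b) { return a - b; }` on a lone-right cell yields the negated right slab (0-r). In this sketch a lone cell under mult produces an explicit all-zero slab; whether the real kernel stores that slab or leaves the cell null is not stated here and is an assumption of the sketch.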

Tests: regression coverage for lone-left / both / both-null / lone-right inner cells across add/subt/mult, for both Tensor<ArenaTensor> (arena_tensor_kernels.cpp) and Tensor<Tensor<double>> (arena_tot_trivial.cpp). Verified out-of-band: fixed kernel passes; reverting the fix reproduces the segfault (exit 139) on the lone-left case.
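
The four presence patterns crossed with the three ops form a small matrix, sketched below with scalar cells for brevity. This is not the code in arena_tensor_kernels.cpp or arena_tot_trivial.cpp: `MaybeCell`, `BinOp`, `combine_cells` and `check_presence_matrix` are hypothetical names, and the expectation that a lone cell under mult is a stored zero rather than a null cell is an assumption of the sketch.

```cpp
#include <cassert>
#include <optional>

// Hypothetical scalar stand-in for an inner cell; std::nullopt is a null cell.
using MaybeCell = std::optional<double>;

enum class BinOp { Add, Subt, Mult };

// Union-of-presence combine: an absent operand is treated as 0,
// and the result is null only when both operands are null.
inline MaybeCell combine_cells(const MaybeCell& l, const MaybeCell& r,
                               BinOp op) {
  if (!l && !r) return std::nullopt;
  const double a = l.value_or(0.0);
  const double b = r.value_or(0.0);
  switch (op) {
    case BinOp::Add:  return a + b;
    case BinOp::Subt: return a - b;
    case BinOp::Mult: return a * b;
  }
  return std::nullopt;  // not reached for valid BinOp values
}

// Returns true when all twelve (pattern, op) combinations match expectations.
inline bool check_presence_matrix() {
  struct Case {
    MaybeCell l, r;            // operands
    MaybeCell add, subt, mult; // expected results
  };
  const Case cases[] = {
      {2.0, std::nullopt, 2.0, 2.0, 0.0},   // lone-left:  l+0, l-0, l*0
      {2.0, 3.0, 5.0, -1.0, 6.0},           // both present
      {std::nullopt, std::nullopt,          // both null: result stays null
       std::nullopt, std::nullopt, std::nullopt},
      {std::nullopt, 3.0, 3.0, -3.0, 0.0},  // lone-right: 0+r, 0-r, 0*r
  };
  for (const Case& c : cases) {
    if (combine_cells(c.l, c.r, BinOp::Add) != c.add) return false;
    if (combine_cells(c.l, c.r, BinOp::Subt) != c.subt) return false;
    if (combine_cells(c.l, c.r, BinOp::Mult) != c.mult) return false;
  }
  return true;
}
```

The lone-left row is the one that segfaulted before the fix, and the lone-right row is the one whose contribution was silently dropped.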

Audit note: all other arena ToT ops (unary/scaled/grow/add_to/permute kernels, gemm, einsum, the in-place _to ops) already null-guard; this was the lone gap.

arena_trivial_binary sized the result by the left operand only and read
the right operand unconditionally. A cell present in left but null in
right then read a null slab (segfault), and a cell present in right but
null in left was silently dropped. Two ToT arrays with the same outer
shape can differ in which inner cells are populated within an outer tile
(e.g. occ_tile_size>1 aggregates several pairs, some screened to null),
which exercises exactly this.

Size the result by the union of left/right cell presence and combine a
lone cell against an implicit zero slab: correct for the linear ops
(add: l+0 / 0+r; subt: l-0 / 0-r) and numerically correct for mult
(l*0 = 0). The same kernel backs both Tensor<ArenaTensor> and
Tensor<Tensor<double>>, so both inner types are fixed.

Add regression tests (lone-left, both, both-null, lone-right cells) for
add/subt/mult on both inner types.
@evaleev evaleev merged commit c70fa07 into master May 25, 2026
9 checks passed
@evaleev evaleev deleted the evaleev/fix/arena-tot-binary-null-inner branch May 25, 2026 19:34