arena: handle mismatched null inner cells in trivial binary ToT ops #556
Merged
arena_trivial_binary sized the result by the left operand only and read the right operand unconditionally. A cell present in left but null in right then read a null slab (segfault), and a cell present in right but null in left was silently dropped. Two ToT arrays with the same outer shape can differ in which inner cells are populated within an outer tile (e.g. occ_tile_size>1 aggregates several pairs, some screened to null), which exercises exactly this. Size the result by the union of left/right cell presence and combine a lone cell against an implicit zero slab: correct for the linear ops (add: l+0 / 0+r; subt: l-0 / 0-r) and numerically correct for mult (l*0 = 0). The same kernel backs both Tensor<ArenaTensor> and Tensor<Tensor<double>>, so both inner types are fixed. Add regression tests (lone-left, both, both-null, lone-right cells) for add/subt/mult on both inner types.
`arena_trivial_binary` sized the result by the left operand only and read the right operand unconditionally, so a ToT cell present in left but null in right read a null slab (segfault) and a cell present in right but null in left was silently dropped. Two ToT arrays with the same outer shape can differ in which inner cells are populated within an outer tile (e.g. `occ_tile_size>1` in CSV-CC aggregates several pairs, some screened to null), which exercises exactly this. It surfaced as a segfault in `jacobi_update`/Sum-accumulation during PNO-CCSD with clustered occupied tiles.

Fix: size the result by the union of left/right cell presence and combine a lone cell against an implicit zero slab. This is correct for the linear ops (add: `l+0` / `0+r`; subt: `l-0` / `0-r`) and numerically correct for mult (`l*0 = 0`). The same kernel backs both `Tensor<ArenaTensor>` and `Tensor<Tensor<double>>`, so both inner types are fixed.

Tests: regression coverage for lone-left / both / both-null / lone-right inner cells across add/subt/mult, for both `Tensor<ArenaTensor>` (arena_tensor_kernels.cpp) and `Tensor<Tensor<double>>` (arena_tot_trivial.cpp). Verified out-of-band: the fixed kernel passes; reverting the fix reproduces the segfault (exit 139) on the lone-left case.

Audit note: all other arena ToT ops (unary/scaled/grow/add_to/permute kernels, gemm, einsum, the in-place `_to` ops) already null-guard; this was the lone gap.
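For readers who want the idea without the arena machinery, here is a minimal sketch of the union-of-presence rule described above. It is not the code from this PR: `Cell` and `trivial_binary_sketch` are hypothetical names, a `std::optional<std::vector<double>>` stands in for an inner cell (with `std::nullopt` modelling a null cell), and a flat vector of cells stands in for one outer tile.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an inner cell; std::nullopt models a null
// (screened-out) cell. The real kernel works on arena slabs instead.
using Cell = std::optional<std::vector<double>>;

// Element-wise binary op over one outer tile, sized by the UNION of cell
// presence. A lone cell is combined against an implicit zero slab, so the
// missing side is never dereferenced.
template <typename Op>
std::vector<Cell> trivial_binary_sketch(const std::vector<Cell>& left,
                                        const std::vector<Cell>& right,
                                        Op op) {
  assert(left.size() == right.size());  // operands share the outer shape
  std::vector<Cell> result(left.size());
  for (std::size_t i = 0; i < left.size(); ++i) {
    const bool has_l = left[i].has_value();
    const bool has_r = right[i].has_value();
    if (!has_l && !has_r) continue;  // both null: result cell stays null
    if (has_l && has_r) assert(left[i]->size() == right[i]->size());
    // Take the extent from whichever side is present.
    const std::size_t n = has_l ? left[i]->size() : right[i]->size();
    std::vector<double> out(n);
    for (std::size_t k = 0; k < n; ++k) {
      const double l = has_l ? (*left[i])[k] : 0.0;   // implicit zero slab
      const double r = has_r ? (*right[i])[k] : 0.0;  // implicit zero slab
      out[k] = op(l, r);
    }
    result[i] = std::move(out);
  }
  return result;
}
```

Called with `std::plus<double>{}`, `std::minus<double>{}`, or `std::multiplies<double>{}`, the sketch covers the same four cases as the regression tests: lone-left, both, both-null, and lone-right. Note that in this sketch a lone cell under mult yields a present all-zero cell rather than a null one; whether the actual kernel stores or elides that zero cell is not stated here.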