SOFIE generates incorrect Conv code whenever dilation > 1, on both the alpaka GPU path and the CPU path. This is the same bug fixed upstream in ROOT by root-project/root#22474 (issue root-project/root#22473); the fix has not been ported here, and the GPU path needs an additional fix of its own.
Cause
In Initialize, fAttrKernelShape is overwritten with the dilation-expanded kernel size k + (dilation - 1)*(k - 1) (a 3x3 kernel with dilation 2 becomes 5x5). That expanded value is then used again together with fAttrDilations, so dilation is counted twice. With dilation 1 the expansion k + 0 = k is a no-op, which is why no existing test catches it: every Conv model in the suite uses dilation 1.
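The double counting can be checked with a few lines of arithmetic. The sketch below is only an illustration: `EffectiveKernel` is a hypothetical helper name, not a SOFIE function, and it assumes k >= 1 and dilation >= 1.

```cpp
#include <cstddef>

// Dilation-expanded kernel extent along one axis: k + (dilation - 1) * (k - 1).
// Hypothetical helper for illustration; assumes k >= 1 and dilation >= 1.
constexpr std::size_t EffectiveKernel(std::size_t k, std::size_t dilation) {
    return k + (dilation - 1) * (k - 1);
}

// Intended behaviour: dilation applied once to the raw 3-wide kernel.
constexpr std::size_t kExpandedOnce = EffectiveKernel(3, 2);

// Buggy behaviour: the already-expanded value is combined with dilation again.
constexpr std::size_t kExpandedTwice = EffectiveKernel(kExpandedOnce, 2);

static_assert(kExpandedOnce == 5, "3x3 with dilation 2 covers a 5x5 window");
static_assert(kExpandedTwice == 9, "counting dilation twice gives a 9x9 window");
static_assert(EffectiveKernel(3, 1) == 3, "dilation 1 is a no-op, so the bug is hidden");
```

With dilation 1 both the single and the double expansion return the raw kernel size, which is consistent with the existing tests passing.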
GPU path (Generate_GPU_Kernel_ALPAKA): silently wrong output
The weight-vectorisation kernel bakes the gap into the _f layout using the expanded kernel, while the im2col kernel re-applies the same gap. For a 3x3 dilation-2 model the generated kernels contain, in the same file:
// weight-vec: gap folded into _f (kernel treated as effective width 5, hstride = 2*5 = 10)
std::size_t const f_idx = ... kh * 10u + kw * 2u;
f[f_idx] = W[elem_idx];
// im2col: rows decoded with the RAW width 3 over a 25-slot block, dilation applied AGAIN
std::size_t const kh = k_rem / 3u;
int64_t const ih_in = static_cast<int64_t>(oh * 1u + kh * 2u) - 0;
int64_t const iw_in = static_cast<int64_t>(ow * 1u + kw * 2u) - 0;
The weight buffer is spaced by 10 (effective-5 grid) but the column buffer is decoded with width 3 and multiplied by dilation a second time, so the two no longer correspond. The matrix dimensions stay consistent, so there is no crash, just a wrong result.
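To make the mismatch concrete, the following sketch compares, for each slot of the 25-entry block that receives a weight, the input offset that slot represents on the effective 5x5 grid with the offset the generated im2col decode would read. It rests on assumptions not shown in the excerpt: that `k_rem` indexes the 25-slot block directly and that `kw` is decoded as `k_rem % 3`. All function names are hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// (row, col) input offset relative to the output position.
using Offset = std::pair<std::size_t, std::size_t>;

// Slots that receive a weight, following f_idx = kh * 10 + kw * 2 for a 3x3 kernel.
std::vector<std::size_t> WeightSlots() {
    std::vector<std::size_t> slots;
    for (std::size_t kh = 0; kh < 3; ++kh)
        for (std::size_t kw = 0; kw < 3; ++kw)
            slots.push_back(kh * 10 + kw * 2);
    return slots;
}

// Offset a slot stands for when the block is read as an effective 5x5 grid.
Offset ExpectedOffset(std::size_t slot) {
    return Offset(slot / 5, slot % 5);
}

// Offset read by the generated decode: raw width 3, then dilation 2 applied again.
// ASSUMPTION: kw = slot % 3 (the excerpt only shows the kh decode).
Offset GeneratedOffset(std::size_t slot) {
    return Offset((slot / 3) * 2, (slot % 3) * 2);
}

// Number of weight-carrying slots whose two offsets disagree.
std::size_t CountMismatches() {
    std::size_t mismatches = 0;
    for (std::size_t slot : WeightSlots())
        if (ExpectedOffset(slot) != GeneratedOffset(slot))
            ++mismatches;
    return mismatches;
}
```

Under these assumptions only slot 0 lines up; the other eight weights are multiplied with input elements they do not belong to.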
CPU path (Generate): segfault
The generated code calls Im2col(..., fAttrKernelShape[0], fAttrKernelShape[1], ..., fAttrDilations[0], fAttrDilations[1], ...), i.e. it passes the expanded kernel (5,5) together with dilation (2,2). Im2col already applies dilation internally, so it computes a negative output dimension and writes past the end of the _xcol buffer, causing a segfault. This is exactly root-project/root#22473; the one-line fix from root-project/root#22474 (setting fAttrDilations = {1,1,1} after the weight reorder) is missing here.
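The negative dimension follows from the usual convolution output-size formula. The sketch below assumes Im2col uses that standard formula internally; `ConvOutDim` is a hypothetical helper, not the SOFIE implementation, and the numbers are those of the 7x7 reproducer.

```cpp
#include <cstdint>

// Standard convolution output size along one axis (assumed, not copied from SOFIE):
// (in + 2 * pad - (dilation * (k - 1) + 1)) / stride + 1
constexpr std::int64_t ConvOutDim(std::int64_t in, std::int64_t k, std::int64_t pad,
                                  std::int64_t stride, std::int64_t dilation) {
    return (in + 2 * pad - (dilation * (k - 1) + 1)) / stride + 1;
}

// Correct call: raw kernel 3 with dilation 2.
static_assert(ConvOutDim(7, 3, 0, 1, 2) == 3, "expected 3x3 output");

// Buggy call: expanded kernel 5 passed together with dilation 2.
static_assert(ConvOutDim(7, 5, 0, 1, 2) == -1, "window of 9 exceeds the 7-wide input");

// After resetting the dilation to 1, the expanded kernel alone gives the right size.
static_assert(ConvOutDim(7, 5, 0, 1, 1) == 3, "expanded kernel with dilation 1");
```

The last case mirrors the upstream approach of resetting the dilations once the gap has been folded into the weights.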
Reproducer
A 3x3 dilation-2 Conv, input 1x1x7x7, no padding, unit stride (the same model as the ROOT reproducer in root-project/root#22473). With code generated by GenerateGPU_ALPAKA, the output does not match the ONNX/PyTorch reference; the code generated by the CPU Generate segfaults.
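For checking results without ONNX Runtime or PyTorch at hand, a direct (non-im2col) reference can be written in a few lines. This is a minimal sketch for the single-channel, no-padding, unit-stride case only; the function names and the test data (input 0..48, all-ones weights) are made up for illustration and are not taken from the ROOT reproducer.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Direct 2D cross-correlation (what ONNX Conv computes): one channel, one filter,
// no padding, unit stride. Assumes d * (k - 1) + 1 <= h and <= w.
std::vector<float> DilatedConv2D(const std::vector<float>& x, std::size_t h, std::size_t w,
                                 const std::vector<float>& weights, std::size_t k,
                                 std::size_t d) {
    const std::size_t extent = d * (k - 1) + 1;  // effective kernel window
    const std::size_t outH = h - extent + 1;
    const std::size_t outW = w - extent + 1;
    std::vector<float> y(outH * outW, 0.0f);
    for (std::size_t oh = 0; oh < outH; ++oh)
        for (std::size_t ow = 0; ow < outW; ++ow)
            for (std::size_t kh = 0; kh < k; ++kh)
                for (std::size_t kw = 0; kw < k; ++kw)
                    // Dilation is applied exactly once, when indexing the input.
                    y[oh * outW + ow] +=
                        weights[kh * k + kw] * x[(oh + kh * d) * w + (ow + kw * d)];
    return y;
}

// Hypothetical test data: 7x7 input filled with 0..48, 3x3 all-ones kernel, dilation 2.
std::vector<float> ReferenceOutput() {
    std::vector<float> x(49);
    std::iota(x.begin(), x.end(), 0.0f);
    const std::vector<float> weights(9, 1.0f);
    return DilatedConv2D(x, 7, 7, weights, 3, 2);
}
```

The result has 3x3 = 9 elements, and a generated Conv fed the same input and weights can be compared against it element by element.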
Expected
Generated Conv matches the ONNX reference for any valid dilation, the same way it already does for padding and strides.