Conv generates wrong code for dilation > 1 (GPU silent-wrong, CPU segfault) #32

@harz05

SOFIE generates incorrect Conv code whenever dilation > 1, on both the alpaka GPU path and the CPU path. This is the same bug fixed upstream in ROOT by root-project/root#22474 (issue root-project/root#22473); the fix has not been ported here, and the GPU path needs an additional fix of its own.

Cause

In Initialize, fAttrKernelShape is overwritten with the dilation-expanded kernel size k + (dilation - 1)*(k - 1) (a 3x3 kernel with dilation 2 becomes 5x5). That expanded value is then used again together with fAttrDilations, so dilation is counted twice. With dilation 1 the expansion k + 0 = k is a no-op, which is why no existing test catches it: every Conv model in the suite uses dilation 1.
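The arithmetic behind the double counting can be checked with a small sketch. `EffectiveKernelSize` is a hypothetical helper written for this issue, not a SOFIE function; it only encodes the expansion formula quoted above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of SOFIE): extent covered by a kernel of
// size k once (dilation - 1) gaps are inserted between adjacent taps.
constexpr std::size_t EffectiveKernelSize(std::size_t k, std::size_t dilation) {
    return k + (dilation - 1) * (k - 1);
}

// Dilation 1 is a no-op, which is why dilation-1 models never hit the bug.
static_assert(EffectiveKernelSize(3, 1) == 3, "dilation 1 leaves k unchanged");

// A 3x3 kernel with dilation 2 spans 5 input positions per axis.
static_assert(EffectiveKernelSize(3, 2) == 5, "3 taps, dilation 2 -> extent 5");

// Feeding the already-expanded size back in together with the same
// dilation counts the dilation twice and yields an extent of 9.
static_assert(EffectiveKernelSize(EffectiveKernelSize(3, 2), 2) == 9,
              "dilation applied twice");
```

An extent of 9 on a 7-wide input is the inconsistency both code paths run into, each in its own way.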

GPU path (Generate_GPU_Kernel_ALPAKA): silently wrong output

The weight-vectorisation kernel bakes the gap into the _f layout using the expanded kernel, while the im2col kernel re-applies the same gap. For a 3x3 dilation-2 model the generated kernels contain, in the same file:

    // weight-vec: gap folded into _f (kernel treated as effective width 5, hstride = 2*5 = 10)
    std::size_t const f_idx = ... kh * 10u + kw * 2u;
    f[f_idx] = W[elem_idx];

    // im2col: rows decoded with the RAW width 3 over a 25-slot block, dilation applied AGAIN
    std::size_t const kh = k_rem / 3u;
    int64_t const ih_in = static_cast<int64_t>(oh * 1u + kh * 2u) - 0;
    int64_t const iw_in = static_cast<int64_t>(ow * 1u + kw * 2u) - 0;

The weight buffer is spaced by 10 (effective-5 grid) but the column buffer is decoded with width 3 and multiplied by dilation a second time, so the two no longer correspond. The matrix dimensions stay consistent, so there is no crash, just a wrong result.
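To make the mismatch concrete, the sketch below replays the two index computations for the 3x3 dilation-2 case and counts how many kernel taps end up paired with the input element they should multiply. It is an illustration only: `CountConsistentTaps` is a hypothetical name, the per-channel 25-slot block is taken in isolation, and the column decode `slot % 3` is an assumption (the excerpt above shows only the row decode `k_rem / 3u`).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration, not generated SOFIE code.
// For each raw tap (kh, kw) of a 3x3 kernel with dilation 2:
//   * weight-vec stores the weight at slot kh*10 + kw*2 of a 25-slot block;
//   * im2col decodes a slot with the raw width 3 and multiplies by the
//     dilation again to obtain the input offset it samples.
// A tap is consistent only if that offset equals (kh*2, kw*2).
inline std::size_t CountConsistentTaps() {
    const std::size_t kRaw = 3, kDil = 2, kEff = 5;
    std::size_t consistent = 0;
    for (std::size_t kh = 0; kh < kRaw; ++kh) {
        for (std::size_t kw = 0; kw < kRaw; ++kw) {
            // Slot written by the weight-vectorisation kernel.
            const std::size_t slot = kh * (kDil * kEff) + kw * kDil;
            // Decode used by im2col (the column decode is an assumption).
            const std::size_t dec_kh = slot / kRaw;
            const std::size_t dec_kw = slot % kRaw;
            // Input offsets im2col samples for that slot.
            const std::size_t ih_off = dec_kh * kDil;
            const std::size_t iw_off = dec_kw * kDil;
            if (ih_off == kh * kDil && iw_off == kw * kDil) {
                ++consistent;
            }
        }
    }
    return consistent;
}
```

Under these assumptions only the tap at (0, 0) lines up, so `CountConsistentTaps()` returns 1 out of 9: the GEMM runs on matching shapes but multiplies most weights by the wrong input elements.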

CPU path (Generate): segfault

The CPU path emits Im2col(..., fAttrKernelShape[0], fAttrKernelShape[1], ..., fAttrDilations[0], fAttrDilations[1], ...), i.e. the expanded kernel (5,5) together with dilation (2,2). Im2col already applies dilation internally, so it computes a negative output dimension and writes past the end of the _xcol buffer, which segfaults. This is exactly root-project/root#22473; the one-line fix from root-project/root#22474 (fAttrDilations = {1,1,1} after the weight reorder) is missing here.
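The negative dimension follows from the usual convolution output-size formula. The sketch below assumes Im2col uses the standard Caffe/ONNX-style expression; `ConvOutputDim` is a hypothetical helper, not the SOFIE implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (assumed to mirror the standard formula):
// out = (in + 2*pad - (dilation*(k - 1) + 1)) / stride + 1
constexpr std::int64_t ConvOutputDim(std::int64_t in, std::int64_t k,
                                     std::int64_t dilation, std::int64_t pad,
                                     std::int64_t stride) {
    return (in + 2 * pad - (dilation * (k - 1) + 1)) / stride + 1;
}

// Correct call: raw kernel 3 with dilation 2 on a 7-wide input.
static_assert(ConvOutputDim(7, 3, 2, 0, 1) == 3, "expected 3x3 output");

// Buggy call: expanded kernel 5 AND dilation 2, so (7 - 9) / 1 + 1 = -1.
static_assert(ConvOutputDim(7, 5, 2, 0, 1) == -1, "negative output dimension");

// Upstream fix: keep the expanded kernel 5 but reset the dilation to 1.
static_assert(ConvOutputDim(7, 5, 1, 0, 1) == 3, "consistent again");
```

The last case shows why resetting fAttrDilations to 1 is sufficient on the CPU path: once the weights are stored on the expanded grid, the expanded kernel with dilation 1 describes the same convolution.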

Reproducer

A 3x3 dilation-2 Conv with input 1x1x7x7, no padding, and unit stride (the same model as the ROOT reproducer in root-project/root#22473). With GenerateGPU_ALPAKA, the output of the generated code does not match the ONNX/PyTorch reference; the code from the CPU Generate segfaults.
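For checking generated code without ONNX Runtime or PyTorch at hand, a direct reference convolution for this shape is short enough to write by hand. The following is a minimal sketch, not part of SOFIE: `DilatedConv2D` is a hypothetical name, and it covers only a single input and output channel, no padding, and unit stride.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference (not SOFIE code): direct single-channel 2D
// convolution in the ONNX Conv sense (cross-correlation), no padding,
// unit stride. x is h*w row-major, wgt is k*k row-major.
std::vector<float> DilatedConv2D(const std::vector<float>& x,
                                 std::size_t h, std::size_t w,
                                 const std::vector<float>& wgt,
                                 std::size_t k, std::size_t dilation) {
    const std::size_t eff = dilation * (k - 1) + 1;  // effective extent
    assert(h >= eff && w >= eff);
    const std::size_t oh = h - eff + 1;
    const std::size_t ow = w - eff + 1;
    std::vector<float> y(oh * ow, 0.0f);
    for (std::size_t i = 0; i < oh; ++i) {
        for (std::size_t j = 0; j < ow; ++j) {
            float acc = 0.0f;
            for (std::size_t kh = 0; kh < k; ++kh) {
                for (std::size_t kw = 0; kw < k; ++kw) {
                    // Dilation is applied exactly once, on the input index.
                    const std::size_t ih = i + kh * dilation;
                    const std::size_t iw = j + kw * dilation;
                    acc += x[ih * w + iw] * wgt[kh * k + kw];
                }
            }
            y[i * ow + j] = acc;
        }
    }
    return y;
}
```

Called with a 7x7 input, a 3x3 kernel, and dilation 2, it returns the 9 values of the 3x3 output, which can be compared element by element against the output of the generated Conv.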

Expected

Generated Conv matches the ONNX reference for any valid dilation, the same way it already does for padding and strides.
