Fix dilation>1 in Conv operator (GPU & CPU)#34

Open
harz05 wants to merge 3 commits into ML4EP:gpu/alpaka from harz05:fix/conv-dilation

harz05 commented Jun 9, 2026


Implements #32.

Conv generated wrong code for dilation > 1 because dilation was counted twice: fAttrKernelShape is expanded to the dilated size in Initialize and is then used together with fAttrDilations, which applies the dilation a second time. The same bug is already covered and fixed upstream in root-project/root#22473 / root-project/root#22474.
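To make the double counting concrete, here is a minimal arithmetic sketch. The helper name `dilatedExtent` is hypothetical and is not part of SOFIE; it only evaluates the standard formula for the extent covered by `taps` kernel elements spaced `step` apart.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a SOFIE function): number of input positions
// covered along one axis by `taps` kernel elements spaced `step` apart.
std::size_t dilatedExtent(std::size_t taps, std::size_t step) {
    assert(taps >= 1 && step >= 1);
    return (taps - 1) * step + 1;
}
```

For the 3x3 dilation-2 case, `dilatedExtent(3, 2)` is 5, which is the expanded kernel size. Reading that expanded kernel densely, `dilatedExtent(5, 1)`, still covers 5 inputs, while applying dilation again, `dilatedExtent(5, 2)`, would cover 9. At dilation 1, `dilatedExtent(3, 1)` is 3, so nothing changes.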

CPU path: ports the root-project/root#22474 one-liner, resetting fAttrDilations to 1 after the weight reorder so the dense im2col does not re-apply dilation.

GPU path (alpaka): applies the analogous fix in the generated kernels:

  • the im2col kernel decodes its rows over the effective (expanded) kernel extents instead of the raw width, so the columns line up with the dilated _f layout
  • the im2col gather no longer re-applies dilation
  • _f is zeroed before the weight-vec kernel when dilation > 1 (alpaka buffers are not zero-initialised), so the gaps in the dilated layout stay 0
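The need for zeroing can be illustrated with a host-side sketch of a dilated weight layout. This is plain C++, not the generated alpaka kernel, and `expandKernel` is a hypothetical name; it assumes a row-major kernel and writes each weight at its dilated position, leaving the gaps at 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side helper: scatter a dense kh x kw kernel into a
// dilated layout of size ((kh-1)*dh+1) x ((kw-1)*dw+1), row-major.
std::vector<float> expandKernel(const std::vector<float>& w,
                                std::size_t kh, std::size_t kw,
                                std::size_t dh, std::size_t dw) {
    assert(kh >= 1 && kw >= 1 && dh >= 1 && dw >= 1);
    assert(w.size() == kh * kw);
    const std::size_t eh = (kh - 1) * dh + 1;  // expanded height
    const std::size_t ew = (kw - 1) * dw + 1;  // expanded width
    // std::vector zero-fills here. A raw device buffer would not, which is
    // why an explicit memset is needed before scattering the weights.
    std::vector<float> f(eh * ew, 0.0f);
    for (std::size_t i = 0; i < kh; ++i) {
        for (std::size_t j = 0; j < kw; ++j) {
            f[(i * dh) * ew + (j * dw)] = w[i * kw + j];
        }
    }
    return f;
}
```

Only `kh * kw` of the `eh * ew` entries are ever written, so any entry that is not cleared beforehand would contribute garbage to a dense product over the expanded layout.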

Generated im2col for a 3x3 dilation-2 conv, before -> after:

| Step | Before | After |
|------|--------|-------|
| decode | `k_rem / 3u` | `k_rem / 5u` |
| gather | `oh*1u + kh*2u` | `oh*1u + kh` |
| zero | (none) | `alpaka::memset(queue, deviceBuf_x_f, 0)` |
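The decode and gather changes can be sketched on the host as follows. The names `Tap`, `decodeTap` and `gatherRow` are hypothetical, padding is ignored, and the `%` used for the column index is an assumption, since the generated code above only shows the division.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel coordinate (row, column) within one channel.
struct Tap {
    std::size_t kh;
    std::size_t kw;
};

// Decode a flattened per-channel index into (kh, kw) for a kernel whose
// rows are `width` elements wide.
Tap decodeTap(std::size_t k_rem, std::size_t width) {
    assert(width >= 1);
    return Tap{k_rem / width, k_rem % width};
}

// Input row read for output row `oh` and kernel row `kh`, where `step`
// is the spacing between kernel rows (1 once the kernel is expanded).
std::size_t gatherRow(std::size_t oh, std::size_t stride,
                      std::size_t kh, std::size_t step) {
    return oh * stride + kh * step;
}
```

With the effective width 5, index 12 decodes to (2, 2), the centre of the 5x5 layout, and `gatherRow(0, 1, 2, 1)` reads input row 2. Decoding the same index with the raw width 3 would give (4, 0), which no longer lines up with the dilated `_f` layout, and a gather with step 2 from kernel row 4 would read input row 8 instead of row 4.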

The changes are no-ops at dilation 1, so existing Conv tests are unaffected.

Tested on Colab T4: ConvWithDilation passes and all existing Conv tests still pass. Reverting only the codegen fix makes ConvWithDilation fail, confirming the fix.
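For an independent sanity check of the idea behind the fix, a small 1D reference can compare a direct dilated convolution against a dense convolution over the expanded, zero-filled kernel. This is only a sketch with hypothetical function names, stride 1 and no padding; it is not the ConvWithDilation test itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct 1D dilated convolution (stride 1, no padding).
std::vector<float> convDilated(const std::vector<float>& x,
                               const std::vector<float>& w, std::size_t d) {
    assert(!w.empty() && d >= 1);
    const std::size_t span = (w.size() - 1) * d + 1;
    assert(x.size() >= span);
    std::vector<float> y(x.size() - span + 1, 0.0f);
    for (std::size_t o = 0; o < y.size(); ++o)
        for (std::size_t k = 0; k < w.size(); ++k)
            y[o] += w[k] * x[o + k * d];
    return y;
}

// Same result via an expanded, zero-filled kernel that is read densely,
// i.e. dilation is applied exactly once, when building the kernel.
std::vector<float> convExpanded(const std::vector<float>& x,
                                const std::vector<float>& w, std::size_t d) {
    assert(!w.empty() && d >= 1);
    const std::size_t span = (w.size() - 1) * d + 1;
    assert(x.size() >= span);
    std::vector<float> f(span, 0.0f);
    for (std::size_t k = 0; k < w.size(); ++k)
        f[k * d] = w[k];
    std::vector<float> y(x.size() - span + 1, 0.0f);
    for (std::size_t o = 0; o < y.size(); ++o)
        for (std::size_t k = 0; k < span; ++k)
            y[o] += f[k] * x[o + k];
    return y;
}
```

For `x = {1, 2, 3, 4, 5, 6, 7}`, `w = {1, 2, 3}` and dilation 2, both functions return `{22, 28, 34}`. Applying dilation again in the second loop (`x[o + k * d]`) would break this agreement, which mirrors the bug described above.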

