Skip to content

[bug] fix for incorrect output from grouped Conv and regression test for Conv with batch>1 & group>1#24

Open
harz05 wants to merge 2 commits into
ML4EP:gpu/alpakafrom
harz05:fix/conv-grouped-gemm-n
Open

[bug] fix for incorrect output from grouped Conv and regression test for Conv with batch>1 & group>1#24
harz05 wants to merge 2 commits into
ML4EP:gpu/alpakafrom
harz05:fix/conv-grouped-gemm-n

Conversation

@harz05

@harz05 harz05 commented May 25, 2026

Copy link
Copy Markdown

Closes #23

In Generate_GPU_ALPAKA and GetBlasConfig of ROperator_Conv.hxx, gemm_n was being set to total output channels and used directly in per-group matmul calls and BLAS layout config without dividing by fAttrGroup for the grouped path. The comment mentions that "we divide per group at launch" but the division was missing and same was verified with test failure as reported by #22 ,this caused per-group matmuls to compute all output channels instead of just the group's slice.

Restoring the division at the variable assignment fixes it in one spot for both call sites.

EDIT: The PR nowadds a ConvGroupBatch GTest (group=2, batch=4) as the regression test for this path; it fails without this fix and passes with it.

@harz05

harz05 commented May 25, 2026

Copy link
Copy Markdown
Author

@sanjibansg a quick check on this. I noticed that in the recent operator init refactor, the gemm_n /= fAttrGroup division got dropped, was that intentional or did it just not carry over from the old Initialize() setup? I've added it back here in both Generate_GPU_ALPAKA and GetBlasConfig. Happy to change if you had a different intent for handling the grouped path.

@harz05 harz05 force-pushed the fix/conv-grouped-gemm-n branch from 17906b3 to d732b8d Compare June 11, 2026 10:47
@harz05 harz05 changed the title fix for incorrect output from grouped Conv [bug] fix for incorrect output from grouped Conv and regression test for Conv with batch>1 & group>1 Jun 11, 2026
@harz05

harz05 commented Jun 11, 2026

Copy link
Copy Markdown
Author

also added a ConvGroupBatch GTest that exercises group>1 and batch>1 together (group=2, batch=4), which was never covered. Verified on Colab T4 with the tests passing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant