added GTests for Conv GPU groups>1 inference by harz05 · Pull Request #22 · ML4EP/SOFIE

harz05 · 2026-05-25T11:06:01Z

Closes #21

This PR adds ONNX models with groups=2 and groups=4, along with their reference outputs and TEST_F cases in TestCustomModelsFromONNXForAlpakaCuda.cxx. A combined batch=4, groups=2 model is also added to validate that the outer batch loop and inner group loop nest correctly.
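To illustrate what "nest correctly" means for the combined model, here is a minimal sketch of the output offsets visited by an outer batch loop and an inner group loop. It assumes a contiguous NCHW output buffer; the function name `output_slices`, the 4 output channels, and the 5x5 output plane are hypothetical choices for illustration and are not taken from ROperator_Conv.hxx.

```python
def output_slices(batch, groups, out_channels, out_h, out_w):
    """Yield (b, g, start, stop) for each per-group output slice, NCHW layout."""
    channels_per_group = out_channels // groups
    plane = out_h * out_w
    for b in range(batch):          # outer batch loop
        for g in range(groups):     # inner group loop
            start = (b * out_channels + g * channels_per_group) * plane
            stop = start + channels_per_group * plane
            yield b, g, start, stop


# Hypothetical shapes: batch=4, groups=2, 4 output channels, 5x5 output plane.
slices = list(output_slices(batch=4, groups=2, out_channels=4, out_h=5, out_w=5))
for b, g, start, stop in slices:
    print(f"batch {b} group {g}: out[{start}:{stop}]")
```

If the loops nest correctly, the slices tile the output buffer exactly once, with each slice starting where the previous one stops.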

Reference outputs are verified against PyTorch F.conv2d on Google Colab (NVIDIA T4, CUDA 12.x).

Choice of weights

Weights are iota-filled (1, 2, 3, ...) rather than all-ones so that each group's filter slice has distinguishable values; distinct values per slice make any per-group indexing bug visible. Inputs use std::iota for the same reason.
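The following sketch shows why all-ones weights can hide a per-group indexing bug. It uses a grouped 1x1 convolution on a single pixel, and the `filter_group` argument simulates a hypothetical bug that always reads one group's filter rows. The function and the tiny shapes are assumptions for illustration and are not the models added in this PR.

```python
def grouped_pointwise(x, w, groups, filter_group=None):
    """Grouped 1x1 convolution on a single pixel.

    x: list of in_channels values.
    w: out_channels rows, each with in_channels // groups weights.
    filter_group: if set, simulates a hypothetical indexing bug that
    always reads that group's filter rows.
    """
    in_per_group = len(x) // groups
    out_per_group = len(w) // groups
    out = []
    for g in range(groups):
        x_slice = x[g * in_per_group:(g + 1) * in_per_group]
        wg = g if filter_group is None else filter_group
        for oc in range(out_per_group):
            row = w[wg * out_per_group + oc]
            out.append(sum(a * b for a, b in zip(x_slice, row)))
    return out


x = [1, 2, 3, 4]                                # iota-filled input
ones_w = [[1, 1] for _ in range(4)]             # all-ones weights
iota_w = [[1, 2], [3, 4], [5, 6], [7, 8]]       # iota-filled weights

# All-ones weights: the simulated bug produces the same output.
ones_ok = grouped_pointwise(x, ones_w, groups=2)
ones_bug = grouped_pointwise(x, ones_w, groups=2, filter_group=0)

# Iota weights: the simulated bug changes group 1's outputs.
iota_ok = grouped_pointwise(x, iota_w, groups=2)
iota_bug = grouped_pointwise(x, iota_w, groups=2, filter_group=0)

print(ones_ok == ones_bug)   # True: bug is invisible
print(iota_ok, iota_bug)     # [5, 11, 39, 53] [5, 11, 11, 25]
```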

Regression in current gpu/alpaka

The tests currently fail. The grouped path in Generate_GPU_ALPAKA of ROperator_Conv.hxx seems to have been refactored recently: gemm_n is now declared locally in the function, but the per-group division of gemm_n that was previously done in Initialize() was not carried along. The comment "we divide per group at launch" is still there, but no division actually happens at any launch site.

The effect: gemm_n stays at total output channels rather than per-group. For groups=2 with outC=4, each per-group matmul computes all 4 output channels instead of 2, using group 0's input throughout. Empirically, output indices i=0..49 are correct while i=50..99 are wrong.
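The symptom can be reproduced with a simplified model. The sketch below is not the generated Alpaka code. It assumes batch=1, a 1x1 kernel, one input channel per group, and 25 output pixels per channel, so that outC=4 gives 100 outputs. The flag `gemm_n_per_group` is a hypothetical switch that models the described regression by leaving gemm_n at the total output channel count and using group 0's input throughout.

```python
def grouped_conv1x1(x, w, groups, gemm_n_per_group=True):
    """x: [in_channels][pixels], w: [out_channels][in_channels // groups].

    Returns the flat NCHW output. With gemm_n_per_group=False, gemm_n stays
    at the total number of output channels and only group 0's input is read,
    which is a simplified model of the regression described above.
    """
    out_channels = len(w)
    in_per_group = len(x) // groups
    pixels = len(x[0])
    gemm_n = out_channels // groups if gemm_n_per_group else out_channels
    out = [0] * (out_channels * pixels)
    for g in range(groups if gemm_n_per_group else 1):
        x_g = x[g * in_per_group:(g + 1) * in_per_group]
        for n in range(gemm_n):
            oc = g * gemm_n + n
            for p in range(pixels):
                out[oc * pixels + p] = sum(
                    w[oc][k] * x_g[k][p] for k in range(in_per_group)
                )
    return out


pixels = 25
x = [list(range(1, 26)), list(range(26, 51))]   # iota input, 2 channels
w = [[1], [2], [3], [4]]                        # iota weights, outC=4

good = grouped_conv1x1(x, w, groups=2)
bad = grouped_conv1x1(x, w, groups=2, gemm_n_per_group=False)

wrong = [i for i in range(len(good)) if good[i] != bad[i]]
print(wrong[0], wrong[-1], len(wrong))   # 50 99 50
```

Under these assumptions the first two output channels (indices 0..49) agree and the last two (indices 50..99) differ, which matches the pattern reported above.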

Steps to reproduce on Colab:

!apt-get install -y libprotobuf-dev protobuf-compiler libgtest-dev

!git clone https://github.com/harz05/SOFIE.git
%cd /content/SOFIE
!git checkout feat/conv-group-tests

!mkdir build && cd build && cmake -Dtesting=ON -DCMAKE_INSTALL_PREFIX=../install -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_ALPAKA_TESTS=ON -DALPAKA_BACKEND=cuda .. && cmake --build . --target install -j$(nproc)

!cd /content/SOFIE/build && ctest -V 2>&1 | grep -E "ConvGroup|ConvBatch|PASSED|FAILED|tests passed"

added GTests for Conv GPU groups>1 inference

614e51f

This was referenced May 25, 2026

Incorrect output from grouped Conv GPU path #23

Open

[bug] fix for incorrect output from grouped Conv and regression test for Conv with batch>1 & group>1 #24

Open

harz05 wants to merge 1 commit into ML4EP:gpu/alpaka from harz05:feat/conv-group-tests
