
added GTests for Conv GPU groups>1 inference #22

Open
harz05 wants to merge 1 commit into ML4EP:gpu/alpaka from harz05:feat/conv-group-tests


harz05 commented May 25, 2026


Closes #21

This PR adds ONNX models with groups=2 and groups=4, along with reference outputs and TEST_F cases in TestCustomModelsFromONNXForAlpakaCuda.cxx. A combined batch=4 + groups=2 model is also added to validate that the outer batch loop and inner group loop nest correctly.

Reference outputs are verified against PyTorch F.conv2d on Google Colab (NVIDIA T4, CUDA 12.x).

Choice of weights

Weights are iota-filled (1, 2, 3, ...) rather than all-ones so that each group's filter slice has distinguishable values; distinct values per slice make any per-group indexing bug visible. Inputs use std::iota for the same reason.
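To illustrate the point about distinguishable filter slices, here is a minimal sketch. It is not SOFIE code: `GroupedPointwiseConv`, `BugIsVisible`, the 1x1 kernel, and the tiny shapes are all hypothetical and chosen only for illustration. It models one possible indexing bug (every group reading group 0's filter rows) and checks whether the output changes.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical reference, not SOFIE code: grouped 1x1 convolution over a
// flattened spatial axis. x is [inC][pixels], w is [outC][inC/groups].
// With reuseGroup0Filters set, every group reads group 0's filter rows,
// which models one possible per-group weight-indexing bug.
std::vector<float> GroupedPointwiseConv(const std::vector<float>& x,
                                        const std::vector<float>& w,
                                        std::size_t inC, std::size_t outC,
                                        std::size_t pixels, std::size_t groups,
                                        bool reuseGroup0Filters)
{
    assert(groups > 0 && inC % groups == 0 && outC % groups == 0);
    const std::size_t inCg = inC / groups;   // input channels per group
    const std::size_t outCg = outC / groups; // output channels per group
    std::vector<float> y(outC * pixels, 0.f);
    for (std::size_t oc = 0; oc < outC; ++oc) {
        const std::size_t g = oc / outCg; // group this output channel belongs to
        const std::size_t wRow = reuseGroup0Filters ? oc % outCg : oc;
        for (std::size_t p = 0; p < pixels; ++p) {
            float acc = 0.f;
            for (std::size_t ic = 0; ic < inCg; ++ic)
                acc += w[wRow * inCg + ic] * x[(g * inCg + ic) * pixels + p];
            y[oc * pixels + p] = acc;
        }
    }
    return y;
}

// Returns true if the modelled indexing bug changes the output.
// Shapes (inC=4, outC=4, 3 pixels, groups=2) are assumed for illustration.
bool BugIsVisible(bool iotaWeights)
{
    const std::size_t inC = 4, outC = 4, pixels = 3, groups = 2;
    std::vector<float> x(inC * pixels);
    std::iota(x.begin(), x.end(), 1.f); // inputs 1, 2, 3, ...
    std::vector<float> w(outC * (inC / groups), 1.f); // all-ones by default
    if (iotaWeights)
        std::iota(w.begin(), w.end(), 1.f); // weights 1, 2, 3, ...
    return GroupedPointwiseConv(x, w, inC, outC, pixels, groups, false) !=
           GroupedPointwiseConv(x, w, inC, outC, pixels, groups, true);
}
```

In this sketch, `BugIsVisible(false)` returns false because all-ones filter rows are interchangeable, while `BugIsVisible(true)` returns true because each row of iota-filled weights is unique.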

Regression in current gpu/alpaka

The tests currently fail. The grouped path in Generate_GPU_ALPAKA of ROperator_Conv.hxx appears to have been refactored recently: gemm_n is now declared locally in the function, but the per-group division of gemm_n that was previously done in Initialize() was not carried along. The comment "we divide per group at launch" is still there, but no division actually happens at any launch site.

The effect: gemm_n stays at total output channels rather than per-group. For groups=2 with outC=4, each per-group matmul computes all 4 output channels instead of 2, using group 0's input throughout. Empirically, output indices i=0..49 are correct while i=50..99 are wrong.

Steps to reproduce on Colab:

!apt-get install -y libprotobuf-dev protobuf-compiler libgtest-dev

!git clone https://github.com/harz05/SOFIE.git
%cd /content/SOFIE
!git checkout feat/conv-group-tests

!mkdir build && cd build && cmake -Dtesting=ON -DCMAKE_INSTALL_PREFIX=../install -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_ALPAKA_TESTS=ON -DALPAKA_BACKEND=cuda .. && cmake --build . --target install -j$(nproc)

!cd /content/SOFIE/build && ctest -V 2>&1 | grep -E "ConvGroup|ConvBatch|PASSED|FAILED|tests passed"
