GB200 support: SendRecv DSL collective and per-channel executor connections by Binyang2014 · Pull Request #810 · microsoft/mscclpp

Binyang2014 · 2026-05-20T23:58:42Z

Summary

GB200 support work: introduces point-to-point send/receive in the MSCCL++ DSL
and extends the executor for split-NVL-domain topologies where some ranks are
NVL-connected within a node and other ranks must communicate across the network.

DSL

New SendRecv collective with separate input/output buffers
(python/mscclpp/language/collectives.py).
New multi-node sendrecv DSL example
(python/mscclpp/language/tests/multi_node/send_recv.py) with --split_mask
(group size − 1) and --instances CLI options. Documents the channel-ordering
trick that keeps signal tags cross-matched between paired peers when
prev == next.
BaseBuffer.__getitem__ now accepts slices with None start/stop
(e.g., buf[:]).

Executor

One connection (unique QP) per channel entry instead of one per peer.
Required for HostNoAtomic IB mode where each QP can forward signals to a
single semaphore. Uses per-peer tag counters so paired ranks agree on tag
ordering regardless of the order peers appear in each rank's connected_to
list.
MEMORY channels now unconditionally use Transport::CudaIpc; only PORT
channels can use IB. Matches the invariant already enforced by
getTransportFlags.
ExecutionContext::connections is now a vector<Connection> indexed by
channel order (was unordered_map<int, Connection> keyed by peer). Removes
redundant semaphore fields from ExecutionContext.
TODO: explicit NVL-domain check in useIB

Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

- Modified test/mp_unit/mp_unit_tests.hpp to use ../framework.hpp instead of gtest/gtest.h - Enhanced test/framework.hpp with GTest-compatible APIs: - Added Environment base class for global test setup/teardown - Added TestInfo and UnitTest classes for test metadata access - Added GTEST_SKIP macro support via SkipHelper class - Added namespace alias 'testing' for compatibility - Added InitGoogleTest and AddGlobalTestEnvironment helper functions - Updated test/framework.cc with implementations for new classes - All mp_unit test files now use framework.hpp through mp_unit_tests.hpp - Formatting applied via lint.sh Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

- Remove duplicate static getMPIRank() and getMPISize() functions - Add full namespace qualification to GTEST_SKIP macro Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

- Move test framework from test/perf/ to test/ for shared use - Add GTest-compatible macros (TEST, TEST_F, EXPECT_*, ASSERT_*, etc.) - Remove GTest dependency from CMakeLists.txt - Add test_framework library for unit and mp_unit tests - Add code coverage support with lcov (MSCCLPP_ENABLE_COVERAGE option) - Update perf tests to use shared framework Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

Support --gtest_filter command line argument for test filtering, compatible with Azure pipeline configurations. Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

- Move PerfTestResult struct definition outside vector declaration - Move getCurrentTimestamp to anonymous namespace - Add documentation for GTEST_SKIP macro explaining RAII pattern Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

The recent removal of GTest and introduction of custom test framework requires MPI dependency which is not needed for CodeQL analysis. Disable test building in CodeQL workflows to fix the build failures. CodeQL only needs to analyze the core library code, not the tests. Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

- Make MPI REQUIRED for test builds (clearer error messages) - Add project include directories to test_framework library - Fix core_tests.cc to use custom framework correctly - Fix mp_unit_tests.hpp to use mscclpp::test namespace - Add FAIL() macro with streaming support for test messages - Building tests now works in Docker environment with GPU bypass Tests can now be built using: docker run --rm -v $(pwd):/workspace -w /workspace \ ghcr.io/microsoft/mscclpp/mscclpp:base-dev-cuda12.4 bash -c \ "mkdir build && cd build && cmake -DMSCCLPP_BYPASS_GPU_CHECK=ON \ -DMSCCLPP_USE_CUDA=ON .. && make -j" Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

- Move helper classes inside namespace for proper access - Remove duplicate class definitions outside namespace - Tests can now build in Docker with CUDA toolkit installed - Remaining issues: ErrorCode and TransportFlags need operator<< for EXPECT_EQ Successfully building with Docker: docker run --rm -v $(pwd):/workspace -w /workspace \ ghcr.io/microsoft/mscclpp/mscclpp:base-dev-cuda12.4 bash -c \ "mkdir build && cd build && cmake -DMSCCLPP_BYPASS_GPU_CHECK=ON \ -DMSCCLPP_USE_CUDA=ON .. && make -j4" Note: Some unit tests (errors_tests.cc, core_tests.cc) need operator<< defined for ErrorCode and TransportFlags to compile with custom framework. Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

- Remove build_test/ directory containing CMake cache and build files - Update .gitignore to exclude build_*/ pattern to prevent future accidents These CMake artifacts (CMakeCache.txt, CMakeFiles/, generated headers) were accidentally committed and should never be in version control. Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

- Add nlohmann::ordered_json metrics field to TestResult struct - Add nlohmann/json.hpp include to test/framework.hpp - Link test_framework with nlohmann_json::nlohmann_json - Replace PerfTestResult with TestResult in test/perf/framework.cc - Move perf utility functions to utils namespace for consistency - Remove duplicate PerfTestResult struct definition This consolidates the two similar structs into one, reducing code duplication while maintaining all necessary fields for both unit tests (passed/failure_message) and performance tests (metrics). Verified build succeeds with Docker: docker run --rm -v $(pwd):/workspace -w /workspace \ ghcr.io/microsoft/mscclpp/mscclpp:base-dev-cuda12.4 bash -c \ "cd /workspace/build && make -j4 fifo_test" Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

- Add isPerfTest field to TestInfoInternal struct - Add --exclude-perf-tests command line argument - Add PERF_TEST and PERF_TEST_F macros for marking performance tests - Update runAllTests to filter performance tests when requested - Remove genhtml dependency and HTML report generation - Keep only coverage.info file generation with lcov Performance tests can now be excluded with: ./build/bin/unit_tests --exclude-perf-tests ./build/bin/mp_unit_tests --exclude-perf-tests Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

- Add unit_tests_main.cc with main() function for unit_tests executable - Create fifo_perf_tests.cu as PERF_TEST for unit_tests - Add fifo_perf_tests.cu to unit_tests sources - Fix errors_tests.cc to use ASSERT_TRUE for ErrorCode comparisons - Fix core_tests.cc to use ASSERT_TRUE for TransportFlags comparisons - Add Azure pipeline step for Debug build with coverage - Add step to run mp_unit_tests --exclude-perf-tests with coverage The perf tests are now part of unit_tests and can be filtered out for coverage reporting. CI now includes Debug build with coverage collection for non-performance tests. Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

Another set of accidentally committed build artifacts in build2/ directory. The .gitignore pattern build_*/ should prevent these in the future. Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

- Remove test/perf/ directory (fifo_test.cu, framework.{cc,hpp}, CMakeLists.txt) - Remove add_subdirectory(perf) from test/CMakeLists.txt - Performance tests now integrated into unit_tests as fifo_perf_tests.cu - Fix mp_unit_tests.cc to use framework functions without ::testing:: namespace - Fix bootstrap_tests.cc ErrorCode comparison to use ASSERT_TRUE - Fix switch_channel_tests.cu to not use streaming with ASSERT_EQ - Add missing #include <unistd.h> to executor_tests.cc All perf test functionality is now in unit_tests and can be filtered with --exclude-perf-tests flag. The standalone test/perf/ directory is no longer needed. Verified builds: - unit_tests: ✅ - mp_unit_tests: ✅ Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

1. Add missing includes to fifo_perf_tests.cu - Add #include <cassert> - Add #include <unordered_map> 2. Fix license header capitalization (4 files) - "license." → "License." in test/framework.{hpp,cc} - "license." → "License." in test/unit/{unit_tests_main.cc,fifo_perf_tests.cu} 3. Fix double MPI_Init issue - Check MPI_Initialized() before calling MPI_Init - Prevents double initialization when mp_unit_tests already inits MPI 4. Fix coverage flags for CUDA compilation - Use generator expressions to apply --coverage only to C++ language - Prevents breaking CUDA compilation with host-only flags 5. Fix environment memory leak - Delete environment objects after TearDown() - Clear environments_ vector 6. Implement proper GTEST_SKIP handling - Create SkipException class - Handle skipped tests separately from failures - Report skipped test count 7. Implement GTest-style filter pattern matching - Support wildcards (* and ?) - Support negative patterns (-Pattern) - Support colon-separated patterns (Foo:Bar) - Compatible with existing CI usage like --gtest_filter=-*Ib* Verified builds successfully with Docker. Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

- Remove complex wildcard pattern matching (*, ?, negative patterns) - Use simple substring matching with find() - Simpler implementation, easier to understand and maintain - Still supports --gtest_filter for basic test name filtering Note: For advanced filtering like wildcards, users can use multiple test runs with different substring filters. Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

- Add TEST_DATA_SEND_RECV verifier kernel that replays fill_data PRNG with peer_rank seed to validate received data - Add double-buffer support for sendrecv in executor_test.py: allocate 2 input/result/test buffers, alternate per iteration - Create two executor funcs for sendrecv, one per buffer pair - Update bench_correctness and bench_time to handle double-buffer - Add bandwidth reporting to output Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Adds/extends support for a multi-node “sendrecv” communication pattern and adjusts executor connection construction to create distinct connections per channel entry, plus updates the Python executor test harness and verifier kernels to support split-group verification and double-buffering.

Changes:

Build one Connection (QP) per channel entry with deterministic per-peer tag assignment in the executor.
Extend Python executor tests/verifier kernels to support sendrecv, optional split_mask grouping, and double-buffer execution.
Add a new DSL SendRecv collective and a corresponding multi-node test program.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
src/core/include/execution_kernel.hpp	Fixes `putWithSignalAndFlush` argument ordering/selection in device-side put handling.
src/core/executor/executor.cc	Switches to per-channel connections (vector-based) and uses per-peer tag counters for stable matching.
python/test/executor_test.py	Adds sendrecv/double-buffer and split-mask support to benchmarking and correctness harness.
python/test/executor_test_verifier.cu	Updates fill/test kernels to accept `split_mask` and adds sendrecv verification kernels.
python/mscclpp/language/tests/multi_node/send_recv.py	New multi-node DSL test program for sendrecv with split grouping.
python/mscclpp/language/rank.py	Makes buffer slicing more robust by handling `slice(None, None)` and non-slice indexing.
python/mscclpp/language/collectives.py	Introduces `SendRecv` collective and buffer initialization for it.

build_bufs now always returns parallel lists of buffers (length 1 for normal collectives, length 2 for sendrecv double-buffering), so bench_time, bench_correctness, and main() no longer branch on sendrecv_mode or double_buf. Iteration i uses funcs[i % len(funcs)] uniformly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI and others added 30 commits February 11, 2026 00:12

Initial plan

e711b62

Replace gtest/gtest.h with framework.hpp in all unit tests

c881bc5

Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

Address code review comments

1e32e17

- Remove duplicate static getMPIRank() and getMPISize() functions - Add full namespace qualification to GTEST_SKIP macro Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

Add --gtest_filter support to framework

3d8a2e7

Support --gtest_filter command line argument for test filtering, compatible with Azure pipeline configurations. Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

Address code review feedback

a10aff5

- Move PerfTestResult struct definition outside vector declaration - Move getCurrentTimestamp to anonymous namespace - Add documentation for GTEST_SKIP macro explaining RAII pattern Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

Initial plan for fixing test build with GPU bypass

5657e4a

Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

Convert test framework identifiers from snake_case to camelCase

b1f458e

Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

Comprehensive plan for refactoring

6da12fa

Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

Remove build2 CMake artifacts from repository

ba0451a

Another set of accidentally committed build artifacts in build2/ directory. The .gitignore pattern build_*/ should prevent these in the future. Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

simplifying

30b9891

simplify

b6ce0f2

coverage update

d2efc2f

minor

4afbf78

license text update

e40c72b

codecov upload

bed85b5

badge

4d9acea

lint issue

b693d1b

fix lint

2b4adcc

mahdiehghazim and others added 18 commits April 7, 2026 01:40

add barrier

3f2ade2

add extra signal/wait and avoid local flush

6d8fb00

add executor for testing

96defbd

revert dsl

68690ec

merge main

54c2f50

WIP

76fdd1d

WIP

57f7be6

WIP

65139d6

fix

456ef7e

WIP

36abcbe

for 4 nodes

a2a1b89

update the script

1fd5ed8

update

4a17b64

clean

3a1e2d4

revert

8a42fe2

WIP

7784407

WIP

e600520

Binyang2014 requested review from a team, caiomcbr and Copilot May 22, 2026 17:11

Copilot started reviewing on behalf of Binyang2014 May 22, 2026 17:11 View session

Merge branch 'main' into binyli/GB200

4e09967

Binyang2014 changed the title ~~Binyli/gb200~~ GB200 support: SendRecv DSL collective and per-channel executor connections May 22, 2026

Copilot AI reviewed May 22, 2026

View reviewed changes

Binyang2014 added 2 commits May 22, 2026 18:05

WIP

3bd24e1

WIP

142e794

Binyang2014 marked this pull request as ready for review May 22, 2026 18:12

Binyang2014 and others added 2 commits May 22, 2026 18:27

Merge branch 'main' into binyli/GB200

4463595

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GB200 support: SendRecv DSL collective and per-channel executor connections#810

GB200 support: SendRecv DSL collective and per-channel executor connections#810
Binyang2014 wants to merge 150 commits into
mainfrom
binyli/GB200

Binyang2014 commented May 20, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

Binyang2014 commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

DSL

Executor

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Binyang2014 commented May 20, 2026 •

edited

Loading