Add collective benchmark and correctness check by Binyang2014 · Pull Request #814 · microsoft/mscclpp

Binyang2014 · 2026-05-28T23:23:39Z

Add a unit test for the float8_e4m3b15 data type.
Also add a tuner and a benchmark for the allreduce/allgather algorithms to verify both correctness and performance.

Copilot

Pull request overview

This PR introduces a Python-based collective benchmark + offline tuning workflow (with correctness checks), adds a GPU FP8 conversion unit test, and tightens/adjusts a couple of collective/runtime integration points (CI/docs/deps and kernel argument validation).

Changes:

Added python/mscclpp_benchmark utilities for benchmarking, offline tuning, GPU runtime abstraction (CUDA/HIP), and correctness checking (including FP8 handling).
Added a new CUDA unit test covering fp8_e4m3b15 encode/decode conversions and wired it into unit_tests.
Updated dependency extras/docs/CI to support the benchmark flow (CUDA Python bindings / hip-python) and added an allreduce packet launch-parameter check.
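
The launch-parameter check mentioned in the last item lives in CUDA code that is not shown here. As a rough illustration only, the condition described in the per-file summary (`nBlocks` at least the local peer count) could be expressed as follows; the function name and error message are hypothetical, not the PR's code.

```python
def check_packet_launch_params(n_blocks: int, n_local_peers: int) -> None:
    """Reject launch configurations with fewer blocks than local peers.

    Hypothetical helper: it mirrors only the condition stated in the review
    summary, not the actual check in allreduce_packet.cu.
    """
    if n_blocks < n_local_peers:
        raise ValueError(
            f"nBlocks ({n_blocks}) must be at least the local peer count ({n_local_peers})"
        )


# Example: 8 GPUs per node means 7 local peers for each rank.
check_packet_launch_params(n_blocks=8, n_local_peers=7)  # passes silently
```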

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `test/unit/gpu_data_types_tests.cu` | New unit test validating `fp8_e4m3b15` conversion behavior. |
| `test/unit/CMakeLists.txt` | Adds the new GPU data type test to the unit test target. |
| `src/ext/collectives/allreduce/allreduce_packet.cu` | Adds validation that `nBlocks` is at least the local peer count. |
| `src/ext/collectives/allgather/allgather_fullmesh_2.cu` | Refactors worker participation/sync in the kernel (noted a correctness bug in review). |
| `python/requirements_rocm6.txt` | Fixes formatting and adds `hip-python>=6,<7`. |
| `python/requirements_cuda11.txt` | Adds CUDA Python bindings requirement for CUDA 11. |
| `python/requirements_cuda12.txt` | Adds CUDA Python bindings requirement for CUDA 12. |
| `python/requirements_cuda13.txt` | Adds CUDA Python bindings requirement for CUDA 13. |
| `python/mscclpp_benchmark/tuning_config.py` | New tuned-config parsing/selection/serialization utilities (noted a serialization bug in review). |
| `python/mscclpp_benchmark/tuner.py` | New offline tuner CLI driver to generate tuned configs. |
| `python/mscclpp_benchmark/gpu.py` | New CUDA/HIP runtime shim for graph capture/launch (cuda-bindings / hip-python). |
| `python/mscclpp_benchmark/correctness.py` | New correctness harness (including FP8 encoding/decoding and tolerances). |
| `python/mscclpp_benchmark/comm.py` | New raw-buffer Comm wrapper and default config resolution. |
| `python/mscclpp_benchmark/bench_collective.py` | New collective benchmark runner with optional autotuning + correctness gating. |
| `python/mscclpp_benchmark/__init__.py` | Switches to lazy attribute exports via `__getattr__`. |
| `pyproject.toml` | Updates platform extras to include `cuda-bindings` / `hip-python` and includes `mscclpp_benchmark` in wheel packages. |
| `include/mscclpp/gpu_data_types.hpp` | Adjusts HIP gfx942 f32→f16 packing path used by fp8_e4m3b15 conversion routing. |
| `docs/quickstart.md` | Documents new benchmark/tuning usage and updated extras behavior. |
| `.azure-pipelines/templates/rccl-test.yml` | Adds a Python benchmark step (noted robustness issue in review). |
| `.azure-pipelines/templates/nccl-test.yml` | Adds a Python benchmark step (noted robustness issue in review). |

+    def write_path(self, path: str | Path) -> None:
+        profiles_payload: list[dict[str, Any]] = []
+        for profile, configs_by_collective in sorted(
+            ((profile, configs) for profile, configs in self._profiles.items() if profile is not None),
+            key=lambda item: (item[0].sku is None, item[0].sku or "", item[0].scale is None, item[0].scale or 0),
+        ):
+            collectives: dict[str, list[dict[str, Any]]] = {}
+            for collective, configs in sorted(configs_by_collective.items()):
+                collectives[collective] = [_config_entry_payload(item) for item in sorted(configs)]
+            profile_payload: dict[str, Any] = {}
+            if profile.sku is not None:
+                profile_payload["sku"] = profile.sku
+            if profile.scale is not None:
+                profile_payload["scale"] = profile.scale
+            profile_payload["collectives"] = collectives
+            profiles_payload.append(profile_payload)
+
+        with Path(path).open("w", encoding="utf-8") as handle:
+            handle.write(_format_tuned_config_json({"version": 1, "profiles": profiles_payload}))
+
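
The sort key in `write_path` above puts a boolean `is None` flag ahead of each optional field, so `None` values never get compared against strings or integers and always sort last. The sketch below isolates that ordering; the `Profile` dataclass and the SKU strings are stand-ins assumed for illustration, since the real profile type is not shown in this excerpt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    # Stand-in for the PR's profile type; only the two fields used by the key.
    sku: Optional[str] = None
    scale: Optional[int] = None


def profile_sort_key(profile: Profile):
    # Same tuple shape as the key in write_path: each "is None" flag sorts
    # missing values after present ones, and the "or" fallback keeps the
    # compared values the same type.
    return (profile.sku is None, profile.sku or "", profile.scale is None, profile.scale or 0)


profiles = [
    Profile(None, None),
    Profile("H100", 16),
    Profile("A100", None),
    Profile("H100", 8),
    Profile("A100", 8),
]
ordered = sorted(profiles, key=profile_sort_key)
for p in ordered:
    print(p.sku, p.scale)
```

With these inputs, the profiles with a SKU come first in alphabetical order, each SKU's scaled entries precede its unscaled one, and the fully unspecified profile comes last.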


+_FP8_TABLES: dict[str, list[tuple[int, float]]] = {}
+_FP8_SPACING_CACHE: dict[tuple[str, float], float] = {}
+
+
+def _encode_fp8_values(fp8_format: str, values):
+    values = values.astype(cp.float32)
+    if fp8_format == "e4m3b15":
+        return _encode_e4m3b15_values(values)
+
+    table = _FP8_TABLES.setdefault(fp8_format, _build_fp8_table(fp8_format))
+    table_bytes = cp.asarray([byte for byte, _ in table], dtype=cp.uint8)
+    table_values = cp.asarray([value for _, value in table], dtype=cp.float32)
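
The excerpt above stops right after the `(byte, value)` table is split into two arrays. One plausible continuation is a nearest-value lookup, sketched below under several assumptions: NumPy replaces CuPy so it runs without a GPU, the toy format (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7, every bit pattern treated as finite) is not one of the PR's formats, ties round toward the smaller value, and `decode_toy_fp8`, `build_toy_table`, and `encode_nearest` are hypothetical names.

```python
import numpy as np


def decode_toy_fp8(byte: int, bias: int = 7) -> float:
    """Decode a toy 1-4-3 float; all bit patterns are treated as finite (assumption)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF
    mantissa = byte & 0x7
    if exponent == 0:  # subnormal: no implicit leading one
        return sign * (mantissa / 8.0) * 2.0 ** (1 - bias)
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - bias)


def build_toy_table(bias: int = 7):
    # Skip 0x80 (negative zero) so the decoded values are strictly increasing.
    table = [(b, decode_toy_fp8(b, bias)) for b in range(256) if b != 0x80]
    table.sort(key=lambda entry: entry[1])
    return table


def encode_nearest(values, table):
    """Map each float to the byte whose decoded value is closest to it."""
    table_bytes = np.array([b for b, _ in table], dtype=np.uint8)
    table_values = np.array([v for _, v in table], dtype=np.float32)
    values = np.asarray(values, dtype=np.float32)
    # Index of the first table value >= input, clamped so both neighbours exist.
    hi = np.clip(np.searchsorted(table_values, values), 1, len(table_values) - 1)
    lo = hi - 1
    pick_hi = (table_values[hi] - values) < (values - table_values[lo])
    return table_bytes[np.where(pick_hi, hi, lo)]


table = build_toy_table()
print(encode_nearest([1.0, -1.0, 0.0, 1e6], table))
```

Out-of-range inputs saturate to the largest or smallest representable value in this sketch. A production encoder would also need to define NaN handling and a tie-breaking rule such as round-to-nearest-even, neither of which is attempted here.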


Binyang2014 added 10 commits May 28, 2026 05:21

add benchmakr

4474359

update

cdb0383

WIP

0605022

WIP

b302796

WIP

2fe6b1e

update

f1a5a7d

update correctness check

44dab3b

remove some code

dc37dd6

fix issue

569acc3

add new test

ab567ef

Binyang2014 requested a review from Copilot May 29, 2026 20:20

Copilot started reviewing on behalf of Binyang2014 May 29, 2026 20:21

Binyang2014 added 2 commits May 29, 2026 13:21

Merge branch 'main' into binyli/benchmark

ad97f72

update

c8a49fa

Copilot AI reviewed May 29, 2026

Binyang2014 added 4 commits May 29, 2026 21:11

WIP

493e3b3

update

ce03bae

WIP

f830639

WIP

fac9467

Binyang2014 wants to merge 16 commits into main from binyli/benchmark

2 participants
