
Upgrade mscclpp to v0.9.0 + multi-GPU device-pinning fixes #247

Merged: chhwang merged 2 commits into main from pr1-mscclpp-v090-upgrade on May 25, 2026


@chhwang (Contributor) commented May 25, 2026

  • Bump mscclpp pin from 7f3b0887 to v0.9.0 (225 commits ahead)
  • Migrate executor.cpp to v0.9.0 Communicator API: connectOnSetup -> connect(EndpointConfig), SmDevice2DeviceSemaphore -> MemoryDevice2DeviceSemaphore, MemoryChannel ctor takes RegisteredMemory not raw ptr
  • Add set_current() in Executor::compile and PlanResource::init_kernel to pin the CUDA context before per-device allocations (fixes cudaErrorInvalidValue at TP>=4)
  • Update executor pybind for v0.9.0 API changes
  • Fix GPU architecture detection in CMake for newer CUDA toolkits
  • Update CI workflow
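The set_current() change is about ordering. In the CUDA runtime, allocations such as cudaMalloc land on the calling thread's current device, so a host thread that sets up resources for several GPUs has to switch devices before each per-device allocation. The sketch below models that rule in plain C++ so it runs without a GPU, assuming a single host thread that allocates for several devices in turn. g_current_device, mock_alloc, set_current, and alloc_per_rank are hypothetical names for illustration only; they are not the executor's code or the CUDA API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the CUDA runtime's per-thread "current device".
// In real CUDA code this role is played by cudaSetDevice()/cudaGetDevice().
thread_local int g_current_device = 0;

struct MockBuffer {
  int device;         // device the buffer was allocated on
  std::size_t bytes;  // requested size
};

// Hypothetical allocator: like cudaMalloc, it uses whatever device is current.
MockBuffer mock_alloc(std::size_t bytes) {
  return MockBuffer{g_current_device, bytes};
}

// Hypothetical helper in the spirit of the PR's set_current(): pin the
// calling thread to `device` before any per-device work.
void set_current(int device, int num_devices) {
  assert(device >= 0 && device < num_devices);
  g_current_device = device;
}

// Allocate one buffer per device. Without pinning, every buffer lands on
// whichever device happened to be current; with pinning, buffer i lands
// on device i.
std::vector<MockBuffer> alloc_per_rank(int num_devices, std::size_t bytes,
                                       bool pin) {
  std::vector<MockBuffer> out;
  for (int dev = 0; dev < num_devices; ++dev) {
    if (pin) set_current(dev, num_devices);
    out.push_back(mock_alloc(bytes));
  }
  return out;
}
```

Called from a fresh thread, alloc_per_rank(4, bytes, false) places all four buffers on device 0, while alloc_per_rank(4, bytes, true) places buffer i on device i.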

chhwang added 2 commits May 25, 2026 18:46
…-review

- Replace volatile pointer casts with mscclpp::atomicLoad/atomicStore
  from atomic_device.hpp for host-side flag polling
- Remove .github/workflows/ut.yml (CI is not currently runnable)
- Add unit tests: test_executor_device_pinning (ReLU smoke),
  test_executor_recompile_cycle, test_executor_flag_polling
- Fix Python max_spin_count default (100000000 -> -1, i.e. spin indefinitely)
- Move timer_begin_->record() after set_current() in launch()
- Condense verbose set_current() comments per deep-review feedback
- Clarify test comments: single-GPU smoke, not multi-GPU verification
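The first bullet above replaces volatile casts with mscclpp::atomicLoad/atomicStore. In standard C++, volatile provides neither atomicity nor inter-thread ordering, which is why an explicit atomic load/store with acquire/release semantics is the usual choice for a polled flag. The sketch below shows that pattern using std::atomic as a stand-in; it is not mscclpp's API and does not model host memory written by a GPU. It also includes a max_spin_count argument where a negative value means wait indefinitely, matching the new -1 default. wait_for_flag, signal_flag, and demo_handshake are hypothetical names.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Poll `flag` until it equals `expected`. A negative max_spin_count means
// spin indefinitely; otherwise give up after that many failed reads.
bool wait_for_flag(const std::atomic<std::uint32_t>& flag,
                   std::uint32_t expected, std::int64_t max_spin_count = -1) {
  std::int64_t spins = 0;
  // Acquire load: writes made before the matching release store are visible.
  while (flag.load(std::memory_order_acquire) != expected) {
    if (max_spin_count >= 0 && ++spins >= max_spin_count) return false;
  }
  return true;
}

// Producer side: publish `value` with release semantics.
void signal_flag(std::atomic<std::uint32_t>& flag, std::uint32_t value) {
  flag.store(value, std::memory_order_release);
}

// Usage: a producer thread sets the flag while the caller waits unbounded.
bool demo_handshake() {
  std::atomic<std::uint32_t> flag{0};
  std::thread producer([&flag] { signal_flag(flag, 1); });
  bool ok = wait_for_flag(flag, 1);  // -1 default: no spin limit
  producer.join();
  return ok;
}
```

Counting spins only when a limit is set keeps the unbounded case from ever incrementing (and overflowing) the counter.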
chhwang marked this pull request as ready for review May 25, 2026 21:35
chhwang merged commit 808f205 into main May 25, 2026
5 of 11 checks passed
chhwang deleted the pr1-mscclpp-v090-upgrade branch May 25, 2026 21:35

1 participant