Extending DSL Executor CI tests for multi-node environment #812

Open
caiomcbr wants to merge 4 commits into main from caiorocha/multi_node_executor_test

@caiomcbr
Contributor

No description provided.

Copilot AI left a comment

Pull request overview

This PR extends the multi-node CI pipeline to run MSCCL++ DSL executor tests in a 2-node environment by adding new executor-test execution plans (and their corresponding DSL generators) that exercise port-channel communication.

Changes:

  • Added a multi-node (2 ranks / 2 nodes) port-channel execution plan that covers SIGNAL/WAIT/PUT and fused put variants (Simple protocol).
  • Added a multi-node (2 ranks / 2 nodes) packet-based port-channel execution plan covering PUT_PACKETS / READ_PUT_PACKETS (LL protocol).
  • Updated the Azure multi-node pipeline to execute these new plans via mpirun as part of the CI job.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| test/executor-tests/execution-plans/multi_node_transfer.json | New Simple-protocol multi-node execution plan to validate port-channel sync + data movement ops. |
| test/executor-tests/execution-plans/multi_node_transfer_pkt.json | New LL-protocol packet execution plan for multi-node port-channel packet transfers. |
| test/executor-tests/algos/multi_node_transfer.py | DSL generator used to produce the Simple multi-node transfer plan JSON. |
| test/executor-tests/algos/multi_node_transfer_pkt.py | DSL generator used to produce the LL packet multi-node transfer plan JSON. |
| .azure-pipelines/multi-nodes-test.yml | Runs the new executor tests on the 2-node VMSS via mpirun. |
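
Since the execution-plan JSON files are generated by the DSL scripts and then checked in, a quick local sanity check before the CI run could catch a stale or misnamed plan. The sketch below is not part of this PR. It assumes a plan's top-level `"name"` field is expected to match the file stem, which is visible for `multi_node_transfer.json` in the diff but is only an assumption for the packet plan. The helper name `check_plan_name` is hypothetical.

```python
import json
import tempfile
from pathlib import Path


def check_plan_name(plan_path: Path) -> str:
    """Parse an execution-plan JSON file and confirm "name" equals the file stem.

    Assumption (not confirmed by the PR): plan names mirror their file names.
    """
    with plan_path.open(encoding="utf-8") as fh:
        plan = json.load(fh)  # raises json.JSONDecodeError on malformed JSON
    name = plan.get("name")
    if name != plan_path.stem:
        raise ValueError(
            f"{plan_path.name}: name is {name!r}, expected {plan_path.stem!r}"
        )
    return name


if __name__ == "__main__":
    # Stand-in plan holding only the one field visible in the diff;
    # the real file contains many more fields.
    with tempfile.TemporaryDirectory() as tmp:
        plan_file = Path(tmp) / "multi_node_transfer.json"
        plan_file.write_text(
            json.dumps({"name": "multi_node_transfer"}), encoding="utf-8"
        )
        print(check_plan_name(plan_file))
```

Pointing the same helper at the real files under test/executor-tests/execution-plans/ would flag a plan whose name drifted from its file name, if that convention does hold in this repository.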

@@ -0,0 +1,239 @@
{
"name": "multi_node_transfer",