Primus v26.3 by gargrahul · Pull Request #159 · ROCm/MAD

gargrahul · 2026-05-20T21:23:06Z

The Primus v26.3 release introduces the following new models and upgrades previously supported models:

Qwen 3 30B BF16/FP8
Qwen 3 235B BF16/FP8
GPT OSS 20B BF16/FP8
GPT OSS 120B BF16/FP8

Copilot

Pull request overview

Updates MAD’s Primus integration to align with the Primus v26.3 release, adding new Megatron-LM training targets (Qwen3 30B/235B, GPT-OSS 20B/120B) and bumping Docker base images/documentation to the newer container stack.

Changes:

- Add new Primus Megatron-LM model repos + datatype support logic (BF16/FP8 where applicable), and extend benchmark parsing to recognize these models.
- Introduce a setup-time patch step to add training-log metrics summarization to Primus’ primus-cli-direct.sh.
- Bump training Docker base images to rocm/primus:v26.3 and refresh benchmark README component versions.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| scripts/pytorch_train/run.sh | Ensure certain post-train model repos explicitly run with BF16. |
| scripts/primus/pytorch_train/primus_pytorch_benchmark_report.sh | Adjust device→config mapping logic for pretrain benchmarks. |
| scripts/primus/megatron-lm/run.sh | Add new model repo selectors + datatype support matrix updates. |
| scripts/primus/megatron-lm/primus_megatron-lm_benchmark_setup.sh | Apply an inline patch to Primus to parse/append training metrics summaries. |
| scripts/primus/megatron-lm/primus_megatron-lm_benchmark_report.sh | Add new models and refine benchmark execution behaviors. |
| scripts/primus/megatron-lm/primus_megatron-lm_benchmark_report.py | Extend log parsing eligibility list for new models. |
| models.json | Register new model repos for MAD runs (including skip-arch for GPT-OSS-120B). |
| docker/pytorch_train.ubuntu.amd.Dockerfile | Bump base image to `rocm/primus:v26.3`. |
| docker/primus_pytorch_train.ubuntu.amd.Dockerfile | Bump base image to `rocm/primus:v26.3`. |
| docker/primus_megatron_train.ubuntu.amd.Dockerfile | Bump base image to `rocm/primus:v26.3`. |
| benchmark/pytorch_train/README.md | Update component versions listed for the training container. |
| benchmark/megatron_lm/README.md | Update component versions and document some new supported models + examples. |

Copilot

Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

vidushi8

LGTM

Copilot

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.

amd-fuyuajin

Made the changes Copilot suggested. Added multi-node training examples. Ready to merge.

peterjunpark · 2026-05-21T18:38:11Z

-The docker container hosts verified coomit `e16b27b` from [Primus repository](https://github.com/AMD-AGI/Primus/tree/e16b27bf6c1b2798f38848fc574fee60d9a9b902).
+The docker container hosts verified commit `e16b27b` from [Primus repository](https://github.com/AMD-AGI/Primus/tree/e16b27bf6c1b2798f38848fc574fee60d9a9b902).


Suggested change

-The docker container hosts verified commit `e16b27b` from [Primus repository](https://github.com/AMD-AGI/Primus/tree/e16b27bf6c1b2798f38848fc574fee60d9a9b902).
+The docker container hosts verified commit `43a6e0` from [Primus repository](https://github.com/AMD-AGI/Primus/tree/release/v26.3).

Should this be 43a6e006c419697208295c5523b99070e8198ad9? That's the head of the release branch https://github.com/AMD-AGI/Primus/commits/release/v26.3/
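For reference, the abbreviated hashes quoted in this thread can be checked against the full commit IDs with a simple prefix test. The snippet below is a minimal sketch; the helper name `is_abbreviation_of` is hypothetical and is not part of MAD or Primus.

```python
import re

# Lowercase hexadecimal string, as used for git object IDs.
HEX_RE = re.compile(r"[0-9a-f]+")


def is_abbreviation_of(short: str, full: str) -> bool:
    """Return True if `short` is a hex prefix of the 40-character SHA-1 `full`.

    Hypothetical helper for illustration only.
    """
    short, full = short.lower(), full.lower()
    if len(full) != 40 or not HEX_RE.fullmatch(full):
        raise ValueError(f"not a full 40-character SHA-1: {full!r}")
    return bool(HEX_RE.fullmatch(short)) and full.startswith(short)


# Full commit IDs exactly as they appear in this review thread.
RELEASE_HEAD = "43a6e006c419697208295c5523b99070e8198ad9"
OLD_PIN = "e16b27bf6c1b2798f38848fc574fee60d9a9b902"

print(is_abbreviation_of("43a6e0", RELEASE_HEAD))   # suggested short hash vs. release head
print(is_abbreviation_of("e16b27b", OLD_PIN))       # previous short hash vs. previous pin
print(is_abbreviation_of("e16b27b", RELEASE_HEAD))  # previous short hash vs. release head
```

A prefix match only shows that the short form is consistent with the full ID; it does not confirm that the container was actually built from that commit.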

peterjunpark · 2026-05-21T18:38:35Z

-## 2. Configurations in Yaml Script (`examples/megatron/configs/`)
+## 2. Configurations in yaml files (`examples/megatron/configs/`)

 Primus defines training yaml for each model inside [examples/megatron/configs/](https://github.com/AMD-AGI/Primus/tree/e16b27bf6c1b2798f38848fc574fee60d9a9b902/examples/megatron/configs) repository. For example, use `examples/megatron/configs/llama3.1_8B-pretrain.yaml` for updating llama3.1_8B training parameters. Other yaml for the supported model can be found with `examples/megatron/configs/${MODEL_NAME}-pretrain.yaml` naming convention in this repository.


Suggested change

-Primus defines training yaml for each model inside [examples/megatron/configs/](https://github.com/AMD-AGI/Primus/tree/e16b27bf6c1b2798f38848fc574fee60d9a9b902/examples/megatron/configs) repository. For example, use `examples/megatron/configs/llama3.1_8B-pretrain.yaml` for updating llama3.1_8B training parameters. Other yaml for the supported model can be found with `examples/megatron/configs/${MODEL_NAME}-pretrain.yaml` naming convention in this repository.
+Primus defines training yaml for each model inside [examples/megatron/configs/](https://github.com/AMD-AGI/Primus/tree/release/v26.3) repository. For example, use `examples/megatron/configs/llama3.1_8B-pretrain.yaml` for updating llama3.1_8B training parameters. Other yaml for the supported model can be found with `examples/megatron/configs/${MODEL_NAME}-pretrain.yaml` naming convention in this repository.
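The `${MODEL_NAME}-pretrain.yaml` naming convention quoted above can be illustrated with a short sketch. The function name `pretrain_config_path` is hypothetical; only the directory and the file name pattern come from the README text, and the sketch does not check that a given file exists in the Primus repository.

```python
from pathlib import PurePosixPath

# Directory named in the README text under review.
CONFIG_DIR = PurePosixPath("examples/megatron/configs")


def pretrain_config_path(model_name: str) -> PurePosixPath:
    """Build the config path for a model using the README's naming convention.

    Hypothetical helper: it only formats the path and does not touch the filesystem.
    """
    if not model_name or "/" in model_name:
        raise ValueError(f"invalid model name: {model_name!r}")
    return CONFIG_DIR / f"{model_name}-pretrain.yaml"


# The model name used as the example in the README.
print(pretrain_config_path("llama3.1_8B"))
```

For the README's own example, `llama3.1_8B`, this yields the path `examples/megatron/configs/llama3.1_8B-pretrain.yaml` that the text cites.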

Primus v26.3

e15bcad

Copilot AI review requested due to automatic review settings May 20, 2026 21:23

gargrahul requested review from Rohan138, amathews-amd, coketaste and ppalaniappan-amd as code owners May 20, 2026 21:23

Copilot started reviewing on behalf of gargrahul May 20, 2026 21:23

Copilot AI reviewed May 20, 2026

amd-fuyuajin added 2 commits May 20, 2026 21:28

Update README.md

bab6b43

added examples for multi-node training of mixtral 8x22B and llama3.1-405B. Also made some other changes.

Update README.md

72832ff

Copilot AI review requested due to automatic review settings May 21, 2026 01:35

Copilot AI reviewed May 21, 2026

amd-fuyuajin added 2 commits May 20, 2026 21:37

Update README.md

496ded5

Update README.md

9d8c582

added multi-node training examples

Copilot AI review requested due to automatic review settings May 21, 2026 02:00

Copilot AI reviewed May 21, 2026

amd-fuyuajin requested review from amd-fuyuajin and vidushi8 May 21, 2026 15:20

amd-fuyuajin added 2 commits May 21, 2026 11:45

Update primus_megatron-lm_benchmark_setup.sh

1f24e65

accept Copilot suggestions

Update primus_megatron-lm_benchmark_report.sh

72b1b01

Copilot AI review requested due to automatic review settings May 21, 2026 16:22

Copilot AI reviewed May 21, 2026

vidushi8 added 2 commits May 21, 2026 09:25

Update README.md with multinode and remove proxy models

196a51c

Update README.md torchtitan with multinode examples

f8d1c66

Copilot AI review requested due to automatic review settings May 21, 2026 16:35

Copilot started reviewing on behalf of vidushi8 May 21, 2026 16:35

typo fix megatron README.md

444ad5e

vidushi8 previously approved these changes May 21, 2026

Copilot AI reviewed May 21, 2026

vidushi8 requested a review from clairesonglee May 21, 2026 16:40

amd-fuyuajin previously approved these changes May 21, 2026

Potential fix for pull request finding

e5ad91f

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

amd-fuyuajin dismissed stale reviews from vidushi8 and themself via e5ad91f May 21, 2026 17:03

Copilot AI review requested due to automatic review settings May 21, 2026 17:03

Copilot AI reviewed May 21, 2026

amd-fuyuajin added 2 commits May 21, 2026 13:16

Update README.md

f075693

Update primus_megatron-lm_benchmark_setup.sh

6d91596

Copilot AI review requested due to automatic review settings May 21, 2026 17:18

Copilot AI reviewed May 21, 2026

Update primus_megatron-lm_benchmark_setup.sh

924193d

peterjunpark reviewed May 21, 2026

gargrahul merged commit 1b4ec50 into ROCm:develop May 23, 2026

5 participants
