feat(vllm): add Gemma 4 models, image, and ROCm serving recipes by coketaste · Pull Request #144 · ROCm/MAD

coketaste · 2026-04-14T18:43:52Z

Register pyt_vllm_gemma-4-26b-a4b-it and pyt_vllm_gemma-4-31b-it in models.json (gemma4 Docker stack).
Add docker/pyt_vllm_gemma4.ubuntu.amd.Dockerfile from vllm/vllm-openai-rocm:gemma4 with transformers 5.5.0.
Extend scripts/vllm/configs/default.yaml with Gemma 4 serving blocks (TRITON_ATTN, gfx942 float16; 26B MoE disables AITER fused MoE).
Quote JSON-like extra_args in run_vllm.py (shlex) for --limit-mm-per-prompt with existing --flag YAML keys.
Document Gemma 4 in benchmark/vllm/README.md.
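
To illustrate why a JSON-like `--limit-mm-per-prompt` value needs quoting before it is placed in a shell command string, here is a minimal sketch using only the standard library. The JSON payload below is an illustrative assumption, not the value used in the PR's config; `shlex.split` stands in for POSIX shell word splitting.

```python
import json
import shlex

# Hypothetical extra_args value; the real one lives in default.yaml.
value = json.dumps({"image": 0, "audio": 0})

# Interpolating the raw value lets the shell split it on whitespace
# and strip the inner double quotes.
unquoted = f"--limit-mm-per-prompt {value}"

# shlex.quote wraps the value so it survives as one argument.
quoted = f"--limit-mm-per-prompt {shlex.quote(value)}"

unquoted_tokens = shlex.split(unquoted)
quoted_tokens = shlex.split(quoted)

print(unquoted_tokens)  # the JSON is broken into several fragments
print(quoted_tokens)    # flag plus one intact JSON argument
```

With quoting, the second token can be parsed back with `json.loads`; without it, the value arrives as several fragments that are no longer valid JSON.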


Copilot

Pull request overview

Adds Gemma 4 (26B-A4B-it and 31B-it) vLLM serving support to the MAD benchmarking stack, including new model registrations, ROCm/Gemma4 Docker build plumbing, and documented serving recipes.

Changes:

Registered two Gemma 4 vLLM models in models.json and documented them in benchmark/vllm/README.md.
Added a Gemma4-specific AMD Ubuntu Dockerfile based on vllm/vllm-openai-rocm:gemma4 and extended scripts/vllm/configs/default.yaml with Gemma 4 serving recipes/overrides.
Updated scripts/vllm/run_vllm.py to shell-quote JSON-like/whitespace-containing extra_args values (notably --limit-mm-per-prompt).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file

| File | Description |
| --- | --- |
| `scripts/vllm/run_vllm.py` | Adjusts `extra_args` formatting/quoting when composing the vLLM command line. |
| `scripts/vllm/configs/default.yaml` | Adds Gemma 4 serving benchmark blocks and gfx942 dtype overrides. |
| `models.json` | Registers Gemma 4 vLLM models and their MAD metadata/output CSV names. |
| `docker/pyt_vllm_gemma4.ubuntu.amd.Dockerfile` | Introduces a Gemma4-tagged base image Dockerfile and pins transformers. |
| `benchmark/vllm/README.md` | Documents Gemma 4 image tag usage, gating/token requirements, and recipe details. |
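
The gfx942 dtype override mentioned above can be pictured as a per-architecture layer applied on top of a base recipe. The sketch below is only an illustration of that idea: the key names (`dtype`, `attention_backend`), the `"auto"` default, and the `apply_arch_overrides` helper are all hypothetical, since the actual schema of `default.yaml` is not shown in this PR page.

```python
from copy import deepcopy


def apply_arch_overrides(base: dict, overrides: dict, arch: str) -> dict:
    """Return a copy of `base` with the overrides for `arch` layered on top."""
    merged = deepcopy(base)
    merged.update(overrides.get(arch, {}))
    return merged


# Hypothetical recipe; key names and default values are illustrative only.
base_recipe = {
    "model": "pyt_vllm_gemma-4-31b-it",
    "dtype": "auto",
    "attention_backend": "TRITON_ATTN",
}
arch_overrides = {"gfx942": {"dtype": "float16"}}

gfx942_recipe = apply_arch_overrides(base_recipe, arch_overrides, "gfx942")
other_recipe = apply_arch_overrides(base_recipe, arch_overrides, "gfx950")

print(gfx942_recipe["dtype"])
print(other_recipe["dtype"])
```

Copying the base before updating keeps the shared recipe unchanged, so an override for one GPU architecture cannot leak into runs on another.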



Copilot

Pull request overview

Adds Gemma 4 vLLM benchmark/serving support and hardens run_vllm.py extra-args handling to better support JSON-like and shell-metacharacter-containing values.

Changes:

Registers Gemma 4 models in models.json and adds serving recipes to scripts/vllm/configs/default.yaml.
Updates the shared vLLM AMD Dockerfile to a newer upstream vLLM base image and installs newer Transformers.
Switches run_vllm.py to shell-quote extra arg values and adds a new test module + README updates describing Gemma 4 usage.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file

| File | Description |
| --- | --- |
| `tests/vllm/test_run_vllm_extra_args.py` | Adds tests intended to validate extra-args quoting behavior. |
| `scripts/vllm/run_vllm.py` | Quotes extra args via `shlex.quote(str(v))` when building shell command strings. |
| `scripts/vllm/configs/default.yaml` | Adds Gemma 4 serving blocks (TRITON_ATTN, gfx942 float16 overrides; MoE AITER disable for 26B-A4B). |
| `models.json` | Registers `pyt_vllm_gemma-4-26b-a4b-it` and `pyt_vllm_gemma-4-31b-it`. |
| `docker/pyt_vllm.ubuntu.amd.Dockerfile` | Bumps vLLM base image version and installs newer Transformers in the shared vLLM stack. |
| `benchmark/vllm/README.md` | Documents Gemma 4 images/recipes and updates the available-models list. |


- Remove redundant pip install transformers (v0.20.0 ships with v5)
- Delete test_run_vllm_extra_args.py (duplicated inline logic)
- Remove --async-scheduling from Gemma 4 configs (on by default)
- Enable concurrency 32/128 for gemma-4-26B-A4B-it
- Update README to reflect v0.20.0 as the standard base image

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

…helper

- Fix ENTRYPOINT [""] → ENTRYPOINT [] to properly clear upstream entrypoint
- Skip bool False flags instead of emitting them on the command line
- Extract build_extra_args_str() as importable module-level function
- Rewrite tests to import and exercise the real production code path

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
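
The behavior described in this commit (quote every non-bool value, emit bare flags for `True`, skip `False`) could look roughly like the sketch below. This is not the PR's implementation: only the function name and the use of `shlex.quote(str(v))` come from the PR text, while the body, the handling of edge cases, and the example flags and values are assumptions made for illustration.

```python
import shlex


def build_extra_args_str(extra_args: dict) -> str:
    """Sketch: render a {flag: value} mapping as a shell-safe argument string."""
    parts = []
    for flag, value in extra_args.items():
        if isinstance(value, bool):
            # True emits the bare flag; False is skipped entirely.
            if value:
                parts.append(flag)
            continue
        # Every non-bool value is stringified and shell-quoted.
        parts.append(f"{flag} {shlex.quote(str(value))}")
    return " ".join(parts)


# Illustrative input; keys already carry their leading "--" as in the YAML.
example = build_extra_args_str({
    "--max-model-len": 8192,
    "--enforce-eager": True,
    "--enable-prefix-caching": False,
    "--limit-mm-per-prompt": '{"image": 0}',
})
print(example)
# --max-model-len 8192 --enforce-eager --limit-mm-per-prompt '{"image": 0}'
```

Keeping the helper at module level, as the commit message describes, is what lets a test import it directly instead of duplicating the quoting logic inline.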

Copilot

Pull request overview

This PR extends the vLLM benchmarking/serving integration to support Google Gemma 4 models by registering new MAD model entries, adding serving recipes, tightening CLI extra-arg quoting in the vLLM runner, and updating the vLLM Docker base tag and documentation accordingly.

Changes:

Register Gemma 4 models in models.json and document usage/requirements in benchmark/vllm/README.md.
Add Gemma 4 serving recipes (including AITER/MoE and gfx942 overrides) to scripts/vllm/configs/default.yaml.
Refactor/strengthen extra_args shell-quoting via build_extra_args_str() with unit tests; bump vLLM ROCm base image tag and clear the entrypoint in the shared vLLM Dockerfile.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file

| File | Description |
| --- | --- |
| `tests/vllm/test_run_vllm_extra_args.py` | Adds unit tests for `build_extra_args_str()` to validate quoting/flag behavior. |
| `tests/vllm/__init__.py` | Initializes the `tests.vllm` package (empty). |
| `scripts/vllm/run_vllm.py` | Introduces `build_extra_args_str()` using `shlex.quote` and switches main config processing to use it. |
| `scripts/vllm/configs/default.yaml` | Adds Gemma 4 serving config blocks with TRITON_ATTN, gfx942 float16 override, and MoE/AITER controls. |
| `models.json` | Registers `pyt_vllm_gemma-4-26b-a4b-it` and `pyt_vllm_gemma-4-31b-it`. |
| `docker/pyt_vllm.ubuntu.amd.Dockerfile` | Updates base image to `v0.20.0` and clears the entrypoint via `ENTRYPOINT []`. |
| `benchmark/vllm/README.md` | Updates vLLM version/tag references and documents Gemma 4 models + recipes and required env vars. |


@@ -485,12 +499,7 @@ def main():
            env_vars = config.get("env", {})
            extra_args = config.get("extra_args", {})
            env_vars_str = " ".join(f"{k}={v}" for k, v in env_vars.items())


@@ -36,4 +36,4 @@ WORKDIR $WORKSPACE_DIR
 RUN pip3 list

 # Specify entrypoint to override upstream
-ENTRYPOINT [""]
+ENTRYPOINT []


coketaste requested a review from gargrahul as a code owner April 14, 2026 18:43

Copilot AI review requested due to automatic review settings April 14, 2026 18:43

coketaste requested review from Rohan138, amathews-amd and ppalaniappan-amd as code owners April 14, 2026 18:43

coketaste self-assigned this Apr 14, 2026

Copilot started reviewing on behalf of coketaste April 14, 2026 18:45 View session

Copilot AI reviewed Apr 14, 2026


Comment thread docker/pyt_vllm_gemma4.ubuntu.amd.Dockerfile Outdated

Comment thread scripts/vllm/run_vllm.py Outdated

coketaste and others added 5 commits April 30, 2026 17:10

- 912aebc: Merge branch 'ROCm:develop' into coketaste/gemma4
- b4de892: chore(vllm): bump base image to vllm-openai-rocm v0.20.0, pin transformers>=5.5.0 (Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>)
- 77f0161: feat(vllm): consolidate Gemma 4 models into standard pyt_vllm stack (Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>)
- 76bc276: docs(vllm): update README — Gemma 4 now uses standard pyt_vllm image (Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>)
- b455096: fix(vllm): apply shlex.quote to all non-bool extra_args values (was partial) (Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>)

Copilot AI review requested due to automatic review settings April 30, 2026 21:30

Copilot started reviewing on behalf of coketaste April 30, 2026 21:32 View session

Copilot AI reviewed Apr 30, 2026


coketaste and others added 2 commits May 1, 2026 02:41

Copilot AI review requested due to automatic review settings May 1, 2026 03:01

Copilot started reviewing on behalf of coketaste May 1, 2026 03:02 View session

Copilot AI reviewed May 1, 2026


coketaste wants to merge 8 commits into ROCm:develop from coketaste:coketaste/gemma4


2 participants
