feat(vllm): add Gemma 4 models, image, and ROCm serving recipes #144

Open
coketaste wants to merge 8 commits into ROCm:develop from coketaste:coketaste/gemma4

Conversation

@coketaste
Contributor

  • Register pyt_vllm_gemma-4-26b-a4b-it and pyt_vllm_gemma-4-31b-it in models.json (gemma4 Docker stack).
  • Add docker/pyt_vllm_gemma4.ubuntu.amd.Dockerfile from vllm/vllm-openai-rocm:gemma4 with transformers 5.5.0.
  • Extend scripts/vllm/configs/default.yaml with Gemma 4 serving blocks (TRITON_ATTN, gfx942 float16; 26B MoE disables AITER fused MoE).
  • Quote JSON-like extra_args in run_vllm.py (shlex) for --limit-mm-per-prompt with existing --flag YAML keys.
  • Document Gemma 4 in benchmark/vllm/README.md.
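
The quoting problem behind the `run_vllm.py` bullet can be illustrated with a minimal sketch. It assumes the runner assembles a single shell command string, and it uses a placeholder model name and a made-up `--limit-mm-per-prompt` value; the real values live in `scripts/vllm/configs/default.yaml` and are not reproduced here. `shlex.split` stands in for the shell's word splitting.

```python
import json
import shlex

# Hypothetical JSON value for illustration only; not taken from default.yaml.
limit = json.dumps({"image": 0, "audio": 0})

# "some/model" is a placeholder, not one of the models registered in this PR.
unquoted = f"vllm serve some/model --limit-mm-per-prompt {limit}"
quoted = f"vllm serve some/model --limit-mm-per-prompt {shlex.quote(limit)}"

# POSIX-style splitting approximates what the shell hands to the process.
unquoted_argv = shlex.split(unquoted)
quoted_argv = shlex.split(quoted)

# Without quoting, the JSON is split on spaces and loses its double quotes.
print(unquoted_argv[4:])
# With shlex.quote, the JSON arrives as one intact argument.
print(quoted_argv[4:])
```

In the unquoted form the value arrives as several fragments that no longer parse as JSON, which is why a JSON-like value needs to be wrapped before it is interpolated into the command string.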

@coketaste coketaste requested a review from gargrahul as a code owner April 14, 2026 18:43
Copilot AI review requested due to automatic review settings April 14, 2026 18:43
@coketaste coketaste self-assigned this Apr 14, 2026
Contributor

Copilot AI left a comment

Pull request overview

Adds Gemma 4 (26B-A4B-it and 31B-it) vLLM serving support to the MAD benchmarking stack, including new model registrations, ROCm/Gemma4 Docker build plumbing, and documented serving recipes.

Changes:

  • Registered two Gemma 4 vLLM models in models.json and documented them in benchmark/vllm/README.md.
  • Added a Gemma4-specific AMD Ubuntu Dockerfile based on vllm/vllm-openai-rocm:gemma4 and extended scripts/vllm/configs/default.yaml with Gemma 4 serving recipes/overrides.
  • Updated scripts/vllm/run_vllm.py to shell-quote JSON-like/whitespace-containing extra_args values (notably --limit-mm-per-prompt).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| scripts/vllm/run_vllm.py | Adjusts extra_args formatting/quoting when composing the vLLM command line. |
| scripts/vllm/configs/default.yaml | Adds Gemma 4 serving benchmark blocks and gfx942 dtype overrides. |
| models.json | Registers Gemma 4 vLLM models and their MAD metadata/output CSV names. |
| docker/pyt_vllm_gemma4.ubuntu.amd.Dockerfile | Introduces a Gemma4-tagged base image Dockerfile and pins transformers. |
| benchmark/vllm/README.md | Documents Gemma 4 image tag usage, gating/token requirements, and recipe details. |


Comment thread docker/pyt_vllm_gemma4.ubuntu.amd.Dockerfile Outdated
Comment thread scripts/vllm/run_vllm.py Outdated
coketaste and others added 5 commits April 30, 2026 17:10
…rmers>=5.5.0

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
…artial)

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 30, 2026 21:30
Contributor

Copilot AI left a comment

Pull request overview

Adds Gemma 4 vLLM benchmark/serving support and hardens run_vllm.py extra-args handling to better support JSON-like and shell-metacharacter-containing values.

Changes:

  • Registers Gemma 4 models in models.json and adds serving recipes to scripts/vllm/configs/default.yaml.
  • Updates the shared vLLM AMD Dockerfile to a newer upstream vLLM base image and installs newer Transformers.
  • Switches run_vllm.py to shell-quote extra arg values and adds a new test module + README updates describing Gemma 4 usage.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
| File | Description |
| --- | --- |
| tests/vllm/test_run_vllm_extra_args.py | Adds tests intended to validate extra-args quoting behavior. |
| scripts/vllm/run_vllm.py | Quotes extra args via shlex.quote(str(v)) when building shell command strings. |
| scripts/vllm/configs/default.yaml | Adds Gemma 4 serving blocks (TRITON_ATTN, gfx942 float16 overrides; MoE AITER disable for 26B-A4B). |
| models.json | Registers pyt_vllm_gemma-4-26b-a4b-it and pyt_vllm_gemma-4-31b-it. |
| docker/pyt_vllm.ubuntu.amd.Dockerfile | Bumps vLLM base image version and installs newer Transformers in the shared vLLM stack. |
| benchmark/vllm/README.md | Documents Gemma 4 images/recipes and updates the available-models list. |


Comment thread tests/vllm/test_run_vllm_extra_args.py Outdated
Comment thread scripts/vllm/run_vllm.py Outdated
Comment thread docker/pyt_vllm.ubuntu.amd.Dockerfile Outdated
Comment thread models.json
Comment thread benchmark/vllm/README.md Outdated
Comment thread tests/vllm/test_run_vllm_extra_args.py Outdated
coketaste and others added 2 commits May 1, 2026 02:41
- Remove redundant pip install transformers (the v0.20.0 base image ships with Transformers v5)
- Delete test_run_vllm_extra_args.py (duplicated inline logic)
- Remove --async-scheduling from Gemma 4 configs (on by default)
- Enable concurrency 32/128 for gemma-4-26B-A4B-it
- Update README to reflect v0.20.0 as the standard base image

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
…helper

- Fix ENTRYPOINT [""] → ENTRYPOINT [] to properly clear upstream entrypoint
- Skip bool False flags instead of emitting them on the command line
- Extract build_extra_args_str() as importable module-level function
- Rewrite tests to import and exercise the real production code path

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 1, 2026 03:01
Contributor

Copilot AI left a comment

Pull request overview

This PR extends the vLLM benchmarking/serving integration to support Google Gemma 4 models by registering new MAD model entries, adding serving recipes, tightening CLI extra-arg quoting in the vLLM runner, and updating the vLLM Docker base tag and documentation accordingly.

Changes:

  • Register Gemma 4 models in models.json and document usage/requirements in benchmark/vllm/README.md.
  • Add Gemma 4 serving recipes (including AITER/MoE and gfx942 overrides) to scripts/vllm/configs/default.yaml.
  • Refactor/strengthen extra_args shell-quoting via build_extra_args_str() with unit tests; bump vLLM ROCm base image tag and clear the entrypoint in the shared vLLM Dockerfile.
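
As a rough illustration of the helper described in the last bullet, here is a minimal sketch. Only the function name, the use of `shlex.quote(str(v))`, and the rule that a bool `False` flag is skipped come from this thread; the signature, the handling of `True` as a bare flag, and the sample flags and values are assumptions, and the actual code in `scripts/vllm/run_vllm.py` may differ.

```python
import shlex
from typing import Any, Mapping


def build_extra_args_str(extra_args: Mapping[str, Any]) -> str:
    """Render a mapping of CLI flags to one shell-safe string (illustrative only)."""
    parts = []
    for flag, value in extra_args.items():
        if isinstance(value, bool):
            # Assumption: True becomes a bare flag. False is skipped,
            # as described in the commit notes in this thread.
            if value:
                parts.append(flag)
            continue
        # Quote every other value so JSON and shell metacharacters survive.
        parts.append(f"{flag} {shlex.quote(str(value))}")
    return " ".join(parts)


# Sample mapping with made-up values, keyed by "--flag" as in the YAML config.
rendered = build_extra_args_str(
    {
        "--max-model-len": 8192,
        "--limit-mm-per-prompt": '{"image": 0}',
        "--enforce-eager": True,
        "--enable-prefix-caching": False,
    }
)
print(rendered)
```

Keeping the helper at module level, as the commit notes describe, lets the tests import it directly instead of duplicating the quoting logic inline.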

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| tests/vllm/test_run_vllm_extra_args.py | Adds unit tests for build_extra_args_str() to validate quoting/flag behavior. |
| tests/vllm/__init__.py | Initializes the tests.vllm package (empty). |
| scripts/vllm/run_vllm.py | Introduces build_extra_args_str() using shlex.quote and switches main config processing to use it. |
| scripts/vllm/configs/default.yaml | Adds Gemma 4 serving config blocks with TRITON_ATTN, gfx942 float16 override, and MoE/AITER controls. |
| models.json | Registers pyt_vllm_gemma-4-26b-a4b-it and pyt_vllm_gemma-4-31b-it. |
| docker/pyt_vllm.ubuntu.amd.Dockerfile | Updates base image to v0.20.0 and clears the entrypoint via ENTRYPOINT []. |
| benchmark/vllm/README.md | Updates vLLM version/tag references and documents Gemma 4 models + recipes and required env vars. |


Comment thread scripts/vllm/run_vllm.py
@@ -485,12 +499,7 @@ def main():
env_vars = config.get("env", {})
extra_args = config.get("extra_args", {})
env_vars_str = " ".join(f"{k}={v}" for k, v in env_vars.items())
Comment on lines 27 to +39
@@ -36,4 +36,4 @@ WORKDIR $WORKSPACE_DIR
RUN pip3 list

# Specify entrypoint to override upstream
-ENTRYPOINT [""]
+ENTRYPOINT []