Add Gemma 4 12B Unified Olive recipe (mobius) by justinchuby · Pull Request #503 · microsoft/olive-recipes

justinchuby · 2026-06-12T16:26:47Z

Summary

Adds an Olive recipe for google/gemma-4-12B-it — the encoder-free Unified member of the Gemma 4 family — mirroring the existing google-gemma-4-E2B-it recipe.

The model is exported to ONNX via the MobiusBuilder pass and optionally quantized with K-Quant (Q4_K_M) INT4. Gemma 4 12B Unified projects raw image patches and audio waveform features directly into the decoder embedding space (no dedicated encoders), but the mobius pipeline still emits four ORT GenAI components: decoder, embedding, and encoder-free vision_encoder / audio_encoder embedders.

Recipes

| Recipe | Pipeline | Device / EP |
| --- | --- | --- |
| `cpu/fp32/config.json` | `MobiusBuilder(fp32)` | CPU |
| `cpu/int4/config.json` | `MobiusBuilder(fp32)` → `OnnxKQuantQuantization` | CPU |
| `cuda/fp16/config.json` | `MobiusBuilder(fp16)` | CUDA |
| `cuda/int4/config.json` | `MobiusBuilder(fp16)` → `OnnxKQuantQuantization` | CUDA |
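
To make the two-pass int4 pipeline concrete, here is a minimal sketch that assembles an Olive-style workflow config as a Python dict and prints it as JSON. The pass type names `MobiusBuilder` and `OnnxKQuantQuantization` and the `block_size` of 32 come from this PR. The pass labels (`builder`, `quantizer`), the `precision` key, the `input_model` layout, and the `output_dir` value are assumptions for illustration only and may not match the committed `config.json` files.

```python
import json


def make_int4_config(model_id: str, precision: str) -> dict:
    """Assemble a hypothetical two-pass workflow: export, then K-Quant."""
    return {
        # Assumed input-model layout; check the real config.json for the exact keys.
        "input_model": {"type": "HfModel", "model_path": model_id},
        "passes": {
            # Pass 1: export to ONNX ("precision" is an assumed parameter name).
            "builder": {"type": "MobiusBuilder", "precision": precision},
            # Pass 2: INT4 K-Quant; block_size=32 is the value quoted in this PR.
            "quantizer": {"type": "OnnxKQuantQuantization", "block_size": 32},
        },
        # Hypothetical output location.
        "output_dir": "models/int4",
    }


config = make_int4_config("google/gemma-4-12B-it", "fp16")
print(json.dumps(config, indent=2))
```

Dropping the `quantizer` entry would correspond to the fp32 and fp16 rows of the table, where only the export pass runs.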

Validation

All config JSON and info.yml parse; the repo scanner groups the 4 recipes correctly by arch/ep/device.
eval.py/inference.py byte-compile.
Model id confirmed on HF (model_type: gemma4_unified, ungated); supported by mobius Gemma4UnifiedModel / Gemma4UnifiedTask.
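
The first two checks above can be approximated with the standard library alone. The sketch below is one possible way to run them, not the script used for this PR: it parses every `config.json` under a recipe folder and byte-compiles the top-level Python scripts. The function name `validate_recipe` is hypothetical, and parsing `info.yml` is left out because it would need a YAML library.

```python
import json
import py_compile
from pathlib import Path


def validate_recipe(recipe_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # Every config.json in the recipe tree must be valid JSON.
    for cfg in sorted(recipe_dir.rglob("config.json")):
        try:
            json.loads(cfg.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{cfg}: invalid JSON ({exc})")
    # Helper scripts must at least byte-compile (catches syntax errors only).
    for script in sorted(recipe_dir.glob("*.py")):
        try:
            py_compile.compile(str(script), doraise=True)
        except py_compile.PyCompileError as exc:
            problems.append(f"{script}: does not compile ({exc.msg})")
    return problems
```

Calling `validate_recipe(Path("google-gemma-4-12B-it"))` from the repository root would then return an empty list when the configs parse and the scripts compile. Byte-compilation does not import the modules, so missing dependencies are not detected by this check.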

Generated model artifacts and .olive-cache are intentionally not committed (gitignored), matching other large-model recipes (e.g. Qwen3-14B).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add an Olive recipe for google/gemma-4-12B-it, the encoder-free Unified member of the Gemma 4 family. Mirrors the existing gemma-4-E2B-it recipe: exports to ONNX via the MobiusBuilder pass and optionally quantizes the decoder with K-Quant (Q4_K_M) INT4. Four configs cover CPU (fp32, int4) and CUDA (fp16, int4).

Includes info.yml, requirements, README documenting the encoder-free 4-component pipeline (decoder, embedding, encoder-free vision/audio embedders), plus eval.py (MMLU Pro 77.2% / GPQA Diamond 78.8% reference scores) and a text inference script.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <11205048+justinchuby@users.noreply.github.com>

Copilot

Pull request overview

Adds a new Olive “mobius” recipe set for Gemma 4 12B Unified (google/gemma-4-12B-it), mirroring the existing Gemma 4 E2B recipe structure to export ORT GenAI components via MobiusBuilder and optionally apply K-Quant INT4.

Changes:

Introduces CPU (fp32/int4) and CUDA (fp16/int4) Olive configs targeting Mobius export + optional K-Quant.
Adds runnable helper scripts for ORT GenAI inference (inference.py) and lm-eval-harness evaluation (eval.py).
Adds recipe metadata (info.yml), documentation (README.md), and dependency list (requirements.txt).

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| google-gemma-4-12B-it/requirements.txt | Declares Python deps for running eval/inference helpers. |
| google-gemma-4-12B-it/README.md | Documents model/recipe intent, build steps, inference, and evaluation usage. |
| google-gemma-4-12B-it/LICENSE | Adds the recipe folder license text. |
| google-gemma-4-12B-it/info.yml | Registers recipe metadata for repo scanning and indexing. |
| google-gemma-4-12B-it/inference.py | Provides ORT GenAI text inference CLI for produced model packages. |
| google-gemma-4-12B-it/eval.py | Provides lm-eval-harness evaluation CLI for produced model packages. |
| google-gemma-4-12B-it/cpu/fp32/config.json | CPU fp32 MobiusBuilder config. |
| google-gemma-4-12B-it/cpu/int4/config.json | CPU fp32 MobiusBuilder + K-Quant INT4 config. |
| google-gemma-4-12B-it/cuda/fp16/config.json | CUDA fp16 MobiusBuilder config with CUDA EP target. |
| google-gemma-4-12B-it/cuda/int4/config.json | CUDA fp16 MobiusBuilder + K-Quant INT4 config with CUDA EP target. |

- Resolve model dirs relative to __file__ in eval.py/inference.py so the scripts work from any working directory (not just the recipe folder).
- Align eval.py usage docstring with the default task key (leaderboard_mmlu_pro).
- README: use block_size=32 (matching the JSON configs) and drop the redundant explicit pip install in favor of requirements.txt.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <11205048+justinchuby@users.noreply.github.com>
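
The first item in this commit describes a common pattern, sketched below under stated assumptions. The helper name `resolve_model_dir` and the example paths are hypothetical and are not taken from eval.py or inference.py; the point is only that anchoring on the script's own location makes the result independent of the current working directory.

```python
from pathlib import Path


def resolve_model_dir(script_file: str, relative: str) -> Path:
    """Resolve `relative` against the folder that contains `script_file`."""
    script_dir = Path(script_file).resolve().parent
    return (script_dir / relative).resolve()


# Inside a real script the first argument would be __file__.
# The paths here are illustrative placeholders.
model_dir = resolve_model_dir("/tmp/recipe/eval.py", "cuda/int4/models")
print(model_dir)
```

A bare `Path("cuda/int4/models")` would instead be interpreted relative to wherever the user launched the script, which is the failure mode this commit addresses.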

+from pathlib import Path
+
+# Register Olive's ORT GenAI evaluator with lm-eval
+import olive.evaluator.lmeval_ort  # noqa: F401


For the encoder-free gemma4_unified architecture, each of the vision and audio 'encoders' is a single projector MatMul that forms the entire image/audio embedding pathway. Quantizing it to INT4 injects disproportionate error (measured rel-L2 ~3.7% vision / ~9.2% audio) while the components are tiny (~76 MB / ~1.4 MB), so keeping them FP16 costs almost nothing. Exclude them via nodes_to_exclude: ['*/projector/*'].

The decoder (including lm_head) stays INT4, where the size savings live and INT4 has negligible impact on output tokens (top-1 logit agreement ~100%, KL ~0.004).

The glob form of nodes_to_exclude requires microsoft/Olive#2518; with older Olive the pattern matches nothing and projectors are quantized as before.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <11205048+justinchuby@users.noreply.github.com>
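
The difference between glob and exact matching for `nodes_to_exclude` can be illustrated with Python's `fnmatch` module. This is a sketch of the idea only: it does not reproduce Olive's implementation, the function `select_nodes_to_quantize` is hypothetical, and the node names are made-up examples rather than names from the exported graphs.

```python
from fnmatch import fnmatchcase


def select_nodes_to_quantize(node_names, exclude_patterns, use_glob=True):
    """Return the nodes that remain eligible for quantization."""
    kept = []
    for name in node_names:
        if use_glob:
            # Glob semantics: '*' matches any run of characters, including '/'.
            excluded = any(fnmatchcase(name, pat) for pat in exclude_patterns)
        else:
            # Exact-name semantics: a pattern only excludes an identical name.
            excluded = name in exclude_patterns
        if not excluded:
            kept.append(name)
    return kept


# Hypothetical node names for illustration.
nodes = [
    "/model/layers.0/attn/q_proj/MatMul",
    "/vision_encoder/projector/MatMul",
    "/audio_encoder/projector/MatMul",
]
patterns = ["*/projector/*"]

print(select_nodes_to_quantize(nodes, patterns, use_glob=True))
print(select_nodes_to_quantize(nodes, patterns, use_glob=False))
```

With glob matching only the decoder node stays in the quantization set. With exact matching the pattern equals no node name, so all three nodes are kept for quantization, which mirrors the older-Olive behavior described in the commit message.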

Copilot AI review requested due to automatic review settings June 12, 2026 16:26

Copilot started reviewing on behalf of justinchuby June 12, 2026 16:27

github-code-quality Bot found potential problems Jun 12, 2026

Comment thread google-gemma-4-12B-it/eval.py Fixed

Copilot AI reviewed Jun 12, 2026

Comment thread google-gemma-4-12B-it/inference.py Outdated

Comment thread google-gemma-4-12B-it/README.md Outdated

Comment thread google-gemma-4-12B-it/README.md Outdated

Comment thread google-gemma-4-12B-it/eval.py

Comment thread google-gemma-4-12B-it/eval.py

github-code-quality Bot found potential problems Jun 12, 2026

Comment thread google-gemma-4-12B-it/eval.py

justinchuby enabled auto-merge (squash) June 24, 2026 14:40

justinchuby wants to merge 3 commits into main from justinchu/gemma4-12b-recipe
