Add Gemma 4 12B Unified Olive recipe (mobius)#503

Open
justinchuby wants to merge 3 commits into main from justinchu/gemma4-12b-recipe

@justinchuby

Contributor

Summary

Adds an Olive recipe for google/gemma-4-12B-it — the encoder-free Unified member of the Gemma 4 family — mirroring the existing google-gemma-4-E2B-it recipe.

The model is exported to ONNX via the MobiusBuilder pass and optionally quantized with K-Quant (Q4_K_M) INT4. Gemma 4 12B Unified projects raw image patches and audio waveform features directly into the decoder embedding space (no dedicated encoders), but the mobius pipeline still emits four ORT GenAI components: decoder, embedding, and encoder-free vision_encoder / audio_encoder embedders.

Recipes

| Recipe | Pipeline | Device / EP |
|---|---|---|
| cpu/fp32/config.json | MobiusBuilder(fp32) | CPU |
| cpu/int4/config.json | MobiusBuilder(fp32) → OnnxKQuantQuantization | CPU |
| cuda/fp16/config.json | MobiusBuilder(fp16) | CUDA |
| cuda/int4/config.json | MobiusBuilder(fp16) → OnnxKQuantQuantization | CUDA |

Contents

  • info.yml, requirements.txt, README.md
  • 4 Olive configs (CPU fp32/int4, CUDA fp16/int4)
  • eval.py (lm-eval; reference MMLU Pro 77.2%, GPQA Diamond 78.8%) and inference.py (ORT GenAI text inference)

Validation

  • All config JSON and info.yml parse; the repo scanner groups the 4 recipes correctly by arch/ep/device.
  • eval.py/inference.py byte-compile.
  • Model id confirmed on HF (model_type: gemma4_unified, ungated); supported by mobius Gemma4UnifiedModel / Gemma4UnifiedTask.
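
The parse and byte-compile checks above can be reproduced with a short script. This is a minimal sketch, not the check the author ran: it assumes the four config paths listed under Recipes sit under a single recipe root, it uses the built-in `compile()` to check syntax in memory instead of writing `.pyc` files, and it skips `info.yml` because parsing YAML needs a third-party package. The name `validate_recipe` is hypothetical.

```python
import json
from pathlib import Path

# Paths relative to the recipe folder, as listed under Recipes and Contents.
CONFIGS = [
    "cpu/fp32/config.json",
    "cpu/int4/config.json",
    "cuda/fp16/config.json",
    "cuda/int4/config.json",
]
SCRIPTS = ["eval.py", "inference.py"]


def validate_recipe(root: Path) -> list:
    """Return a list of problem strings; an empty list means every check passed."""
    problems = []
    for rel in CONFIGS:
        try:
            # json.loads raises JSONDecodeError on malformed input.
            json.loads((root / rel).read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError) as exc:
            problems.append(f"{rel}: {exc}")
    for rel in SCRIPTS:
        try:
            # Compile to a code object without executing or writing bytecode.
            source = (root / rel).read_text(encoding="utf-8")
            compile(source, str(root / rel), "exec")
        except (OSError, SyntaxError) as exc:
            problems.append(f"{rel}: {exc}")
    return problems
```

Calling `validate_recipe(Path("google-gemma-4-12B-it"))` from the repository root would return an empty list when all four configs parse and both scripts compile.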

Generated model artifacts and .olive-cache are intentionally not committed (gitignored), matching other large-model recipes (e.g. Qwen3-14B).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add an Olive recipe for google/gemma-4-12B-it, the encoder-free Unified
member of the Gemma 4 family. Mirrors the existing gemma-4-E2B-it recipe:
exports to ONNX via the MobiusBuilder pass and optionally quantizes the
decoder with K-Quant (Q4_K_M) INT4.

Four configs cover CPU (fp32, int4) and CUDA (fp16, int4). Includes
info.yml, requirements, README documenting the encoder-free 4-component
pipeline (decoder, embedding, encoder-free vision/audio embedders), plus
eval.py (MMLU Pro 77.2% / GPQA Diamond 78.8% reference scores) and a
text inference script.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <11205048+justinchuby@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 12, 2026 16:26
Comment thread google-gemma-4-12B-it/eval.py Fixed

Copilot AI left a comment

Contributor

Pull request overview

Adds a new Olive “mobius” recipe set for Gemma 4 12B Unified (google/gemma-4-12B-it), mirroring the existing Gemma 4 E2B recipe structure to export ORT GenAI components via MobiusBuilder and optionally apply K-Quant INT4.

Changes:

  • Introduces CPU (fp32/int4) and CUDA (fp16/int4) Olive configs targeting Mobius export + optional K-Quant.
  • Adds runnable helper scripts for ORT GenAI inference (inference.py) and lm-eval-harness evaluation (eval.py).
  • Adds recipe metadata (info.yml), documentation (README.md), and dependency list (requirements.txt).

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| google-gemma-4-12B-it/requirements.txt | Declares Python deps for running eval/inference helpers. |
| google-gemma-4-12B-it/README.md | Documents model/recipe intent, build steps, inference, and evaluation usage. |
| google-gemma-4-12B-it/LICENSE | Adds the recipe folder license text. |
| google-gemma-4-12B-it/info.yml | Registers recipe metadata for repo scanning and indexing. |
| google-gemma-4-12B-it/inference.py | Provides ORT GenAI text inference CLI for produced model packages. |
| google-gemma-4-12B-it/eval.py | Provides lm-eval-harness evaluation CLI for produced model packages. |
| google-gemma-4-12B-it/cpu/fp32/config.json | CPU fp32 MobiusBuilder config. |
| google-gemma-4-12B-it/cpu/int4/config.json | CPU fp32 MobiusBuilder + K-Quant INT4 config. |
| google-gemma-4-12B-it/cuda/fp16/config.json | CUDA fp16 MobiusBuilder config with CUDA EP target. |
| google-gemma-4-12B-it/cuda/int4/config.json | CUDA fp16 MobiusBuilder + K-Quant INT4 config with CUDA EP target. |

Comment thread google-gemma-4-12B-it/inference.py Outdated
Comment thread google-gemma-4-12B-it/README.md Outdated
Comment thread google-gemma-4-12B-it/README.md Outdated
Comment thread google-gemma-4-12B-it/eval.py
Comment thread google-gemma-4-12B-it/eval.py
- Resolve model dirs relative to __file__ in eval.py/inference.py so the
  scripts work from any working directory (not just the recipe folder).
- Align eval.py usage docstring with the default task key
  (leaderboard_mmlu_pro).
- README: use block_size=32 (matching the JSON configs) and drop the
  redundant explicit pip install in favor of requirements.txt.
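
The first bullet describes a common pattern: anchoring relative paths to the script's own location instead of the current working directory. A minimal sketch follows; the helper name `resolve_model_dir` and the example sub-path are hypothetical and are not taken from eval.py or inference.py.

```python
from pathlib import Path

# Folder that contains this script. Falls back to the working directory when
# __file__ is undefined, for example in an interactive session.
try:
    SCRIPT_DIR = Path(__file__).resolve().parent
except NameError:
    SCRIPT_DIR = Path.cwd()


def resolve_model_dir(relative: str, base: Path = SCRIPT_DIR) -> Path:
    """Anchor a recipe-relative path to the script folder; keep absolute paths as-is."""
    path = Path(relative)
    return path if path.is_absolute() else (base / path).resolve()
```

With this in place, `resolve_model_dir("cpu/int4/model")` (an illustrative path) yields the same location whether the script is launched from the recipe folder or from the repository root.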

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <11205048+justinchuby@users.noreply.github.com>
from pathlib import Path

# Register Olive's ORT GenAI evaluator with lm-eval
import olive.evaluator.lmeval_ort # noqa: F401
For the encoder-free gemma4_unified architecture, each of the vision and
audio 'encoders' is a single projector MatMul that forms the entire
image/audio embedding pathway. Quantizing it to INT4 injects
disproportionate error (measured rel-L2 ~3.7% vision / ~9.2% audio) while
the components are tiny (~76 MB / ~1.4 MB), so keeping them FP16 costs
almost nothing.

Exclude them via nodes_to_exclude: ['*/projector/*']. The decoder
(including lm_head) stays INT4, where the size savings live and INT4 has
negligible impact on output tokens (top-1 logit agreement ~100%, KL~0.004).
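
The commit message does not say how these figures were computed. The sketch below uses the conventional definitions of relative L2 error, KL divergence between next-token distributions, and top-1 agreement, written in plain Python on toy inputs. The direction of the KL term (reference against quantized) and the use of natural logarithms are assumptions.

```python
import math


def rel_l2(reference, candidate):
    """Relative L2 error: ||candidate - reference||_2 / ||reference||_2."""
    num = math.sqrt(sum((c - r) ** 2 for r, c in zip(reference, candidate)))
    den = math.sqrt(sum(r * r for r in reference))
    return num / den


def softmax(logits):
    """Numerically stable softmax over one logit vector."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(ref_logits, cand_logits):
    """KL(P_ref || P_cand) for one next-token distribution, in nats."""
    p = softmax(ref_logits)
    q = softmax(cand_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)


def top1_agreement(ref_batch, cand_batch):
    """Fraction of positions where both models select the same top token."""
    same = sum(argmax(r) == argmax(c) for r, c in zip(ref_batch, cand_batch))
    return same / len(ref_batch)
```

Here `rel_l2` would apply to the projector outputs (FP16 reference against INT4), while `kl_divergence` and `top1_agreement` would apply to decoder logits at each generated position.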

The glob form of nodes_to_exclude requires microsoft/Olive#2518; with older
Olive the pattern matches nothing and projectors are quantized as before.
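
To illustrate the difference between glob and literal matching, the sketch below uses Python's standard `fnmatch` module. It assumes, without confirmation from the source, that Olive's glob support behaves like `fnmatch` and that older versions compare entries against node names literally. The node names are hypothetical and are not read from the exported model.

```python
from fnmatch import fnmatchcase

# Pattern taken from the commit message.
PATTERNS = ["*/projector/*"]

# Hypothetical ONNX node names, for illustration only.
NODE_NAMES = [
    "/vision_encoder/projector/MatMul",
    "/audio_encoder/projector/MatMul",
    "/model/layers.0/attn/q_proj/MatMul",
    "/lm_head/MatMul",
]


def split_by_exclusion(names, patterns):
    """Glob matching: return (excluded, quantized) lists of node names."""
    excluded = [n for n in names if any(fnmatchcase(n, p) for p in patterns)]
    quantized = [n for n in names if n not in excluded]
    return excluded, quantized


def literal_matches(names, patterns):
    """Literal matching (assumed older behavior): exact name equality only."""
    return [n for n in names if n in patterns]
```

Under these assumptions, glob matching excludes both projector nodes and leaves the decoder and lm_head nodes to be quantized, while literal matching excludes nothing, which is consistent with the fallback described above.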

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <11205048+justinchuby@users.noreply.github.com>
@justinchuby enabled auto-merge (squash) June 24, 2026 14:40
2 participants