Add tencent/Hy-MT2-1.8B ONNX recipes (Mobius + K-Quant) by justinchuby · Pull Request #520 · microsoft/olive-recipes

justinchuby · 2026-06-23T22:07:39Z

Description

Adds Olive recipes that export tencent/Hy-MT2-1.8B — a fast multilingual (33-language) translation model — to ONNX via the MobiusBuilder pass, with optional INT4 K-Quant (Q4_K_M) quantization.

Recipes

| Recipe | Pipeline |
| --- | --- |
| `cpu/fp32` | MobiusBuilder(fp32) |
| `cpu/int4` | MobiusBuilder(fp32) → OnnxKQuantQuantization(bits=4, block_size=32) |
| `cuda/fp16` | MobiusBuilder(fp16) |
| `cuda/bf16` | MobiusBuilder(bf16) |
| `cuda/int4` | MobiusBuilder(fp16) → OnnxKQuantQuantization(bits=4, block_size=32) |

Validation

- All 5 recipes build successfully with `olive run`.
- The CPU INT4 and CUDA INT4 packages produce correct translations via ORT GenAI (e.g. 黄河之水天上来 → "The water of the Yellow River comes from the sky").
- The underlying mobius ONNX export is verified token-for-token against HuggingFace transformers (greedy prefill + generation goldens) in the mobius repo.
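The build step in the first validation item could be scripted along these lines. This is a minimal sketch, not part of the PR: it assumes each recipe is built with `olive run --config <path>` against the `config.json` files listed in the file summary, and `build_commands` / `build_all` are hypothetical helper names.

```python
import subprocess
from pathlib import Path

# Recipe folders taken from the Recipes table in the PR description.
RECIPES = ["cpu/fp32", "cpu/int4", "cuda/fp16", "cuda/bf16", "cuda/int4"]


def build_commands(recipe_root):
    """Return one `olive run` command line per recipe config."""
    root = Path(recipe_root)
    return [
        ["olive", "run", "--config", (root / recipe / "config.json").as_posix()]
        for recipe in RECIPES
    ]


def build_all(recipe_root, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(recipe_root):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first recipe that fails to build.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default so the sketch is safe to execute without Olive installed.
    build_all("tencent-Hy-MT2-1.8B")
```

Calling `build_all("tencent-Hy-MT2-1.8B", dry_run=False)` would run the five builds in sequence; the CUDA recipes presumably need a CUDA-capable environment.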

Add Olive recipes that export tencent/Hy-MT2-1.8B to ONNX via MobiusBuilder and quantize the decoder to INT4 with K-Quant (Q4_K_M).

Recipes: cpu/fp32, cpu/int4, cuda/fp16, cuda/bf16, cuda/int4.

Includes inference.py (ORT GenAI, HF tokenizer round-trip), info.yml, requirements.txt, README, and LICENSE.

Validated: CPU INT4 and CUDA INT4 produce correct translations; the mobius ONNX export is token-for-token verified against HuggingFace transformers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <11205048+justinchuby@users.noreply.github.com>

Copilot

Pull request overview

Adds a new Olive recipe pack for exporting the HuggingFace translation model tencent/Hy-MT2-1.8B to an ORT GenAI ONNX package via MobiusBuilder, with optional INT4 K-Quant (Q4_K_M) quantization for CPU and CUDA deployments.

Changes:

- Introduces recipe metadata (info.yml) and 5 Olive pipeline configs for CPU/CUDA and fp32/fp16/bf16/int4 variants.
- Adds an ORT GenAI inference script (inference.py) and end-to-end usage documentation (README.md).
- Adds per-recipe Python dependencies (requirements.txt) and a LICENSE file for the new recipe folder.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tencent-Hy-MT2-1.8B/requirements.txt | Declares Python dependencies for building/running the recipes. |
| tencent-Hy-MT2-1.8B/README.md | Documents the model, recipes, build steps, and inference usage. |
| tencent-Hy-MT2-1.8B/LICENSE | Adds a license file for the new recipe folder. |
| tencent-Hy-MT2-1.8B/info.yml | Registers recipe metadata (keywords, devices/EPs, recipe entries). |
| tencent-Hy-MT2-1.8B/inference.py | Provides an ORT GenAI-based translation runner for generated packages. |
| tencent-Hy-MT2-1.8B/cpu/fp32/config.json | CPU fp32 MobiusBuilder export pipeline. |
| tencent-Hy-MT2-1.8B/cpu/int4/config.json | CPU fp32 export + INT4 K-Quant pipeline. |
| tencent-Hy-MT2-1.8B/cuda/fp16/config.json | CUDA fp16 MobiusBuilder export pipeline. |
| tencent-Hy-MT2-1.8B/cuda/bf16/config.json | CUDA bf16 MobiusBuilder export pipeline. |
| tencent-Hy-MT2-1.8B/cuda/int4/config.json | CUDA fp16 export + INT4 K-Quant pipeline. |


+Tokenizer note: the Hy-MT BPE vocab uses a custom regex pre-tokenizer that
+ort-extensions does not currently round-trip, so by default we tokenize and
+detokenize with the HuggingFace tokenizer and feed raw token IDs to
+``og.Generator`` (this still exercises the full ORT GenAI inference path).
+Pass ``--use-ort-tokenizer`` to force the ``og.Tokenizer`` path.
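To make the "does not currently round-trip" claim concrete, a check along the following lines could compare the two tokenizer paths. This is only a sketch: `round_trips` and `first_failure` are hypothetical helpers that are not in the PR, and they accept any pair of encode/decode callables, so they can wrap either the HuggingFace tokenizer or `og.Tokenizer`.

```python
def round_trips(encode, decode, text):
    """Return True if decode(encode(text)) reproduces text exactly."""
    return decode(encode(text)) == text


def first_failure(encode, decode, samples):
    """Return the first sample that does not survive a round trip, or None."""
    for sample in samples:
        if not round_trips(encode, decode, sample):
            return sample
    return None


# Toy codecs standing in for real tokenizers, so the sketch runs without
# any model files. A byte-level codec is lossless; the ASCII-only one is not.
def utf8_encode(text):
    return list(text.encode("utf-8"))


def utf8_decode(ids):
    return bytes(ids).decode("utf-8")


def ascii_only_encode(text):
    return [ord(ch) for ch in text if ord(ch) < 128]


def ascii_decode(ids):
    return "".join(chr(i) for i in ids)


samples = ["hello", "黄河之水天上来"]
print(first_failure(utf8_encode, utf8_decode, samples))
print(first_failure(ascii_only_encode, ascii_decode, samples))
```

With real tokenizers, the callables would presumably be something like `lambda t: hf_tokenizer.encode(t, add_special_tokens=False)` paired with `hf_tokenizer.decode`, or the `encode` / `decode` methods of an `og.Tokenizer` instance.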


+    params = og.GeneratorParams(model)
+    params.set_search_options(max_length=len(input_tokens) + max_new_tokens, do_sample=False)
+    generator = og.Generator(model, params)
+    generator.append_tokens(input_tokens)
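The hunk above stops after the prompt tokens are appended. A greedy decode loop that continues from there might look like the sketch below. It is an assumption about how the rest of `translate` could be written, not the PR's actual code: `generate_ids` is a hypothetical name, and the `og` module is passed in as an argument so the loop can be exercised with a stand-in object.

```python
def generate_ids(og, model, input_tokens, max_new_tokens):
    """Greedy-decode with ORT GenAI and return only the newly generated IDs.

    `og` is expected to be the imported `onnxruntime_genai` module and
    `model` an `og.Model`; both are injected to keep the sketch testable.
    """
    params = og.GeneratorParams(model)
    # max_length counts prompt tokens too, hence the sum.
    params.set_search_options(
        max_length=len(input_tokens) + max_new_tokens, do_sample=False
    )
    generator = og.Generator(model, params)
    generator.append_tokens(input_tokens)

    new_tokens = []
    while not generator.is_done():
        generator.generate_next_token()
        # Batch size is 1 here, so take the single token for this step.
        new_tokens.append(int(generator.get_next_tokens()[0]))
    return new_tokens
```

The returned IDs may end with an end-of-sequence token, so decoding them with the HuggingFace tokenizer would likely use `skip_special_tokens=True`.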


+    out = translate(model, hf_tokenizer, args.source, args.target_lang, args.max_new_tokens)
+    print(f"\nsource ({args.target_lang}): {args.source}")
+    print(f"translation: {out}")


+MIT License
+
+Copyright (c) 2025 Microsoft
+
+Permission is hereby granted, free of charge, to any person obtaining a copy


These produce default-EP ORT GenAI packages (empty provider_options) that run on CPU / DML / WebGPU, matching the default-EP variants published to the ONNX HuggingFace repo.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <11205048+justinchuby@users.noreply.github.com>

Copilot AI review requested due to automatic review settings June 23, 2026 22:07

Copilot started reviewing on behalf of justinchuby June 23, 2026 22:08

Copilot AI reviewed Jun 23, 2026

justinchuby wants to merge 2 commits into main from justinchu/hy-mt2-1.8b
