Add tencent/Hy-MT2-1.8B ONNX recipes (Mobius + K-Quant) #520
Open
justinchuby wants to merge 2 commits into
Add Olive recipes that export tencent/Hy-MT2-1.8B to ONNX via MobiusBuilder and quantize the decoder to INT4 with K-Quant (Q4_K_M).

Recipes: cpu/fp32, cpu/int4, cuda/fp16, cuda/bf16, cuda/int4.

Includes inference.py (ORT GenAI, HF tokenizer round-trip), info.yml, requirements.txt, README, and LICENSE.

Validated: CPU INT4 and CUDA INT4 produce correct translations; the mobius ONNX export is token-for-token verified against HuggingFace transformers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <11205048+justinchuby@users.noreply.github.com>
Pull request overview
Adds a new Olive recipe pack for exporting the HuggingFace translation model tencent/Hy-MT2-1.8B to an ORT GenAI ONNX package via MobiusBuilder, with optional INT4 K-Quant (Q4_K_M) quantization for CPU and CUDA deployments.
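As a rough illustration of what 4-bit weight quantization involves, the sketch below quantizes one block of floats to 4-bit codes with a per-block scale and minimum, then reconstructs them. It is a simplified stand-in only: it does not reproduce the Q4_K_M layout or the quantization pass used by these recipes, and the function names `quantize_block` and `dequantize_block` are hypothetical.

```python
from typing import List, Tuple


def quantize_block(values: List[float], bits: int = 4) -> Tuple[List[int], float, float]:
    """Map a block of floats to integer codes in [0, 2**bits - 1].

    Simplified asymmetric scheme (assumption, not Q4_K_M): one scale and one
    minimum are stored per block.
    """
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1  # 15 for 4-bit codes
    # A constant block has no range; any non-zero scale works because all codes are 0.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_block(codes: List[int], scale: float, lo: float) -> List[float]:
    """Reconstruct approximate floats from the codes, scale, and minimum."""
    return [c * scale + lo for c in codes]


if __name__ == "__main__":
    block = [-1.0, -0.5, 0.1, 0.25, 1.0]
    codes, scale, lo = quantize_block(block)
    restored = dequantize_block(codes, scale, lo)
    worst = max(abs(a - b) for a, b in zip(block, restored))
    print(f"codes={codes} scale={scale:.4f} worst_error={worst:.4f}")
```

The reconstruction error of each value is bounded by half the block's scale, which is why schemes of this kind use small blocks: a narrower value range per block gives a smaller scale.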
Changes:
- Introduces recipe metadata (info.yml) and 5 Olive pipeline configs for CPU/CUDA and fp32/fp16/bf16/int4 variants.
- Adds an ORT GenAI inference script (inference.py) and end-to-end usage documentation (README.md).
- Adds per-recipe Python dependencies (requirements.txt) and a LICENSE file for the new recipe folder.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tencent-Hy-MT2-1.8B/requirements.txt | Declares Python dependencies for building/running the recipes. |
| tencent-Hy-MT2-1.8B/README.md | Documents the model, recipes, build steps, and inference usage. |
| tencent-Hy-MT2-1.8B/LICENSE | Adds a license file for the new recipe folder. |
| tencent-Hy-MT2-1.8B/info.yml | Registers recipe metadata (keywords, devices/EPs, recipe entries). |
| tencent-Hy-MT2-1.8B/inference.py | Provides an ORT GenAI-based translation runner for generated packages. |
| tencent-Hy-MT2-1.8B/cpu/fp32/config.json | CPU fp32 MobiusBuilder export pipeline. |
| tencent-Hy-MT2-1.8B/cpu/int4/config.json | CPU fp32 export + INT4 K-Quant pipeline. |
| tencent-Hy-MT2-1.8B/cuda/fp16/config.json | CUDA fp16 MobiusBuilder export pipeline. |
| tencent-Hy-MT2-1.8B/cuda/bf16/config.json | CUDA bf16 MobiusBuilder export pipeline. |
| tencent-Hy-MT2-1.8B/cuda/int4/config.json | CUDA fp16 export + INT4 K-Quant pipeline. |
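The table implies a `<device>/<precision>/config.json` layout under the recipe folder. A minimal sketch of that layout follows, using only the paths listed above; the helper names `config_path` and `all_config_paths` are hypothetical and not part of the PR.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Tuple

# Folder and variants taken from the file table; everything else is illustrative.
RECIPE_ROOT = PurePosixPath("tencent-Hy-MT2-1.8B")
RECIPES: Dict[str, Tuple[str, ...]] = {
    "cpu": ("fp32", "int4"),
    "cuda": ("fp16", "bf16", "int4"),
}


def config_path(device: str, precision: str) -> PurePosixPath:
    """Return the config.json path for one device/precision recipe."""
    if precision not in RECIPES.get(device, ()):
        raise ValueError(f"no recipe for {device}/{precision}")
    return RECIPE_ROOT / device / precision / "config.json"


def all_config_paths() -> List[PurePosixPath]:
    """List every recipe config in the order shown in the table."""
    return [config_path(d, p) for d, precisions in RECIPES.items() for p in precisions]


if __name__ == "__main__":
    for path in all_config_paths():
        print(path)
```

Each resulting path is the kind of config file an Olive run would consume; the exact command line is documented in the recipe README rather than here.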
Comment on lines +8 to +12

    Tokenizer note: the Hy-MT BPE vocab uses a custom regex pre-tokenizer that
    ort-extensions does not currently round-trip, so by default we tokenize and
    detokenize with the HuggingFace tokenizer and feed raw token IDs to
    ``og.Generator`` (this still exercises the full ORT GenAI inference path).
    Pass ``--use-ort-tokenizer`` to force the ``og.Tokenizer`` path.
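The default described in this note can be sketched as a command-line switch. Only the `--use-ort-tokenizer` flag comes from the docstring; the attribute names `source`, `target_lang`, and `max_new_tokens` mirror those read elsewhere in inference.py, while their flag spellings, the defaults, and the helper `pick_tokenizer` are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI resembling the one the script appears to expose (sketch)."""
    parser = argparse.ArgumentParser(description="Translate text with an ORT GenAI package.")
    parser.add_argument("--source", required=True, help="text to translate")
    # Default values below are placeholders, not the script's real defaults.
    parser.add_argument("--target-lang", default="English", help="target language")
    parser.add_argument("--max-new-tokens", type=int, default=128)
    parser.add_argument(
        "--use-ort-tokenizer",
        action="store_true",
        help="force the og.Tokenizer path instead of the HuggingFace tokenizer",
    )
    return parser


def pick_tokenizer(args: argparse.Namespace) -> str:
    """Return which tokenizer path to use: HuggingFace unless the flag is set."""
    return "ort" if args.use_ort_tokenizer else "hf"


if __name__ == "__main__":
    demo = build_parser().parse_args(["--source", "黄河之水天上来"])
    print(pick_tokenizer(demo))
```

Making the HuggingFace path the default keeps the common case correct, and the opt-in flag leaves a way to test the ORT tokenizer once it can round-trip the vocab.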
Comment on lines +69 to +72

    params = og.GeneratorParams(model)
    params.set_search_options(max_length=len(input_tokens) + max_new_tokens, do_sample=False)
    generator = og.Generator(model, params)
    generator.append_tokens(input_tokens)
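The `max_length` argument here is a total budget: prompt tokens plus newly generated ones. The toy loop below illustrates that budgeting with greedy decoding. It deliberately does not use ORT GenAI; `next_token` is a hypothetical callback standing in for the model, and `eos_id` is an assumed end-of-sequence token ID.

```python
from typing import Callable, List, Sequence


def greedy_generate(
    next_token: Callable[[Sequence[int]], int],
    input_tokens: Sequence[int],
    max_new_tokens: int,
    eos_id: int,
) -> List[int]:
    """Generate up to max_new_tokens tokens, stopping early at eos_id.

    Mirrors max_length = len(input_tokens) + max_new_tokens from the snippet:
    the loop stops once prompt plus output reaches that total.
    """
    max_length = len(input_tokens) + max_new_tokens
    tokens = list(input_tokens)
    while len(tokens) < max_length:
        token = next_token(tokens)  # stand-in for one model step (assumption)
        if token == eos_id:
            break
        tokens.append(token)
    # Return only the newly generated part, without the prompt.
    return tokens[len(input_tokens):]


if __name__ == "__main__":
    # Toy "model": always predicts the previous token plus one.
    print(greedy_generate(lambda toks: toks[-1] + 1, [1, 2, 3], max_new_tokens=4, eos_id=100))
```

Because the budget counts the prompt, a fixed `max_length` would shrink the output for long inputs; adding `len(input_tokens)` keeps the number of new tokens independent of prompt length.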
Comment on lines +131 to +133

    out = translate(model, hf_tokenizer, args.source, args.target_lang, args.max_new_tokens)
    print(f"\nsource ({args.target_lang}): {args.source}")
    print(f"translation: {out}")
Comment on lines +1 to +5

    MIT License

    Copyright (c) 2025 Microsoft

    Permission is hereby granted, free of charge, to any person obtaining a copy
These produce default-EP ORT GenAI packages (empty provider_options) that run on CPU / DML / WebGPU, matching the default-EP variants published to the ONNX HuggingFace repo.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <11205048+justinchuby@users.noreply.github.com>
Description

Adds Olive recipes that export tencent/Hy-MT2-1.8B, a fast multilingual (33-language) translation model, to ONNX via the MobiusBuilder pass, with optional INT4 K-Quant (Q4_K_M) quantization.

Recipes

- cpu/fp32
- cpu/int4
- cuda/fp16
- cuda/bf16
- cuda/int4

Validation

- CPU INT4 and CUDA INT4 recipes were built with olive run and produce correct translations (黄河之水天上来 → "The water of the Yellow River comes from the sky").
- The mobius ONNX export is token-for-token verified against HuggingFace transformers (greedy prefill + generation goldens) in the mobius repo.

Contents

info.yml, 5 config.json recipes, inference.py (ORT GenAI; uses the HF tokenizer to round-trip the custom Hy-MT BPE vocab), requirements.txt, README.md, LICENSE.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com