Add tencent/Hy-MT2-1.8B ONNX recipes (Mobius + K-Quant) #520

Open
justinchuby wants to merge 2 commits into main from justinchu/hy-mt2-1.8b

@justinchuby

Contributor

Description

Adds Olive recipes that export tencent/Hy-MT2-1.8B — a fast multilingual (33-language) translation model — to ONNX via the MobiusBuilder pass, with optional INT4 K-Quant (Q4_K_M) quantization.

Recipes

| Recipe | Pipeline |
| --- | --- |
| cpu/fp32 | MobiusBuilder(fp32) |
| cpu/int4 | MobiusBuilder(fp32) → OnnxKQuantQuantization(bits=4, block_size=32) |
| cuda/fp16 | MobiusBuilder(fp16) |
| cuda/bf16 | MobiusBuilder(bf16) |
| cuda/int4 | MobiusBuilder(fp16) → OnnxKQuantQuantization(bits=4, block_size=32) |
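
To make the `bits=4, block_size=32` parameters concrete, here is a minimal sketch of block-wise 4-bit quantization in plain Python. It is a simplified min/scale scheme written for illustration only: it is not Q4_K_M and not the implementation behind OnnxKQuantQuantization, whose internals are not shown in this PR. The function names are hypothetical.

```python
from typing import List, Tuple

BLOCK_SIZE = 32  # same value as block_size=32 in the recipes
LEVELS = 15      # 4 bits give integer codes 0..15

Block = Tuple[float, float, List[int]]  # (scale, minimum, codes)


def quantize_block(block: List[float]) -> Block:
    """Map one block of floats to 4-bit codes plus a (scale, minimum) pair."""
    lo, hi = min(block), max(block)
    # A constant block has hi == lo; fall back to scale 1.0 to avoid dividing by zero.
    scale = (hi - lo) / LEVELS or 1.0
    codes = [round((x - lo) / scale) for x in block]
    return scale, lo, codes


def dequantize_block(scale: float, lo: float, codes: List[int]) -> List[float]:
    """Reconstruct approximate floats from the stored codes."""
    return [lo + scale * c for c in codes]


def quantize(weights: List[float]) -> List[Block]:
    """Split a flat weight list into blocks and quantize each independently."""
    return [
        quantize_block(weights[i:i + BLOCK_SIZE])
        for i in range(0, len(weights), BLOCK_SIZE)
    ]
```

In this simplified scheme each block stores 32 four-bit codes (16 bytes) plus its own scale and minimum, and the per-weight reconstruction error is bounded by half the block's scale.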

Validation

  • All 5 recipes build successfully with olive run.
  • CPU INT4 and CUDA INT4 packages produce correct translations via ORT GenAI (e.g. 黄河之水天上来 → "The water of the Yellow River comes from the sky").
  • The underlying mobius ONNX export is verified token-for-token against HuggingFace transformers (greedy prefill + generation goldens) in the mobius repo.
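
A token-for-token check of this kind can be sketched as below. This is an illustrative helper, not the test code from the mobius repo; `first_mismatch` is a hypothetical name, and the golden and candidate sequences are assumed to be lists of integer token IDs produced by greedy decoding.

```python
from itertools import zip_longest
from typing import Optional, Sequence


def first_mismatch(golden: Sequence[int], candidate: Sequence[int]) -> Optional[int]:
    """Return the index of the first differing token, or None if the sequences match.

    zip_longest pads the shorter sequence with None, so a length difference
    is reported as a mismatch at the first missing position.
    """
    for index, (expected, actual) in enumerate(zip_longest(golden, candidate)):
        if expected != actual:
            return index
    return None
```

Reporting the first divergent position, rather than a plain pass/fail, makes it easier to tell a prefill discrepancy from one that only appears later in generation.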

Contents

info.yml, 5 config.json recipes, inference.py (ORT GenAI; uses the HF tokenizer to round-trip the custom Hy-MT BPE vocab), requirements.txt, README.md, LICENSE.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add Olive recipes that export tencent/Hy-MT2-1.8B to ONNX via MobiusBuilder
and quantize the decoder to INT4 with K-Quant (Q4_K_M).

Recipes: cpu/fp32, cpu/int4, cuda/fp16, cuda/bf16, cuda/int4.
Includes inference.py (ORT GenAI, HF tokenizer round-trip), info.yml,
requirements.txt, README, and LICENSE.

Validated: CPU INT4 and CUDA INT4 produce correct translations; the mobius
ONNX export is token-for-token verified against HuggingFace transformers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <11205048+justinchuby@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 23, 2026 22:07

Copilot AI left a comment

Contributor

Pull request overview

Adds a new Olive recipe pack for exporting the HuggingFace translation model tencent/Hy-MT2-1.8B to an ORT GenAI ONNX package via MobiusBuilder, with optional INT4 K-Quant (Q4_K_M) quantization for CPU and CUDA deployments.

Changes:

  • Introduces recipe metadata (info.yml) and 5 Olive pipeline configs for CPU/CUDA and fp32/fp16/bf16/int4 variants.
  • Adds an ORT GenAI inference script (inference.py) and end-to-end usage documentation (README.md).
  • Adds per-recipe Python dependencies (requirements.txt) and a LICENSE file for the new recipe folder.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tencent-Hy-MT2-1.8B/requirements.txt | Declares Python dependencies for building/running the recipes. |
| tencent-Hy-MT2-1.8B/README.md | Documents the model, recipes, build steps, and inference usage. |
| tencent-Hy-MT2-1.8B/LICENSE | Adds a license file for the new recipe folder. |
| tencent-Hy-MT2-1.8B/info.yml | Registers recipe metadata (keywords, devices/EPs, recipe entries). |
| tencent-Hy-MT2-1.8B/inference.py | Provides an ORT GenAI-based translation runner for generated packages. |
| tencent-Hy-MT2-1.8B/cpu/fp32/config.json | CPU fp32 MobiusBuilder export pipeline. |
| tencent-Hy-MT2-1.8B/cpu/int4/config.json | CPU fp32 export + INT4 K-Quant pipeline. |
| tencent-Hy-MT2-1.8B/cuda/fp16/config.json | CUDA fp16 MobiusBuilder export pipeline. |
| tencent-Hy-MT2-1.8B/cuda/bf16/config.json | CUDA bf16 MobiusBuilder export pipeline. |
| tencent-Hy-MT2-1.8B/cuda/int4/config.json | CUDA fp16 export + INT4 K-Quant pipeline. |

Comment on lines +8 to +12
Tokenizer note: the Hy-MT BPE vocab uses a custom regex pre-tokenizer that
ort-extensions does not currently round-trip, so by default we tokenize and
detokenize with the HuggingFace tokenizer and feed raw token IDs to
``og.Generator`` (this still exercises the full ORT GenAI inference path).
Pass ``--use-ort-tokenizer`` to force the ``og.Tokenizer`` path.
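
The round-trip property this note relies on can be checked with a small helper like the one below. This is a sketch, not code from inference.py: `round_trips` is a hypothetical name, and the two toy tokenizers are stand-ins so the example runs without any model files. With a HuggingFace tokenizer one would presumably pass its `encode` and `decode` methods, taking care that special tokens are handled consistently on both sides.

```python
from typing import Callable, List


def round_trips(
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
    text: str,
) -> bool:
    """Return True if decoding the encoded text reproduces it exactly."""
    return decode(encode(text)) == text


# Toy stand-in: byte-level tokenization, which is lossless for any UTF-8 text.
def byte_encode(text: str) -> List[int]:
    return list(text.encode("utf-8"))


def byte_decode(ids: List[int]) -> str:
    return bytes(ids).decode("utf-8")


# Toy stand-in for a tokenizer that cannot represent non-ASCII input.
def ascii_only_encode(text: str) -> List[int]:
    return [ord(ch) for ch in text if ord(ch) < 128]


def ascii_only_decode(ids: List[int]) -> str:
    return "".join(chr(i) for i in ids)
```

Running such a check over a sample of source sentences in each supported language is one way to decide whether a given tokenizer path is safe to enable by default.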
Comment on lines +69 to +72
params = og.GeneratorParams(model)
params.set_search_options(max_length=len(input_tokens) + max_new_tokens, do_sample=False)
generator = og.Generator(model, params)
generator.append_tokens(input_tokens)
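
After the prompt tokens are appended, the remaining step is a decode loop. The sketch below assumes the generator exposes `is_done()`, `generate_next_token()`, and `get_next_tokens()`, as `og.Generator` does in recent onnxruntime-genai releases; it is not the loop from inference.py. `collect_new_tokens` and `ScriptedGenerator` are hypothetical names, and the scripted class is only a stand-in so the loop can be exercised without a model.

```python
from typing import List


def collect_new_tokens(generator) -> List[int]:
    """Run the decode loop and return the generated token IDs for batch item 0."""
    new_tokens: List[int] = []
    while not generator.is_done():
        generator.generate_next_token()
        # get_next_tokens() returns one token per batch entry; batch size is 1 here.
        new_tokens.append(int(generator.get_next_tokens()[0]))
    return new_tokens


class ScriptedGenerator:
    """Stand-in that replays a fixed token script through the same three methods."""

    def __init__(self, script: List[int]) -> None:
        self._script = list(script)
        self._position = 0

    def is_done(self) -> bool:
        return self._position >= len(self._script)

    def generate_next_token(self) -> None:
        self._position += 1

    def get_next_tokens(self) -> List[int]:
        return [self._script[self._position - 1]]
```

With a real generator the loop ends when the model emits an end-of-sequence token or when `max_length` is reached, which is why the snippet above sets `max_length` to the prompt length plus `max_new_tokens`.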
Comment on lines +131 to +133
out = translate(model, hf_tokenizer, args.source, args.target_lang, args.max_new_tokens)
print(f"\nsource ({args.target_lang}): {args.source}")
print(f"translation: {out}")
Comment on lines +1 to +5
MIT License

Copyright (c) 2025 Microsoft

Permission is hereby granted, free of charge, to any person obtaining a copy
These produce default-EP ORT GenAI packages (empty provider_options) that
run on CPU / DML / WebGPU, matching the default-EP variants published to
the ONNX HuggingFace repo.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <11205048+justinchuby@users.noreply.github.com>
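
One way to confirm that a built package is a default-EP variant is to inspect its `genai_config.json`. The sketch below assumes the provider list lives at `model.decoder.session_options.provider_options`, which matches the layout of ORT GenAI configs I have seen but is not confirmed by this PR; `uses_default_ep` is a hypothetical helper.

```python
import json
from typing import Any, Dict


def uses_default_ep(genai_config: Dict[str, Any]) -> bool:
    """Return True if the decoder session declares no execution-provider options.

    Assumed key path: model -> decoder -> session_options -> provider_options.
    A missing key is treated the same as an empty list.
    """
    session_options = (
        genai_config.get("model", {})
        .get("decoder", {})
        .get("session_options", {})
    )
    return not session_options.get("provider_options")


def load_config(path: str) -> Dict[str, Any]:
    """Read a genai_config.json file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Usage would look like `uses_default_ep(load_config("path/to/genai_config.json"))`, where the path points into a package produced by one of the recipes.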

2 participants