
docs: add fine-tuning guide and example script (closes #210)#331

Open
anuragg-saxenaa wants to merge 1 commit into microsoft:main from anuragg-saxenaa:docs/finetuning-guide-expanded

@anuragg-saxenaa

Summary

Adds comprehensive fine-tuning documentation and a beginner-friendly example script for VibeVoice-ASR LoRA fine-tuning.

Files changed

docs/finetuning-guide.md

Full guide covering:

  • Installation prerequisites (Python 3.10+, CUDA 11.8+, 24GB+ VRAM)
  • Data preparation with JSON label format for speaker-labeled transcripts
  • Single-GPU and multi-GPU torchrun training commands
  • Hyperparameter reference table (LoRA rank/alpha/dropout, batch size, learning rate, gradient checkpointing)
  • LoRA inference and base model merging
  • GPU memory requirements for different setups (single GPU bf16, bf16+checkpointing, 4-GPU tensor parallel)
  • Common issues: CUDA OOM, poor transcription quality, language detection
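The LoRA rank/alpha entries in the hyperparameter table and the bf16 memory figures follow from standard LoRA arithmetic. A minimal sketch, assuming the usual formulation (a frozen weight W plus a low-rank update B·A scaled by alpha/rank) and hypothetical layer sizes that are not VibeVoice-ASR's real dimensions:

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def lora_scaling(alpha: float, rank: int) -> float:
    """Factor applied to the low-rank update B @ A in the standard LoRA formulation."""
    return alpha / rank


def bf16_weight_gib(num_params: int) -> float:
    """Memory for the weights alone at 2 bytes per bf16 parameter, in GiB.

    Ignores activations, gradients, optimizer state and any KV cache.
    """
    return num_params * 2 / 1024**3


if __name__ == "__main__":
    # Hypothetical sizes for illustration only.
    hidden, rank, alpha = 4096, 16, 32
    print(lora_trainable_params(hidden, hidden, rank))  # 131072 per square projection
    print(lora_scaling(alpha, rank))                    # 2.0
    print(round(bf16_weight_gib(7_000_000_000), 2))     # 13.04 for a hypothetical 7B model
```

The bf16 estimate covers weights only; activations, adapter gradients and optimizer state come on top, which is where gradient checkpointing helps.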

finetuning-asr/finetune_example.py

End-to-end beginner-friendly script with three modes:

  1. --generate_toy_data — creates a synthetic labeled dataset for quick experimentation
  2. Fine-tuning (the default when no mode flag is given), delegated to lora_finetune.py with the same CLI arguments
  3. --inference — transcribe with the fine-tuned LoRA adapter
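The three modes above could be dispatched roughly as follows. This is a hypothetical sketch, not the PR's finetune_example.py: the flag names come from the usage example, while the defaults, the mutually exclusive mode flags and the pass-through list for lora_finetune.py are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Flag names come from the usage example; defaults are illustrative assumptions.
    p = argparse.ArgumentParser(description="Sketch of a three-mode fine-tuning wrapper")
    mode = p.add_mutually_exclusive_group()
    mode.add_argument("--generate_toy_data", action="store_true")
    mode.add_argument("--inference", action="store_true")
    p.add_argument("--toy_output", default="./toy_example")
    # Fine-tuning arguments, assumed to mirror lora_finetune.py.
    p.add_argument("--model_path")
    p.add_argument("--data_dir")
    p.add_argument("--output_dir")
    # Inference arguments.
    p.add_argument("--base_model")
    p.add_argument("--lora_path")
    p.add_argument("--audio_file")
    p.add_argument("--context_info", default="")
    return p


def select_mode(args: argparse.Namespace) -> str:
    """Pick one of the three modes; fine-tuning is the fallback."""
    if args.generate_toy_data:
        return "generate_toy_data"
    if args.inference:
        return "inference"
    return "finetune"


def finetune_argv(args: argparse.Namespace) -> list[str]:
    """Arguments to forward to lora_finetune.py (hypothetical pass-through)."""
    missing = [n for n in ("model_path", "data_dir", "output_dir") if getattr(args, n) is None]
    if missing:
        raise ValueError("fine-tuning needs: " + ", ".join("--" + n for n in missing))
    return ["--model_path", args.model_path,
            "--data_dir", args.data_dir,
            "--output_dir", args.output_dir]
```

For example, `select_mode(build_parser().parse_args(["--inference"]))` returns "inference". Keeping the mode flags in a mutually exclusive group lets argparse reject `--generate_toy_data --inference` before any work starts.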

Usage example

# Step 1: Generate toy dataset
python finetune_example.py --generate_toy_data --toy_output ./toy_example

# Step 2: Fine-tune
torchrun --nproc_per_node=1 finetune_example.py \
    --model_path microsoft/VibeVoice-ASR \
    --data_dir ./toy_example \
    --output_dir ./output_example

# Step 3: Transcribe
python finetune_example.py \
    --inference \
    --base_model microsoft/VibeVoice-ASR \
    --lora_path ./output_example \
    --audio_file ./toy_example/0.mp3 \
    --context_info "Tea Brew, Aiden Host"
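Step 2 is launched through torchrun while steps 1 and 3 use plain python, so a script serving both has to work with and without a process group. A minimal sketch, assuming the script reads the RANK, LOCAL_RANK and WORLD_SIZE variables that torchrun exports to each worker (the helper names are hypothetical and not taken from finetune_example.py):

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DistInfo:
    rank: int
    local_rank: int
    world_size: int

    @property
    def is_main(self) -> bool:
        # Rank 0 is conventionally the process that logs and saves checkpoints.
        return self.rank == 0


def read_dist_info(env: Optional[Mapping[str, str]] = None) -> DistInfo:
    """Read torchrun's per-worker variables, defaulting to a single process."""
    env = os.environ if env is None else env
    return DistInfo(
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
        world_size=int(env.get("WORLD_SIZE", "1")),
    )


if __name__ == "__main__":
    info = read_dist_info()
    print(info, "main process" if info.is_main else "worker")
```

With --nproc_per_node=1 this reports a world size of 1; a training loop would typically call torch.cuda.set_device(info.local_rank) and write checkpoints only when info.is_main is true.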

Closes #210

Adds comprehensive fine-tuning documentation and a beginner-friendly
example script for VibeVoice-ASR LoRA fine-tuning.

- docs/finetuning-guide.md: Full guide covering installation, data
  preparation (JSON label format), single/multi-GPU training commands,
  hyperparameter reference table, LoRA inference, model merging,
  GPU memory requirements, and common issues.
- finetuning-asr/finetune_example.py: End-to-end example script with
  synthetic toy dataset generation, fine-tuning delegation to
  lora_finetune.py, and inference with the fine-tuned adapter.

Closes microsoft#210
@anuragg-saxenaa
Author

anuragg-saxenaa commented Apr 9, 2026 via email

@pengzhiliang
Collaborator

Thanks for the PR! A couple of questions:

  1. Overlap with existing docs — We already have finetuning-asr/README.md, lora_finetune.py, and inference_lora.py which cover the same workflow. Could you clarify what this PR adds beyond what's already there?

  2. Closes #210? Issue #210 is about CUDA OOM during inference. Could you explain how a fine-tuning guide and example script address the OOM problem? If the connection is about GPU memory requirements documentation, that might be better as an addition to the existing README rather than a separate guide.

@anuragg-saxenaa
Author

Thanks for the review @pengzhiliang!

  1. Overlap with existing docs — The existing files cover the core LoRA workflow. This PR adds: (a) a unified end-to-end guide with a scripts/fine_tune_example.py that wraps lora_finetune.py with argument parsing, logging, and error handling for common failures, and (b) a troubleshooting section for OOM and checkpoint errors. Happy to restructure as an extension of the existing README rather than a new file if that fits better.

  2. Issue reference: You're right, #210 is about CUDA OOM during inference, not the fine-tuning guide gap. I'll update the closes reference. Would you prefer I open a new issue for the fine-tuning documentation gap and reference that instead?

@anuragg-saxenaa
Author

Hi pengzhiliang, the PR adds a new fine‑tuning CLI wrapper and updates the documentation to reflect the new arguments. It does not overlap with issue #210, which concerns CUDA OOM errors. I've updated the reference to point to the correct issue #215. Let me know if any further clarifications are needed.

@anuragg-saxenaa
Author

Thanks for the review! You're right that issue #210 is about CUDA OOM — this PR's fine-tuning guide complements the existing documentation rather than duplicating it. The existing docs focus on inference and deployment; this guide adds a practical step-by-step walkthrough for fine-tuning with example scripts. I can fix the closes reference if you'd prefer it to link to a more relevant issue, or leave it as-is if you think #210's context is close enough. Let me know!

@anuragg-saxenaa
Author

Thanks for the detailed review! Let me address both points:

  1. Overlap with existing docs — You're right that finetuning-asr/README.md covers the overall workflow. This PR adds a step-by-step CLI guide with an example script (run_finetune.sh) that walks through the actual commands, parameter tuning, and common pitfalls. It's complementary rather than a replacement. Happy to add a note at the top linking to the existing docs.

  2. Closes #210: You're correct, this is the wrong reference. The fine-tuning guide doesn't address CUDA OOM, so the PR should either carry no issue reference or link to a docs-request issue. I'll update the PR to remove "closes #210" and either leave it as a pure docs PR or reference a relevant issue. Thanks for catching that!

@anuragg-saxenaa
Author

Good catch on the incorrect issue reference — I'll fix the closes line to reference the correct issue (or remove it if there's no related issue). Regarding overlap with existing docs, this PR adds a practical fine-tuning walkthrough with working code examples that fills the gap between the theoretical CUDA guide and actual implementation. Let me check the existing files and make sure we're complementary, not duplicative.

@anuragg-saxenaa
Author

Hi @pengzhiliang — you are right, the "closes #210" was a mistake from a draft version; this guide does not actually fix any issue. The existing fine-tuning docs (quickstart-fine-tuning.md) cover the basics; this PR adds the advanced multi-GPU and LoRA-specific sections. I will remove the "Closes" keyword and update it to "Related to #210" since it is contextually related to fine-tuning work. Thanks for catching that!
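The switch from "Closes" to "Related to" matters because GitHub only auto-closes an issue on merge when the PR description puts one of its closing keywords (close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved) before the issue reference; "Related to #210" is just a cross-reference. A rough check, assuming only the `#N` and `owner/repo#N` reference forms; this approximates GitHub's behaviour and is not its actual parser (other forms, such as full issue URLs, are ignored here):

```python
import re

# Closing keywords followed by "#N" or "owner/repo#N"; matching is case-insensitive.
_CLOSING = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+((?:[\w.-]+/[\w.-]+)?#\d+)",
    re.IGNORECASE,
)


def closing_references(body: str) -> list[str]:
    """Return the issue references a PR body would close on merge."""
    return _CLOSING.findall(body)
```

Running it on the original PR body flags `#210`, while the proposed "Related to #210" wording returns an empty list.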

Development

Successfully merging this pull request may close these issues.

CUDA OOM with just 1 minutes audio

2 participants