Infer audio encoding format from source instead of defaulting to MP3 by jaredoconnell · Pull Request #794 · vllm-project/guidellm

jaredoconnell · 2026-06-12T22:31:43Z

Summary

Uses the audio format of the source data rather than defaulting to MP3.

Details

If the user does not provide a format, the format read from the dataset is used.
If no format is provided and none can be inferred, the format defaults to WAV and the user is warned.
Also includes a fix for a missing super call in the response handler.
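
The resolution order described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the function name `resolve_audio_format`, its parameters, and the `FALLBACK_FORMAT` constant are hypothetical and chosen for the example.

```python
import warnings
from typing import Optional

# Hypothetical constant for this sketch; the PR states the fallback is WAV.
FALLBACK_FORMAT = "wav"


def resolve_audio_format(
    user_format: Optional[str],
    source_format: Optional[str],
) -> str:
    """Pick an audio encoding format (hypothetical helper, not guidellm's API).

    Priority: explicit user choice, then the format read from the source,
    then WAV with a warning.
    """
    if user_format:
        # An explicit user setting always wins.
        return user_format.lower()
    if source_format:
        # Otherwise reuse whatever format the dataset reported.
        return source_format.lower()
    # Nothing provided and nothing inferred: fall back and tell the user.
    warnings.warn(
        "Audio format could not be inferred; defaulting to "
        f"{FALLBACK_FORMAT.upper()}.",
        stacklevel=2,
    )
    return FALLBACK_FORMAT
```

Under these assumptions, `resolve_audio_format(None, "flac")` keeps the source format, while `resolve_audio_format("mp3", "flac")` honors the explicit override, which is the behavior the MP3 command in the Test Plan relies on.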

Test Plan

Make sure you install vLLM with the audio features.

You can run this with multiple datasets. Some options include:

guidellm benchmark run \
  --target "http://localhost:8000" \
  --request-type /v1/audio/transcriptions \
  --profile kind=synchronous \
  --max-requests 10 \
  --data '{"kind": "huggingface", "source": "google/fleurs", "load_kwargs": {"name": "en_us", "split": "test"}}' \
  --data-column-mapper '{"column_mappings": {"audio_column": "audio"}}' \
  --disable-progress

guidellm benchmark run \
  --target "http://localhost:8000" \
  --request-type /v1/audio/transcriptions \
  --profile kind=synchronous \
  --max-requests 10 \
  --data '{"kind": "huggingface", "source": "openslr/librispeech_asr", "load_kwargs": {"name": "clean", "split": "test"}}' \
  --data-column-mapper '{"column_mappings": {"audio_column": "audio"}}' \
  --disable-progress

To explicitly use MP3, which was the previous default, run:

guidellm benchmark run \
  --target "http://localhost:8000" \
  --request-type /v1/audio/transcriptions \
  --profile kind=synchronous \
  --max-requests 10 \
  --data '{"kind": "huggingface", "source": "openslr/librispeech_asr", "load_kwargs": {"name": "clean", "split": "test"}}' \
  --data-column-mapper '{"column_mappings": {"audio_column": "audio"}}' \
  --data-preprocessors '{"kind": "encode_media", "audio_kwargs": {"audio_format": "mp3"}}' \
  --disable-progress

Related Issues

Resolves #623 (Avoid default MP3 transcoding for transcription benchmarks)

"I certify that all code in this PR is my own, except as noted below."

Use of AI

Includes code generated or substantially modified by an AI agent
Includes tests generated or substantially modified by an AI agent

NOTE: the Generated-by or Assisted-by trailers should be used in git commit messages when code or tests were generated or substantially modified by an AI agent, as described in the project's DEVELOPING.md file.

git log

commit 6cc6e9f
Author: Jared O'Connell <joconnel@redhat.com>
Date: Thu Jun 11 15:00:08 2026 -0400

Infer audio encoding format from source instead of defaulting to MP3

Generated-by: Cursor AI Claude Opus 4.6
Signed-off-by: Jared O'Connell <joconnel@redhat.com>


jaredoconnell force-pushed the fix/audio-formats branch from 0f43c0e to 6cc6e9f on June 12, 2026 22:46

jaredoconnell wants to merge 1 commit into vllm-project:main from jaredoconnell:fix/audio-formats
