feat: add multi-image latency benchmarking#729
Open
sjarvie wants to merge 1 commit into
Open
Conversation
Contributor
|
This pull request has merge conflicts that must be resolved before it can be |
Implement multi-image benchmarking for vision-language models to measure latency impact of multiple frames per request. Changes: - MultiImageDatasetConfig schema for datasets with N images per request - 720p image generator with base64 encoding and reproducible seeding - CLI parameter: --images-per-request (single or comma-separated list) - MultiImageBenchmark programmatic API for pytest integration - 14 unit tests covering config validation and image generation - Documentation with usage examples The feature enables benchmarking how TTFT and ITL scale with increasing frame counts, useful for video analysis pipelines.
0290221 to
2788605
Compare
Contributor
|
This pull request has merge conflicts that must be resolved before it can be |
Contributor
|
Hi @sjarvie, the DCO check has failed. Please click on DCO in the Checks section for instructions on how to resolve this. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Add benchmarking capability to measure latency impact of multiple images (frames) per request in GuideLLM. This enables testing how TTFT and ITL scale with multi-frame vision inputs.
MultiImageDatasetConfigextending synthetic data generation with images_per_request parameter--images-per-requestparameter supporting single or comma-separated values (e.g.,1,2,5)MultiImageBenchmarkprogrammatic class for pytest integrationTest Plan
Implementation Details
Files Created:
src/guidellm/data/schemas.py— MultiImageDatasetConfigsrc/guidellm/data/generators/multi_image.py— Image generator (720p)src/guidellm/data/generators/__init__.py— Module exportssrc/guidellm/benchmark/multi_image.py— Programmatic APItests/unit/data/test_multi_image_config.py— Config validation teststests/unit/data/generators/test_multi_image_generator.py— Generator testsFiles Modified:
src/guidellm/data/deserializers/synthetic.py— MultiImageDatasetConfig supportsrc/guidellm/cli/benchmark/run.py— CLI parametersrc/guidellm/benchmark/__init__.py— API exportsdocs/getting-started/benchmark.md— Usage guide with examplesUsage Examples
CLI (single image count):
CLI (multi-variant):
Programmatic: