vLLM/SGLang: comparison report#904
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds canonicalized diff helpers and internal ComparisonReport grouping; introduces an abstract LLMServingComparisonReport for standardized metric extraction and visualization; implements vLLM and SGLang comparison reports, registers them, and adds unit/integration tests that validate generated HTML outputs. ChangesLLM Serving Comparison Reports
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
bd9012d to
2596726
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cloudai/report_generator/util.py`:
- Around line 175-183: diff_comparison_values currently uses repr(v) to detect
differences which produces false positives (e.g., 1 vs 1.0 or dicts with
different insertion order); replace that with a canonicalization + equality
check: add a small helper (inside diff_comparison_values) named
canonicalize(value) that (a) for Mapping or list/tuple returns json.dumps(value,
sort_keys=True, default=str) to normalize structural content/order, (b) for
numeric types normalizes to a single numeric form (e.g., float(value) or
Decimal) so 1 and 1.0 compare equal, and (c) otherwise returns the value itself
(or str(value) as a last resort); then compute canonical_forms =
[canonicalize(v) for v in all_values] and use len(set(canonical_forms)) > 1 to
decide diffs and assign diff[key] = all_values as before (refer to
diff_comparison_values, values_by_run, all_values).
In `@src/cloudai/workloads/common/llm_serving_report.py`:
- Line 176: The calls to extract_data_as_df are duplicated in create_tables and
create_charts causing repeated parsing; modify the class to compute and cache
extracted DataFrames once and reuse them: e.g., add a lightweight cache
attribute (like self._df_cache) keyed by the test run object or an immutable id
(referencing extract_data_as_df, create_tables, create_charts, and
group.items/item.tr), populate it the first time you call extract_data_as_df for
a given tr, and have both create_tables and create_charts read from that cache
(or accept an injected precomputed list of DataFrames) instead of calling
extract_data_as_df again.
In `@src/cloudai/workloads/vllm/vllm_comparison_report.py`:
- Around line 24-25: The current import in vllm_comparison_report.py pulls
VllmBenchCmdArgs, VLLMBenchReportGenerationStrategy, and VllmTestDefinition from
the package root (cloudai.workloads.vllm) which creates a circular init
dependency; change those to direct imports from the submodules that actually
define them (use relative/submodule imports) so VllmBenchCmdArgs,
VLLMBenchReportGenerationStrategy, and VllmTestDefinition are imported from
their defining modules instead of from cloudai.workloads.vllm.
In `@tests/workloads/common/test_llm_serving_report.py`:
- Around line 40-137: Add a second test mirroring
test_llm_comparison_report_generates_html that exercises the SGLang path
end-to-end: create two cloudai.core.TestRun instances for SGLang (use
SGLangTestDefinition, SGLangCmdArgs, SGLangBenchCmdArgs,
SGLangSemanticEvalCmdArgs), write corresponding bench and quality JSON results
to slurm_system.output_path under each test run (use the SGLANG_BENCH_JSON_FILE
and SGLANG_GSM8K_JSON_FILE constants or their equivalents), instantiate
SGLangComparisonReport with a TestScenario containing those runs, call
load_test_runs(), create_tables(group_test_runs()) and assert key table cells
(latency/throughput/quality/success) similar to the vLLM test, call
report.generate(), and finally assert that (slurm_system.output_path /
"sglang_comparison.html").exists() to ensure HTML was produced.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 6a508bb1-229e-4702-b1ad-bdb49888fef9
📒 Files selected for processing (12)
src/cloudai/registration.pysrc/cloudai/report_generator/comparison_report.pysrc/cloudai/report_generator/groups.pysrc/cloudai/report_generator/util.pysrc/cloudai/workloads/common/llm_serving_report.pysrc/cloudai/workloads/sglang/__init__.pysrc/cloudai/workloads/sglang/sglang_comparison_report.pysrc/cloudai/workloads/vllm/__init__.pysrc/cloudai/workloads/vllm/vllm_comparison_report.pytests/test_init.pytests/workloads/common/test_llm_serving.pytests/workloads/common/test_llm_serving_report.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cloudai/report_generator/util.py`:
- Around line 191-197: _canonical_comparison_value currently only coerces
top-level ints/floats to float and then uses json.dumps on mappings/lists which
leaves nested numeric types like 1 vs 1.0 unchanged; update
_canonical_comparison_value to recursively walk collections (for the mapping
case handle keys and values, for list/tuple handle each element) and replace any
int/float with float (preserve bools), then json.dumps the normalized structure
so nested numerics are canonicalized before serialization. Target the existing
function name _canonical_comparison_value and ensure you handle collections.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 711f86bc-39c4-43a8-bb10-01f52a72de91
📒 Files selected for processing (6)
src/cloudai/report_generator/util.pysrc/cloudai/workloads/common/llm_serving_report.pysrc/cloudai/workloads/sglang/sglang_comparison_report.pysrc/cloudai/workloads/vllm/vllm_comparison_report.pytests/report_generation_strategy/test_report_groups.pytests/workloads/common/test_llm_serving_report.py
Summary
Test Plan
Additional Notes