AIDynamo: shared node disagg inference#909
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Enterprise Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds shared-node disaggregation: track explicit num_nodes, detect overlapping prefill/decode node lists, compute and validate role node/GPU sizing, emit per-role node-list flags, and implement runtime GPU-slicing and per-worker system-port/metrics offsets. ChangesShared-Node Disaggregation Implementation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@doc/workloads/ai_dynamo.rst`:
- Around line 306-313: Update the earlier Slurm node-count guidance to
explicitly document the shared-node disaggregation exception: state that
normally required node count equals num_prefill_nodes + num_decode_nodes, but if
you set top-level num_nodes lower than prefill_worker.num-nodes +
decode_worker.num-nodes you enter Shared-Node Disaggregated mode where CloudAI
assigns decode GPUs first then prefill GPUs based on each role's
tensor-parallel-size * pipeline-parallel-size; reference the parameters
num_nodes, prefill_worker.num-nodes, decode_worker.num-nodes,
tensor-parallel-size and pipeline-parallel-size and mention CloudAI’s behavior
so users know this is an explicit exception rather than a contradiction.
In `@src/cloudai/workloads/ai_dynamo/ai_dynamo.py`:
- Line 607: constraint_check may be called with WorkerConfig.num_nodes still
holding a list which causes an unclear TypeError when computing
role_total_nodes; in the constraint_check (or just before computing
role_total_nodes) validate that prefill_worker.num_nodes and
decode_worker.num_nodes are scalar (not list/tuple), and if they are list-like
raise a clear TypeError explaining that apply_params_set/unroll_dse must have
been run (mention WorkerConfig.num_nodes, apply_params_set, unroll_dse) and
include which role (prefill or decode) has a list value so callers get an
actionable error instead of relying on int(list) behavior.
In `@src/cloudai/workloads/ai_dynamo/ai_dynamo.sh`:
- Around line 1024-1025: The SC2155 warning is about declaring and assigning in
one statement: in the for-loop where you call _gpu_list_for_worker_offset,
change the single-line "local gpu_list=$(...)" to two statements so the local
declaration and the assignment are separate; locate the loop using the for i in
$(seq ...) and the variable gpu_list and update to first "local gpu_list" then
"gpu_list=$(_gpu_list_for_worker_offset
\"${prefill_config[\"gpus-per-worker\"]}\" \"$i\" \"$gpu_offset\")".
- Around line 540-547: The helper function _gpu_list_for_worker_offset uses an
unnecessary echo and an unquoted variable; replace the subshell echo with a
direct cut reading the CUDA_VISIBLE_DEVICES content and quote expansions to
avoid word-splitting. Update _gpu_list_for_worker_offset to compute start/end as
before but call cut like: cut -d',' -f${start}-${end} <<<"$CUDA_VISIBLE_DEVICES"
(or printf '%s' "$CUDA_VISIBLE_DEVICES" | cut ...), and ensure positional inputs
(per_worker, idx, offset) are referenced as quoted variables where used to
satisfy shellcheck SC2005/SC2086.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 74871532-aacc-4d53-af88-c6e478a67f6f
📒 Files selected for processing (8)
conf/experimental/ai_dynamo/test_scenario/vllm_slurm.tomldoc/workloads/ai_dynamo.rstsrc/cloudai/_core/test_scenario.pysrc/cloudai/test_scenario_parser.pysrc/cloudai/workloads/ai_dynamo/ai_dynamo.pysrc/cloudai/workloads/ai_dynamo/ai_dynamo.shsrc/cloudai/workloads/ai_dynamo/slurm_command_gen_strategy.pytests/workloads/ai_dynamo/test_command_gen_strategy_slurm.py
Summary
num_nodesis smaller thanprefill_worker.num-nodes + decode_worker.num-nodes, prefill and decode role node lists overlap and share the allocated nodes.num_nodes; CloudAI still expands the allocation to the role-node sum for backward compatibility.TP * PPGPUs, prefill gets the nextTP * PPGPUs, with early validation when the combined role GPU count does not fit.Config Examples
Separate-node disagg, existing default behavior:
Shared-node disagg on two physical nodes:
With 8 GPUs per node, this runs decode on GPUs
0-3and prefill on GPUs4-7on each allocated node.Explicit node allocation, no real hostnames:
Overlapping worker
nodesmeans shared-node mode. Non-overlapping workernodesremain separate-node mode.Test Plan