Reduce dynamic-shape compile cost and select_module dispatch overhead by chun-wan · Pull Request #4911 · ROCm/AMDMIGraphX

chun-wan · 2026-05-26T10:13:37Z

For models with a single non-fixed dynamic dimension, today's split_single_dyn_dim pass enumerates every integer in [min, max] and stamps out one static-shape submodule per integer, regardless of dynamic_dimension::optimals. Combined with a per-inference linear scan inside select_module::compute(), this makes wide [min, max] ranges (e.g. {1, 5000000, opt:[400000]}) unusable: ~16 days of compile, ~40 GB engine and +80 - 100 us of per-inference host overhead vs a hand-static engine on MI308X.

This change introduces three layered, opt-in / auto-guarded optimizations:

MIGRAPHX_DYN_DIM_BUCKET_BY_OPTIMALS=1 makes split_single_dyn_dim honour dynamic_dimension::optimals (one submodule per optimal plus min/max endpoints) and adds a smallest-compatible-bucket fallback in select_module::compute() (zero-pads on the ref backend). Collapses compile time and .mxr size to O(|optimals|).
MIGRAPHX_DYN_DIM_FREEZE_TO= adds a new freeze_dyn_dim pass that runs before split_single_dyn_dim and rewrites every non-fixed dynamic parameter to a static shape at size N. Downstream sees a fully-static program -- no submodules, no select_module, no per-inference dispatch. Matches hand-written-static inference latency.
select_module::compute() gets a per-instance dispatch cache. Hot loops with stable input shape skip the linear scan and the per-call get_input_parameter_names / get_parameter_shapes allocations. Auto-activates when select_module has 4+ submodules; kill-switch MIGRAPHX_DISABLE_SELECT_MODULE_CACHE=1 for A/B benchmarks and emergency rollback.

Validated on MI308X (gfx942, ROCm 7.2.0):

Pointwise compile: 40.5 s (legacy max=256) -> 0.48 s (bucket max=5M).
Pointwise inference: 215 us legacy -> 121 us bucket + cache ->
115 us freeze (= true-static control at 115 us).
All existing unit + ref tests pass, plus 16 new test cases: 4 split_single_dyn_dim, 4 freeze_dyn_dim, 2 ref/freeze_dyn_dim, 6 ref/select_module (incl. cache hot-loop and alternating-shape regression). Validation ran with cache enabled and with the kill-switch on; bitwise-identical outputs verified across frozen / bucket / static engines.

Addresses #1023 (Dynamic Shape Support umbrella) and #4618 (ONNX-Runtime MIGraphX EP recompiling per shape).

Motivation

Technical Details

Changelog Category

Add a CHANGELOG.md entry for any option other than Not Applicable

- Added: New functionality.
- Changed: Changes to existing functionality.
- Removed: Functionality or support that has been removed. (Compared to a previous release)
- Optimized: Component performance that has been optimized or improved.
- Resolved Issues: Known issues from a previous version that have been resolved.
- Not Applicable: This PR is not to be included in the changelog.

codecov · 2026-05-26T16:44:48Z

Codecov Report

❌ Patch coverage is 96.00000% with 3 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/include/migraphx/op/select_module.hpp	95.38%	3 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #4911      +/-   ##
===========================================
+ Coverage    92.66%   92.68%   +0.02%     
===========================================
  Files          588      588              
  Lines        30412    30472      +60     
===========================================
+ Hits         28180    28242      +62     
+ Misses        2232     2230       -2

Files with missing lines	Coverage Δ
src/include/migraphx/split_single_dyn_dim.hpp	`100.00% <ø> (ø)`
src/split_single_dyn_dim.cpp	`98.65% <100.00%> (+0.19%)`	⬆️
src/targets/ref/target.cpp	`100.00% <ø> (ø)`
src/include/migraphx/op/select_module.hpp	`95.15% <95.38%> (+8.61%)`	⬆️

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

pfultz2 · 2026-05-26T21:49:43Z

MIGRAPHX_DYN_DIM_FREEZE_TO= adds a new freeze_dyn_dim pass that runs before split_single_dyn_dim and rewrites every non-fixed dynamic parameter to a static shape at size N. Downstream sees a fully-static program -- no submodules, no select_module, no per-inference dispatch. Matches hand-written-static inference latency.

Is this padding the dimension to the frozen dim? What happens when the dim is larger than the static size?

pfultz2 · 2026-05-26T21:52:48Z

+        // on the hot path.  When set, the entire cache infrastructure
+        // is bypassed -- no read, no populate -- so the measured cost
+        // matches the pre-cache implementation.
+        // cppcheck-suppress migraphx-UseCachedEnvVar


Dont suppress this cppcheck warning.

pfultz2 · 2026-05-26T21:54:00Z

+    // string form so tests (and runtime) can flip the value between
+    // compile passes within a single process without hitting a static
+    // cache.  N=0 disables the pass entirely (the default).
+    const std::size_t n = value_of(MIGRAPHX_DYN_DIM_FREEZE_TO::value());


Dont use ::value.

pfultz2 · 2026-05-26T21:56:24Z

+        auto new_param = mm->add_parameter(name, static_shape);
+        mm->replace_instruction(stand_param, new_param);
+        mm->remove_instruction(stand_param);
+        any_replaced = true;


This can already be achieved by setting the dimensions to a static size when parsing onnx. This pass should be removed.

Agreed, that method is also more customizable such that multiple different static dimensions can be set for different dynamic dimensions. Also remove the env variable.

pfultz2 · 2026-05-26T22:00:24Z

+// Cross-platform env helpers (Windows has _putenv_s instead of setenv).
+namespace {
+inline int set_env(const char* name, const char* value)
+{
+#ifdef _WIN32
+    return _putenv_s(name, value);
+#else
+    return ::setenv(name, value, /*overwrite=*/1);
+#endif
+}
+inline int unset_env(const char* name)
+{
+#ifdef _WIN32
+    return _putenv_s(name, "");
+#else
+    return ::unsetenv(name);
+#endif
+}
+} // namespace
+
+namespace {
+// RAII guard that turns MIGRAPHX_DYN_DIM_BUCKET_BY_OPTIMALS on for the
+// lifetime of the guard and restores the prior value on destruction.  Lets a
+// single test binary mix bucket-mode and default-mode test cases without
+// polluting the environment across tests.
+struct bucket_env_guard
+{
+    static constexpr const char* name = "MIGRAPHX_DYN_DIM_BUCKET_BY_OPTIMALS";
+    std::string prev;
+    bool had_prev = false;
+    explicit bucket_env_guard(const char* value)
+    {
+        if(auto* p = std::getenv(name))
+        {
+            prev     = p;
+            had_prev = true;
+        }
+        set_env(name, value);
+    }
+    ~bucket_env_guard()
+    {
+        if(had_prev)
+            set_env(name, prev.c_str());
+        else
+            unset_env(name);
+    }
+};
+} // namespace


Delete all this. Dont unit tests with env variables(or duplicate functionality we already have). These should be parameters to the pass and then passed in when running the tests: run_pass({.dyn_dim_bucket_by_optimals=true})

Suggested change

// Cross-platform env helpers (Windows has _putenv_s instead of setenv).

namespace {

inline int set_env(const char* name, const char* value)

{

#ifdef _WIN32

return _putenv_s(name, value);

#else

return ::setenv(name, value, /*overwrite=*/1);

#endif

}

inline int unset_env(const char* name)

{

#ifdef _WIN32

return _putenv_s(name, "");

#else

return ::unsetenv(name);

#endif

}

} // namespace

namespace {

// RAII guard that turns MIGRAPHX_DYN_DIM_BUCKET_BY_OPTIMALS on for the

// lifetime of the guard and restores the prior value on destruction. Lets a

// single test binary mix bucket-mode and default-mode test cases without

// polluting the environment across tests.

struct bucket_env_guard

{

static constexpr const char* name = "MIGRAPHX_DYN_DIM_BUCKET_BY_OPTIMALS";

std::string prev;

bool had_prev = false;

explicit bucket_env_guard(const char* value)

{

if(auto* p = std::getenv(name))

{

prev = p;

had_prev = true;

}

set_env(name, value);

}

~bucket_env_guard()

{

if(had_prev)

set_env(name, prev.c_str());

else

unset_env(name);

}

};

} // namespace

For models with a single non-fixed dynamic dimension, today's split_single_dyn_dim pass enumerates every integer in [min, max] and stamps out one static-shape submodule per integer, regardless of dynamic_dimension::optimals. Combined with a per-inference linear scan inside select_module::compute(), this makes wide [min, max] ranges (e.g. {1, 5000000, opt:[400000]}) unusable: ~16 days of compile, ~40 GB engine and +80 - 100 us of per-inference host overhead vs a hand-static engine on MI308X. This change introduces three layered, opt-in / auto-guarded optimizations: 1. MIGRAPHX_DYN_DIM_BUCKET_BY_OPTIMALS=1 makes split_single_dyn_dim honour dynamic_dimension::optimals (one submodule per optimal plus min/max endpoints) and adds a smallest-compatible-bucket fallback in select_module::compute() (zero-pads on the ref backend). Collapses compile time and .mxr size to O(|optimals|). 2. MIGRAPHX_DYN_DIM_FREEZE_TO=<N> adds a new freeze_dyn_dim pass that runs before split_single_dyn_dim and rewrites every non-fixed dynamic parameter to a static shape at size N. Downstream sees a fully-static program -- no submodules, no select_module, no per-inference dispatch. Matches hand-written-static inference latency. 3. select_module::compute() gets a per-instance dispatch cache. Hot loops with stable input shape skip the linear scan and the per-call get_input_parameter_names / get_parameter_shapes allocations. Auto-activates when select_module has 4+ submodules; kill-switch MIGRAPHX_DISABLE_SELECT_MODULE_CACHE=1 for A/B benchmarks and emergency rollback. Validated on MI308X (gfx942, ROCm 7.2.0): - Pointwise compile: 40.5 s (legacy max=256) -> 0.48 s (bucket max=5M). - Pointwise inference: 215 us legacy -> 121 us bucket + cache -> 115 us freeze (= true-static control at 115 us). - All existing unit + ref tests pass, plus 16 new test cases: 4 split_single_dyn_dim, 4 freeze_dyn_dim, 2 ref/freeze_dyn_dim, 6 ref/select_module (incl. cache hot-loop and alternating-shape regression). Validation ran with cache enabled and with the kill-switch on; bitwise-identical outputs verified across frozen / bucket / static engines. Addresses ROCm#1023 (Dynamic Shape Support umbrella) and ROCm#4618 (ONNX-Runtime MIGraphX EP recompiling per shape).

CharlieL7

Please make the docstrings more terse and/or use some type of bullet point format. There are way too many comment paragraphs to read.

CharlieL7 · 2026-05-27T15:03:52Z

-        };
-    }
+    auto& ctx = any_cast<context>(gctx);
+    ctx.set_exhaustive_tune_flag(options.exhaustive_tune);


You will need to update this file with respect to the GPU pipeline changes rather than just overwriting back to the previous.

…e kill switch Acts on pfultz2 + CharlieL7 review on ROCm#4911: - Remove `freeze_dyn_dim` pass + tests + docs. The same result is reachable via `parse_onnx(..., default_dyn_dim_value={N,N})` / `dim_params`, which is also more flexible (different static values per dynamic dim). Drops one env var (`MIGRAPHX_DYN_DIM_FREEZE_TO`), one source file, one header, two test files, and the corresponding CHANGELOG / env-vars.rst entries. - Lift `bucket_by_optimals` from an env-var read inside the pass to a field on the `split_single_dyn_dim` struct with `reflect()` support. The unit tests now construct the pass directly with the field (`run_passes(p, {split_single_dyn_dim{.bucket_by_optimals = true}, ...})`) instead of toggling env vars; no env-var state is observed by the pass body. The GPU/Ref targets still expose user opt-in via `MIGRAPHX_DYN_DIM_BUCKET_BY_OPTIMALS`, but it is read once in `target::get_passes()` (cached form) and forwarded as the pass field -- the test suite is now env-var free. - Make `select_module::compute()` bucket fallback always-on. Whether buckets exist in the submodule list is a compile-time decision (controlled by the pass field above); at runtime, if no exact-shape match is found and a smallest-compatible bucket exists, dispatch there. No env-var gate, no cppcheck-suppress. - Drop `MIGRAPHX_DISABLE_SELECT_MODULE_CACHE` + the `compute_legacy` branch + the kill-switch coverage tests. The dispatch cache is unconditionally active and identical to the uncached path on a miss; removing the kill switch lets us drop the cppcheck `UseCachedEnvVar` suppression that pfultz2 flagged. - Update CHANGELOG.md and docs/reference/MIGraphX-dev-env-vars.rst to match the new surface area. Locally verified inside the rocm/ali-private ROCm 7.2.0 build container: - `make -j$(nproc) migraphx` succeeds. - `test_split_single_dyn_dim_test` -- 11/11 pass (4 bucket-mode tests use the new pass field). - `test_ref select_module_*` -- 9/9 pass (the 3 kill-switch tests and the env-off-strict test are gone; bucket round-up + cache hot-loop + alternating-shapes tests still pass).

@CharlieL7

Acts on @CharlieL7's two new review points: - "Update gpu/target.cpp with respect to the GPU pipeline changes rather than just overwriting back to the previous": my previous commit had inadvertently dropped the `pipeline_factory` refactor that landed on develop. Reset gpu/target.cpp to upstream/develop and re-apply just my 5-line surgical change: * declare `MIGRAPHX_DYN_DIM_BUCKET_BY_OPTIMALS` alongside the other GPU target env vars * inside `pipeline_factory::dynamic_shapes_pipeline()`, pass `split_single_dyn_dim{.bucket_by_optimals = enabled(...)}` instead of the default-constructed pass. Net diff vs upstream/develop is now just those 5 lines + 1 line. - "Please make the docstrings more terse and/or use some type of bullet point format": condensed comments across all changed files -- multi-paragraph rationales are now 1-2 lines or bullet lists. No comment-only edits change behaviour; tests still pass. Verified in the rocm/ali-private ROCm 7.2.0 container: * `make migraphx` -> ok * `test_split_single_dyn_dim_test` -> 11/11 pass * `test_ref "select_module_*"` -> 9/9 pass

…target.cpp * split_single_dyn_dim.hpp: copyright was still 2024 -> bump to 2026 to satisfy `tools/check_stamped.py` (licensing CI). * gpu/target.cpp: apply the 2-line clang-format diff that the CI `Check formatting` step produced (line-break placement around `split_single_dyn_dim{.bucket_by_optimals = enabled(...)}`).

chun-wan requested review from a team and causten as code owners May 26, 2026 10:13

CharlieL7 requested review from CharlieL7 and shivadbhavsar May 26, 2026 12:26

chun-wan force-pushed the feat/dyn-dim-freeze-bucket-cache branch 13 times, most recently from 783fe62 to b8cb3ec Compare May 26, 2026 16:32

chun-wan force-pushed the feat/dyn-dim-freeze-bucket-cache branch 4 times, most recently from 23ab503 to 57bad58 Compare May 26, 2026 19:40

pfultz2 reviewed May 26, 2026

View reviewed changes

chun-wan force-pushed the feat/dyn-dim-freeze-bucket-cache branch from 57bad58 to 160ff74 Compare May 27, 2026 02:00

CharlieL7 requested changes May 27, 2026

View reviewed changes

chun-wan added 3 commits May 28, 2026 02:06

chun-wan force-pushed the feat/dyn-dim-freeze-bucket-cache branch from e1d4dfb to 18ddc81 Compare May 27, 2026 18:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce dynamic-shape compile cost and select_module dispatch overhead#4911

Reduce dynamic-shape compile cost and select_module dispatch overhead#4911
chun-wan wants to merge 4 commits into
ROCm:developfrom
chun-wan:feat/dyn-dim-freeze-bucket-cache

chun-wan commented May 26, 2026

Uh oh!

codecov Bot commented May 26, 2026 •

edited

Loading

Uh oh!

pfultz2 commented May 26, 2026

Uh oh!

pfultz2 May 26, 2026

Uh oh!

pfultz2 May 26, 2026

Uh oh!

pfultz2 May 26, 2026

Uh oh!

CharlieL7 May 27, 2026

Uh oh!

pfultz2 May 26, 2026

Uh oh!

CharlieL7 left a comment

Uh oh!

CharlieL7 May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

chun-wan commented May 26, 2026

Motivation

Technical Details

Changelog Category

Uh oh!

codecov Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

pfultz2 commented May 26, 2026

Uh oh!

pfultz2 May 26, 2026

Choose a reason for hiding this comment

Uh oh!

pfultz2 May 26, 2026

Choose a reason for hiding this comment

Uh oh!

pfultz2 May 26, 2026

Choose a reason for hiding this comment

Uh oh!

CharlieL7 May 27, 2026

Choose a reason for hiding this comment

Uh oh!

pfultz2 May 26, 2026

Choose a reason for hiding this comment

Uh oh!

CharlieL7 left a comment

Choose a reason for hiding this comment

Uh oh!

CharlieL7 May 27, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov Bot commented May 26, 2026 •

edited

Loading