Convert onnx_perf_test to standalone WinML mode #1
Replace WindowsAppSDK framework dependencies with the flat-C WinMLEpCatalog API from the Microsoft.Windows.AI.MachineLearning NuGet package.
Changes:
- Add winml_standalone.cc/h using WinMLEpCatalog API for EP discovery
- Remove winappsdk_bootstrap.cc/h (WinAppSDK/CppWinRT bootstrap)
- Replace 6 WinAppSDK NuGet packages with single Microsoft.Windows.AI.MachineLearning 2.0.297-preview
- Remove CppWinRT, WIL, Foundation, onecoreuap.lib dependencies
- Use BUILD_WINML_STANDALONE_PERF_TEST compile definition
- Add post-build copy of onnxruntime.dll from NuGet package
- Remove --winappsdk_version flag (no longer needed)
- Move WinML init after Ort::Env creation with scope-guard cleanup
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The regular onnxruntime_perf_test target globs all files in perftest/windows/ but should not compile winml_standalone.cc/h since those depend on the WinML NuGet package which is only linked by the standalone target. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Rename cmake file: winappsdk_onnxruntime_perf_test.cmake -> winml_standalone_perf_test.cmake
- Rename cmake option: onnxruntime_BUILD_WINAPPSDK_PERF_TEST -> onnxruntime_BUILD_WINML_STANDALONE_PERF_TEST
- Rename target/EXE: winappsdk_onnxruntime_perf_test -> winml_standalone_perf_test
- Rename flag: --winappsdk_register_provider -> --winml_register_provider
- Rename struct member: winappsdk_register_provider -> winml_register_provider
- Add null check for OrtGetApiBase() return before dereferencing
- Update all log messages from [WinAppSDK] to [WinML Standalone]
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- p-x64.cmd: BUILD_WINAPPSDK_PERF_TEST -> BUILD_WINML_STANDALONE_PERF_TEST, removed CPPWINRT_VERSION
- p-arm.cmd: same option rename, removed CPPWINRT_VERSION
- b.cmd: target winappsdk_onnxruntime_perf_test -> winml_standalone_perf_test
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The winappsdk_bootstrap.cc/h files were deleted, so the glob filters excluding them are no-ops. Remove them to reduce clutter. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
chrisdMSFT
left a comment
PR #1 Review — Convert onnx_perf_test to standalone WinML mode
Repo: chrisdMSFT/onnxruntime
PR: #1
Base: user/chrisd/rel-1.25.1-qol-ifdef ← Head: winml-standalone
Stats: 15 files, +401 / −516
Overall the change is well-scoped: it cleanly swaps the WinAppSDK/WinRT
bootstrap for the flat-C WinMLEpCatalog API, keeps the regular
onnxruntime_perf_test target unaffected (via glob exclusion), and
preserves ORT_API_MANUAL_INIT consistency. The renames
(winappsdk_* → winml_standalone_*, flag, struct member, target,
cmake file) are applied uniformly.
The issues below are ordered by severity. None are necessarily blockers
for a personal/dev branch, but several should be addressed before this
ships to a wider audience.
High severity
H1. ORT API-version fallback risks UB if any v25-only API is called
File: onnxruntime/test/perftest/main.cc (around the new
BUILD_WINML_STANDALONE_PERF_TEST block)
g_ort = api_base->GetApi(ORT_API_VERSION); // tries v25
if (g_ort == nullptr) {
for (uint32_t v = ORT_API_VERSION - 1; v >= 1; --v) {
g_ort = api_base->GetApi(v);
if (g_ort != nullptr) { ... break; }
}
}
GetApi(v) from a v24 runtime returns an OrtApi* whose memory layout
is the v24 struct. The compile-time headers (ORT_API_VERSION == 25,
see include/onnxruntime/core/session/onnxruntime_c_api.h:41) describe
the v25 struct, which has new function pointers appended after the
v24 layout. Any call into a v25-only member (g_ort->SomeNewApi(...))
will dereference past the actual struct allocated by the older DLL —
undefined behavior, typically a crash or wild jump.
The PR works today only because the perf-test code paths happen to call
v1–v24 functions. A future contributor adding a v25 API call will
silently introduce a crash on the v24-shipping NuGet, with no compile-
time signal.
Recommended mitigations (any one):
- Refuse to fall back: hard-error out if GetApi(ORT_API_VERSION) returns nullptr, and instead bump the bundled NuGet (or pin ORT_API_VERSION to v24 for this target via a private header).
- Build the standalone target against a v24-only header copy, so the compiler enforces the contract.
- At minimum, log loudly (e.g. red banner) and record the resolved version so a g_ort_runtime_version can be checked before any v25-specific call site.
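The first mitigation (refuse to fall back) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: FakeApiBase is a hypothetical stand-in for OrtApiBase that models only the version lookup, and RequireApi is an invented helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for OrtApiBase: only the GetApi lookup is modeled.
struct FakeApiBase {
  uint32_t max_supported;  // highest API version this pretend runtime implements

  const void* GetApi(uint32_t version) const {
    static const int api_table = 0;  // placeholder for the real function table
    return (version >= 1 && version <= max_supported) ? &api_table : nullptr;
  }
};

// Returns the API table for exactly `compiled_version`, or throws.
// There is deliberately no loop over older versions.
inline const void* RequireApi(const FakeApiBase& base, uint32_t compiled_version) {
  const void* api = base.GetApi(compiled_version);
  if (api == nullptr) {
    throw std::runtime_error("runtime does not support ORT API version " +
                             std::to_string(compiled_version));
  }
  return api;
}
```

With this shape, a runtime that only implements v24 fails at startup with a clear message when the headers say v25, instead of handing back a shorter struct.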
H2. UTF-8 → UTF-16 path conversion is incorrect
File: onnxruntime/test/perftest/windows/winml_standalone.cc:158
std::string libPath(pathSize, '\0');
...
HRESULT pathHr = WinMLEpGetLibraryPath(ep, pathSize, libPath.data(), &used);
...
std::wstring wpath(libPath.begin(), libPath.end()); // <-- broken for non-ASCII
ctx->env->RegisterExecutionProviderLibrary(providerName.c_str(), wpath);
The two-iterator std::wstring constructor widens each char individually
into a wchar_t, with no UTF-8 decoding. This only round-trips ASCII. If the EP package family
or install path contains any non-ASCII byte (e.g. a localized
%LOCALAPPDATA% profile name, accented user folder, CJK), every
multi-byte UTF-8 sequence is split into multiple bogus UTF-16 code
units and LoadLibraryW fails (or, worse, loads the wrong file).
Use MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, ...):
int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
libPath.data(), (int)libPath.size(), nullptr, 0);
std::wstring wpath(n, L'\0');
MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
libPath.data(), (int)libPath.size(),
wpath.data(), n);
(or query the path directly as wide if the WinML API offers a wide
variant; worth checking WinMLEpCatalog.h.)
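A small portable demonstration of why the iterator-pair constructor is lossy may help here. It uses only the standard library; NaiveWiden and the two constants are names invented for this sketch, and it shows the failure mode only, not the MultiByteToWideChar fix.

```cpp
#include <cassert>
#include <string>

// Widens a byte string the way the iterator-pair std::wstring constructor does:
// one output element per input byte, with no UTF-8 decoding.
inline std::wstring NaiveWiden(const std::string& bytes) {
  return std::wstring(bytes.begin(), bytes.end());
}

// U+00E9 (e with acute accent) is two bytes in UTF-8 ...
const std::string kUtf8EAcute = "\xC3\xA9";
// ... but a single code unit in UTF-16.
const std::u16string kUtf16EAcute = u"\u00E9";
```

NaiveWiden(kUtf8EAcute) yields two elements where correct UTF-16 has one, so any path containing that character no longer names the real file. Pure ASCII input survives unchanged, which is why the bug is easy to miss in testing.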
H3. WinMLEpGetLibraryPath length handling is contract-fragile
File: onnxruntime/test/perftest/windows/winml_standalone.cc:137-153
size_t pathSize = 0;
HRESULT pathSizeHr = WinMLEpGetLibraryPathSize(ep, &pathSize);
...
std::string libPath(pathSize, '\0');
size_t used = 0;
HRESULT pathHr = WinMLEpGetLibraryPath(ep, pathSize, libPath.data(), &used);
...
libPath.resize(used > 0 ? used - 1 : 0); // trim null terminator
Two undocumented assumptions:
- used always includes the null terminator. If a future runtime returns the strlen instead, used - 1 chops the last character of the path.
- pathSize already includes the null. If it doesn't, the buffer is one byte too small.
Both are pure conjecture without the WinMLEpCatalog.h contract in
hand. Please:
- Confirm the contract from the header/docs and add a comment quoting it.
- Add a defensive if (used > pathSize) { error } and prefer strnlen(libPath.data(), pathSize) over the used - 1 trim.
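The defensive trim suggested above could look something like this. It is a sketch under assumptions: TrimToWrittenPath is a hypothetical helper, and the buffer is assumed to have been zero-filled before the API wrote into it, as in the snippet under review.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string.h>  // strnlen

// Shrinks `buf` to the text actually written into it. `used` is the count the
// API reported; it is bounds-checked but not trusted for the final length.
inline void TrimToWrittenPath(std::string& buf, size_t used) {
  if (used > buf.size()) {
    throw std::runtime_error("API reported more bytes than the buffer holds");
  }
  // Find the terminator ourselves, never reading past the buffer.
  buf.resize(strnlen(buf.data(), buf.size()));
}
```

Because the length comes from the terminator rather than from used - 1, the result is the same whether the API counts the NUL in `used` or not.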
Medium severity
M1. WinMLEpCatalogEnumProviders return value is ignored
File: winml_standalone.cc:87
WinMLEpCatalogEnumProviders(catalog, [](...) -> BOOL { ... }, &ctx);
If enumeration itself fails (HRESULT or BOOL return), the function
silently completes with zero registered providers — and then session
creation later fails with a misleading "no EP available" error. Capture
and surface the result.
M2. Per-provider failures are silently ignored
The lambda returns TRUE for every error path (NotPresent,
EnsureReady failed, GetLibraryPathSize failed, RegisterExecutionProvider
threw). That is reasonable for "skip this EP and try the next", but the
aggregate outcome is never reported. If the user typed
--winml_register_provider=VitisAIExecutionProvider and that one EP
failed to register, the perf test still runs (against CPU only) and the
real failure is buried in stderr.
Suggest: track registered_count, and if provider_filter is non-empty
and any requested name was not successfully registered, throw / exit
non-zero from WinML_FindAndRegisterAllProviders.
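One way to compute the aggregate outcome suggested above is to diff the requested names against the registered ones after enumeration. MissingProviders is a hypothetical helper written for illustration; the real filter and registered list may be stored differently.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns every requested provider name that is absent from `registered`.
// An empty `requested` list means "no filter", so nothing can be missing.
inline std::vector<std::string> MissingProviders(
    const std::vector<std::string>& requested,
    const std::vector<std::string>& registered) {
  std::vector<std::string> missing;
  for (const auto& name : requested) {
    if (std::find(registered.begin(), registered.end(), name) == registered.end()) {
      missing.push_back(name);
    }
  }
  return missing;
}
```

The caller can then throw or exit non-zero when the result is non-empty, listing the names so the failure is not buried in stderr.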
M3. File-static globals make re-init unsafe / leak the catalog
File: winml_standalone.cc:21-22
static WinMLEpCatalogHandle g_ep_catalog = nullptr;
static std::vector<std::string> g_registered_providers;
Calling WinML_InitializeAndRegisterAllProviders twice without a
WinML_Uninitialize in between leaks the previous catalog handle and
appends to g_registered_providers. Either:
- Document "must be called exactly once" and assert(!g_ep_catalog) at entry, or
- Stash the handle/list inside a small struct returned to the caller (or hung off Ort::Env via user-data) so re-entry is well-defined.
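The second option (a small owner object instead of file-static state) might be sketched like this. The FakeCatalog* functions and CatalogOwner are hypothetical stand-ins; they are not the WinMLEpCatalog API, only enough to show the ownership rules.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-ins for a catalog create/release pair.
using FakeCatalogHandle = int*;
static int g_live_handles = 0;  // number of handles currently open
static FakeCatalogHandle FakeCatalogCreate() { ++g_live_handles; return new int(0); }
static void FakeCatalogRelease(FakeCatalogHandle h) { --g_live_handles; delete h; }

// Move-only owner: every handle is released exactly once, even on reassignment.
class CatalogOwner {
 public:
  CatalogOwner() : handle_(FakeCatalogCreate()) {}
  ~CatalogOwner() { Reset(); }
  CatalogOwner(const CatalogOwner&) = delete;
  CatalogOwner& operator=(const CatalogOwner&) = delete;
  CatalogOwner(CatalogOwner&& other) noexcept
      : handle_(std::exchange(other.handle_, nullptr)) {}
  CatalogOwner& operator=(CatalogOwner&& other) noexcept {
    if (this != &other) {
      Reset();  // release the handle we already hold before taking the new one
      handle_ = std::exchange(other.handle_, nullptr);
    }
    return *this;
  }

 private:
  void Reset() noexcept {
    if (handle_ != nullptr) {
      FakeCatalogRelease(handle_);
      handle_ = nullptr;
    }
  }
  FakeCatalogHandle handle_;
};
```

With state held per object, initializing twice simply creates two independent owners, so the "leaks the previous handle" case cannot arise.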
M4. Loop invariant on uint32_t v underflows pathologically
File: main.cc
for (uint32_t v = ORT_API_VERSION - 1; v >= 1; --v) { ... }
If ORT_API_VERSION is ever defined as 0 (currently it's 25, but
hypothetically), v initializes to 0xFFFFFFFF and the loop runs ~4B
times. Trivial fix:
for (uint32_t v = ORT_API_VERSION; v > 1; ) {
--v;
...
}
…or use a signed counter. Not exploitable today; flagged for
robustness.
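To make the boundary behavior of the corrected loop concrete, the sketch below records which versions it would probe. OlderVersionsToTry is a hypothetical helper, not something in the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Lists the versions a "try older versions" loop would probe, newest first.
// Decrementing inside the body keeps the unsigned counter from wrapping.
inline std::vector<uint32_t> OlderVersionsToTry(uint32_t compiled_version) {
  std::vector<uint32_t> versions;
  for (uint32_t v = compiled_version; v > 1;) {
    --v;
    versions.push_back(v);
  }
  return versions;
}
```

For a compile-time version of 25 this probes 24 down to 1; for 0 or 1 it probes nothing, so the pathological wrap to 0xFFFFFFFF cannot occur.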
M5. OrtGetApiBase redefinition is a fragile linker contract
File: winml_standalone.cc:24-52
The standalone target re-implements extern "C" const OrtApiBase* __cdecl OrtGetApiBase() to dynamically LoadLibraryExW("onnxruntime.dll")
from the EXE directory. This relies on no other static lib in the link
line (onnx_test_runner_common, onnxruntime_test_utils,
onnxruntime_common, onnxruntime_flatbuffers,
onnx_test_data_proto) ever pulling in a definition of
OrtGetApiBase. Today it links; if anyone later adds a static lib that
brings the symbol along, you'll get a duplicate-symbol error with no
clear hint pointing here. Add a prominent comment and ideally an
/INCLUDE: link assertion (or a unit-test build step) to catch this.
Also: the magic-static lambda is fine for thread safety, but it does
not log the resolved DLL load path on success — only on failure. A
single line std::wcout << L"[WinML Standalone] Loaded: " << ortPath
on success (which is already there!) is great; consider also logging
the resolved OrtApiBase* and the GetVersionString() to make support
diagnosis trivial.
Low severity / nits
L1. Missing using namespace std::filesystem or fs:: alias
Stylistic only — std::filesystem::path is used twice and reads fine.
No change needed.
L2. ready_state_to_string is unused
File: winml_standalone.cc:54-63
The helper function is defined but never called. It's also not declared
static / in an unnamed namespace, so it pollutes the global symbol
table.
L3. Mixed I/O streams (std::cout / std::wcout / printf)
The new code mixes narrow std::cout, wide std::wcout, and the
existing fprintf(stdout, ...) calls. Without a std::ios::sync_with_stdio
guarantee, output ordering can be surprising under buffering. Pick one
flavor for the new file (probably narrow std::cout everywhere except
where wchar paths are needed) and stick to it. Existing precedent in
the codebase is fprintf, but this is a minor preference.
L4. EnumContext ctx shadows the lambda's local auto* ctx
File: winml_standalone.cc:85-88
EnumContext ctx{ &provider_filter, &env, &g_registered_providers };
WinMLEpCatalogEnumProviders(catalog, [](..., void* context) -> BOOL {
auto* ctx = static_cast<EnumContext*>(context); // shadows outer ctx
...
}, &ctx);
The lambda captures nothing, so the inner ctx doesn't actually hide
anything reachable, but the name collision is mildly confusing. Rename
the inner local to c or ec.
L5. WinML_InitializeAndRegisterAllProviders is a one-line
forwarder to WinML_FindAndRegisterAllProviders. Either inline the
Find... body or drop the wrapper — having two near-identical names is
clutter.
L6. gsl::finally placement comment
File: main.cc
auto winml_cleanup_at_scope_exit = gsl::finally([&]() {
WinML_Uninitialize(env);
});
Good pattern, and the "must happen before env destruction" comment is
correct. Worth adding "and before g_ort is reset to nullptr (if you
ever add that)" because UnregisterExecutionProviderLibrary will dive
through the C API.
L7. winappsdk_version removal — unused-flag deprecation note
The flag is gone entirely (good, since the WinAppSDK runtime is gone).
If any internal CI / dev scripts still pass --winappsdk_version=1.8,
absl will refuse with an unknown-flag error rather than ignoring it.
The PR does update chrisd/p-x64.cmd and chrisd/p-arm.cmd, but a
quick grep for winappsdk_version in any other internal pipeline is
worth doing before merge.
L8. onnxruntime.dll resolution is EXE-dir only
File: winml_standalone.cc:28-39
The shim hard-binds to <exe_dir>\onnxruntime.dll and ignores
PATH/SetDllDirectory. That matches the post-build copy in
cmake/winml_standalone_perf_test.cmake:110-112, so it's correct for
the in-tree build, but it means a developer can't drop a sideloaded
onnxruntime.dll into PATH and override. If that's intentional (which
seems likely, to lock the version against the WinML NuGet's runtime),
add a one-line comment saying so.
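For readers unfamiliar with the EXE-dir-only approach, the path computation can be sketched with std::filesystem. RuntimeDllNextTo is a hypothetical helper; how the shim actually obtains the executable path is not shown in this review, so the parameter here is an assumption.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Builds the path of onnxruntime.dll in the same directory as the executable.
// `exe_path` is passed in; a real shim would have to query it from the OS.
inline std::filesystem::path RuntimeDllNextTo(const std::filesystem::path& exe_path) {
  return exe_path.parent_path() / "onnxruntime.dll";
}
```

Handing the loader a fully qualified path like this, rather than a bare file name, is what keeps PATH from influencing which onnxruntime.dll gets loaded.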
L9. BUILD_WINML_STANDALONE_PERF_TEST #error guard
File: winml_standalone.cc:17-19
#ifndef BUILD_WINML_STANDALONE_PERF_TEST
#error "This file should only be compiled when BUILD_WINML_STANDALONE_PERF_TEST is ON"
#endif
Great defensive guard: paired with the glob filter in
cmake/onnxruntime_unittests.cmake it ensures the regular target
fails loudly rather than silently linking in this file. ✅
L10. CMake: ${WINML_BINARY_DIR} is undocumented
File: cmake/winml_standalone_perf_test.cmake:111-115
"${WINML_BINARY_DIR}/onnxruntime.dll"
"${WINML_BINARY_DIR}/DirectML.dll"
WINML_BINARY_DIR is presumably exported by the
microsoft.windows.ai.machinelearning CMake config. If that variable
isn't set (e.g. NuGet package version drift), the copy_if_different
silently skips and you get a runtime "DLL not found" instead of a
config-time error. Add:
if(NOT WINML_BINARY_DIR)
message(FATAL_ERROR "WINML_BINARY_DIR not set by Microsoft.Windows.AI.MachineLearning package")
endif()
L11. CMake: FetchContent from a personal-fork repo
File: cmake/winml_standalone_perf_test.cmake:27-31
GIT_REPOSITORY https://github.com/mschofie/NuGetCMakePackage
GIT_TAG dc9e92672c6eb1c11f0d29d4f94731b3404cc096
Pinning to a SHA is good. Pulling from mschofie/* (an individual
account) is a long-term liability — any account rename / repo deletion
breaks this build forever. Consider mirroring into a Microsoft-owned
repo before this lands on a shared branch.
L12. CMake glob for winml_standalone_perf_test_src
The glob picks up everything in perftest/ and perftest/windows/,
including any new file added by other PRs. That's the same pattern used
by onnxruntime_perf_test, so consistency is fine, but it means the
"don't compile winml_standalone.cc into the regular target" filter in
onnxruntime_unittests.cmake is the only line keeping the two
targets disjoint. If someone adds e.g. winml_standalone_helpers.cc,
they must remember to extend the regex. Consider switching to an
EXCLUDE REGEX ".*/winml_standalone.*\\.(cc|h)$" (prefix match) so
new files in this family are auto-excluded.
Things that look right
- Renaming is consistent across cmake, source, batch scripts, and defines; no stragglers found in this branch.
- ORT_API_MANUAL_INIT is propagated to onnx_test_runner_common and onnxruntime_test_utils consistently with the regular target, avoiding the #pragma detect_mismatch link error.
- gsl::finally ensures WinML_Uninitialize runs before Ort::Env destruction, even on an exception path through real_main.
- WinML_Uninitialize swallows exceptions in the unregister loop, which is appropriate for best-effort cleanup.
- Removal of winrt::hresult_error catch in main.cc is correct given CppWinRT is no longer linked. C++ std::exception catch remains.
- Glob exclusion of winml_standalone.{cc,h} from onnxruntime_perf_test is correct and prevents an accidental WinML NuGet dependency on the regular target.
- Manifest file (onnxruntime/test/perftest/windows/app.manifest) exists and is referenced by the cmake target.
Suggested follow-ups (not blocking)
- Add a smoke-test build job (even just a CI matrix entry) for -Donnxruntime_BUILD_WINML_STANDALONE_PERF_TEST=ON. Without one, silent breakage is very easy.
- Move the static g_ep_catalog / g_registered_providers state into a class so WinML_* becomes RAII and the order-of-destruction guarantee in main.cc is enforced by the type system.
- Document the EXE deployment story (which DLLs go next to the EXE, from where) in a short README under onnxruntime/test/perftest/.
Generated by Copilot CLI code review on 2026-05-06.
High severity:
- H1 (main.cc): Remove ORT API version fallback. Hard-fail when the bundled onnxruntime.dll is older than the compile-time ORT_API_VERSION; falling back to an older v_N-shaped struct risks UB if any v25-only API is called. Newer is fine; older is not.
- H2 (winml_standalone.cc): Replace the iterator-pair std::wstring ctor with MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, ...). Verified against WinMLEpCatalogApi.cpp that library paths come back as UTF-8.
- H3 (winml_standalone.cc): Defensively bound `used` against `pathSize` and use strnlen() to determine the string length, instead of trusting `used - 1`. Documents the verified contract from the WinML source (size includes the NUL).
Medium severity:
- M1: Capture the HRESULT returned by WinMLEpCatalogEnumProviders and throw on failure so silent enumeration errors are surfaced.
- M2: Track which filter-requested providers were registered. If any requested name failed to register, throw with the missing names so the perf test does not silently fall back to CPU-only.
- M3: assert(!g_ep_catalog) on entry to WinML_InitializeAndRegisterAllProviders + comment that it must be called exactly once before WinML_Uninitialize.
- M4: Removed naturally with H1 (the underflowing fallback loop is gone).
- M5: Add a prominent link-order warning above the OrtGetApiBase redefinition.
Low severity / nits:
- L2: Remove unused ready_state_to_string helper.
- L4: Rename the inner lambda local from `ctx` to `ec` so it does not visually shadow the outer EnumContext ctx.
- L5: Drop the one-line WinML_FindAndRegisterAllProviders forwarder; keep only WinML_InitializeAndRegisterAllProviders.
- L6: Extend the gsl::finally cleanup comment in main.cc to note the ordering with respect to g_ort.
- L8: Add a comment explaining the EXE-dir-only DLL resolution is intentional (locks the runtime version against the WinML NuGet).
- L10 (cmake): Fail at configure time with FATAL_ERROR if WINML_BINARY_DIR was not set by the microsoft.windows.ai.machinelearning package, rather than silently emitting a no-op post-build copy_if_different.
- L12 (cmake): Switch the WinML standalone glob exclusion to a prefix match so any future winml_standalone_*.cc/h is automatically excluded from the regular onnxruntime_perf_test target.
L7 follow-up (stale name references):
- Rename winappsdk_onnxruntime_perf_test.md -> winml_standalone_perf_test.md and rewrite to drop WindowsAppSDK / CppWinRT details, document the new --winml_register_provider flag, and describe the ORT API version contract.
- Update chrisd/copy-perf-test.cmd, go.cmd, go-all.cmd, go-nvidia-tests.cmd, go-openvino.cmd, go-qnn.cmd, and simple-intel-test-ape.cmd to reference winml_standalone_perf_test and drop --winappsdk_version flags.
Out of scope this round: L3 (mixed I/O streams), L11 (NuGetCMakePackage mirroring), CI matrix entry, RAII refactor of the catalog handle.
Verified: cmake configure + RelWithDebInfo build of both winml_standalone_perf_test and onnxruntime_perf_test succeed. The regular target's compile list confirms winml_standalone.cc is excluded.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implement the three "Suggested follow-ups (not blocking)" from the PR #1 review.
#1 (CI smoke build) -- .github/workflows/windows_winml_standalone.yml
Configure + build the winml_standalone_perf_test target (and the regular onnxruntime_perf_test target as a sanity check) on windows-latest, RelWithDebInfo, x64. Triggered on push / pull_request on main, rel-*, winml-standalone*, rel-*-winml-standalone, and any user/** branch, scoped to the perftest source tree, the cmake files that drive the target, and the workflow file itself. No test-run step (would require NPU/GPU EP devices not present on hosted runners). Uses a GitHub-hosted runner instead of the 1ES self-hosted pools that the rest of the windows_*.yml workflows use, because those pools are gated to Microsoft's CI infrastructure and would not run on a personal fork. ilammy/msvc-dev-cmd is used for vcvars setup since the in-repo locate-vcvarsall-and-setup-env composite action targets the self-hosted images.
#2 (RAII for catalog state) -- supersedes the round-1 M3 minimal-assert decision.
Replace the file-static g_ep_catalog and g_registered_providers globals plus the WinML_* free functions with a move-only WinMLStandaloneRegistration class in winml_standalone.{h,cc}. Construction opens the catalog, enumerates providers, registers each one (subject to the --winml_register_provider filter), and throws on WinMLEpCatalogCreate failure, WinMLEpCatalogEnumProviders failure, or any requested provider failing to register. Destruction unregisters in reverse order and releases the catalog.
Partial-construction safety: the catalog handle is taken into the member as soon as WinMLEpCatalogCreate succeeds, then the rest of the constructor runs inside a try/catch that calls Cleanup() before rethrowing. This keeps the destructor and the constructor's failure path sharing a single noexcept Cleanup() implementation.
Header keeps the catalog handle as a void* so winml_standalone.h does not pull <WinMLEpCatalog.h> (and therefore the WinML NuGet headers) into main.cc. main.cc now uses straight RAII -- the WinMLStandaloneRegistration object is declared right after the Ort::Env, replacing both the WinML_InitializeAndRegisterAllProviders call and the gsl::finally([&]{ WinML_Uninitialize(env); }) block. The <gsl/util> include is hoisted out of the BUILD_WINML_STANDALONE_PERF_TEST guard because main.cc still uses gsl::finally for plugin EP unregister.
#3 (deployment README) -- onnxruntime/test/perftest/WINML_STANDALONE.md
Short, focused doc that answers exactly the reviewer's question: which DLLs land next to the EXE and where does each one come from. Includes a table mapping each file to its source NuGet/build artifact and the cmake mechanism that copies it, an explanation of why onnxruntime.dll resolution is intentionally EXE-dir-only, the minimum redeployment payload, and the ORT API version contract. Cross-links to the comprehensive winml_standalone_perf_test.md at the repo root for build/run details.
Verified: configure was already done previously; cmake --build build --config RelWithDebInfo --target winml_standalone_perf_test and --target onnxruntime_perf_test both succeed. The standalone target's compile list contains winml_standalone.cc; the regular target's does not (the prefix-match glob exclusion in cmake/onnxruntime_unittests.cmake still keeps them disjoint).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
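The partial-construction and reverse-order behavior described in this commit message can be illustrated with a toy class. RegistrationSketch, its fail_on parameter, and g_unregistered are hypothetical; the actual WinMLStandaloneRegistration is not reproduced here.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Records unregistration order so the behavior can be observed.
static std::vector<std::string> g_unregistered;

class RegistrationSketch {
 public:
  // `fail_on` names a provider whose registration throws (illustrative knob).
  RegistrationSketch(const std::vector<std::string>& providers,
                     const std::string& fail_on) {
    try {
      for (const auto& name : providers) {
        if (name == fail_on) throw std::runtime_error("failed to register " + name);
        registered_.push_back(name);
      }
    } catch (...) {
      Cleanup();  // the destructor does not run for a half-built object
      throw;
    }
  }
  ~RegistrationSketch() { Cleanup(); }
  RegistrationSketch(const RegistrationSketch&) = delete;
  RegistrationSketch& operator=(const RegistrationSketch&) = delete;

 private:
  // Shared by the destructor and the constructor's failure path.
  void Cleanup() noexcept {
    for (auto it = registered_.rbegin(); it != registered_.rend(); ++it) {
      try {
        g_unregistered.push_back(*it);  // stands in for the real unregister call
      } catch (...) {
        // best-effort cleanup: keep going with the remaining providers
      }
    }
    registered_.clear();
  }
  std::vector<std::string> registered_;
};
```

If the third of three providers fails, the first two are still unregistered in reverse order before the exception reaches the caller, which is the property the try/catch in the constructor exists to guarantee.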
Port the 16 review-fix items from #1 commit c28d084 to this branch. The pre-fix files are byte-identical between this branch (rel-1.24.6-winml-standalone, head 17dae90) and PR #1's pre-fix head (e5ee69c), so the post-fix files are taken straight from c28d084 via `git checkout c28d084 -- <path>`. Only the L12 prefix-match regex change in cmake/onnxruntime_unittests.cmake is applied as a surgical hunk because that file diverges by ~370 lines between the two branch bases.
Maps to review items from the PR #1 review:
H1 (main.cc): Remove the "fall back to older ORT API version" loop and hard-fail when the runtime DLL does not support ORT_API_VERSION. Falling back would return a struct laid out for an older version, so any newer-API call would dereference past the actual struct. The runtime must support the compile-time version or newer.
H2 (winml_standalone.cc): Replace the iterator-pair std::wstring constructor with MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, ...) for the EP library path conversion. The previous code zero-extended each char to wchar_t, which silently corrupted multi-byte UTF-8 sequences from localized user folders.
H3 (winml_standalone.cc): Defensively bound `used` against `pathSize` and use strnlen() to compute the final string length, with a comment quoting the verified WinMLEpCatalogApi.cpp contract (pathSize includes the NUL; used = pathSize on success).
M1 (winml_standalone.cc): Capture the HRESULT from WinMLEpCatalogEnumProviders and throw on failure, instead of silently completing with zero registered providers.
M2 (winml_standalone.cc): When --winml_register_provider is non-empty and any requested provider failed to register, throw before the caller silently falls back to CPU.
M3 (winml_standalone.cc): Document the "must be called once" lifecycle of WinML_InitializeAndRegisterAllProviders and add an assert(!g_ep_catalog) on entry. (A follow-up commit will refactor the file-static globals into an RAII class.)
M4 (main.cc): Removed naturally with H1 (the `for (uint32_t v ...)` underflow loop is gone).
M5 (winml_standalone.cc): Add a prominent link-order warning banner above the OrtGetApiBase redefinition explaining the no-other-static-lib-defines-OrtGetApiBase contract.
L2 (winml_standalone.cc): Remove the unused ready_state_to_string helper.
L4 (winml_standalone.cc): Rename the lambda's inner `auto* ctx` to `auto* ec` so it does not shadow the outer EnumContext local.
L5 (winml_standalone.cc): Drop the one-line WinML_FindAndRegisterAllProviders forwarder; merge its body into WinML_InitializeAndRegisterAllProviders.
L6 (main.cc): Extend the gsl::finally cleanup comment to spell out that WinML_Uninitialize must run before g_ort is reset (if that is ever added) because UnregisterExecutionProviderLibrary dives through the C API.
L7 (chrisd scripts + winappsdk_onnxruntime_perf_test.md): Rename stale `winappsdk_onnxruntime_perf_test` references to `winml_standalone_perf_test` in copy-perf-test.cmd, go.cmd, go-all.cmd, go-nvidia-tests.cmd, go-openvino.cmd, go-qnn.cmd, and simple-intel-test-ape.cmd. Drop the obsolete --winappsdk_version flag and replace --winappsdk_register_provider with --winml_register_provider. Rename winappsdk_onnxruntime_perf_test.md to winml_standalone_perf_test.md and rewrite content to reflect the standalone WinML model (drop the WindowsAppSDK bootstrap, drop Microsoft.Windows.CppWinRT, replace BUILD_WINAPPSDK_PERF_TEST with BUILD_WINML_STANDALONE_PERF_TEST, document the API version contract).
L8 (winml_standalone.cc): Add a comment above the OrtGetApiBase LoadLibraryExW call explaining that EXE-dir-only resolution is intentional (the bundled WinML NuGet package is the contract owner of the runtime version).
L10 (cmake/winml_standalone_perf_test.cmake): Add an `if(NOT WINML_BINARY_DIR) message(FATAL_ERROR ...)` guard right after find_package, so a NuGet-package layout drift fails at configure time rather than emitting a silent post-build copy that leaves the EXE missing onnxruntime.dll at runtime.
L12 (cmake/onnxruntime_unittests.cmake): Switch the WinML standalone glob exclusion regex to a prefix match so any future winml_standalone_*.cc/h is automatically excluded from the regular onnxruntime_perf_test target without a manual regex update.
Verified: configure (chrisd\p-x64.cmd) succeeded. cmake --build build --config RelWithDebInfo --target winml_standalone_perf_test and --target onnxruntime_perf_test both succeed. The standalone target's compile list contains winml_standalone.cc; the regular target's does not. ORT_API_VERSION on this branch is 24 (vs 25 on the chrisdMSFT/onnxruntime PR #1 head); all fixes use the macro, so no source adjustment was needed for the version delta.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ft#28503)

### Description

Add an internal session config entry, `"session.compile_only"`, set by `CompileModel()` before session initialization. The NvTensorRTRTX EP reads it in `NvExecutionProviderInfo::FromProviderOptions()` and, when set, skips `deserializeCudaEngine()` / `createExecutionContext()` in `CreateNodeComputeInfoFromGraph()`. The EP context node is still saved: that path uses the serialized engine buffer directly and does not depend on the deserialized engine. A stub compute function is registered to satisfy the framework; it returns `NOT_IMPLEMENTED` if called, which cannot happen in practice because compile-only sessions are destroyed without inference.

### Motivation and Context

`OrtCompileAPI::CompileModel()` creates an `InferenceSession` solely to drive `EP::Compile()` and write out the EPContext model, then destroys it without running inference. During that session, the NvTensorRTRTX EP was performing a full `deserializeCudaEngine()` and `createExecutionContext()`, uploading engine weights to the GPU and JIT-ing the engine, only to free everything when the session was destroyed. When the user then loads the EPContext model in a real session, the same JIT and upload happen again. Net effect on the typical "compile, then load and run" flow:

```
ONNX model
→ CompileModel() [JIT + GPU upload #1, discarded]
→ EP context model saved to disk
→ Session from EP context model [JIT + GPU upload #2, necessary]
→ Inference
```

JIT and GPU upload run twice.
Summary
Converts the winappsdk_onnxruntime_perf_test target to use standalone WinML via the flat-C WinMLEpCatalog API from the
Microsoft.Windows.AI.MachineLearning NuGet package. This removes the dependency on WindowsAppSDK bootstrap/WinRT activation and allows the perf test EXE to run standalone on any Windows machine with the WinML and ONNX Runtime DLLs present alongside the executable.
Key design decisions
ORT API version fallback — The WinML NuGet package (v2.0.297-preview) ships ORT
1.24.4 (API v24) while repo headers define v25. At runtime, we try the compile-time version first, then gracefully fall back to the highest supported version available.
Glob exclusion — winml_standalone.cc/h are excluded from the regular onnxruntime_perf_test target to prevent compilation errors (those files depend on WinML NuGet APIs).
Files
ort_test_session.cc, chrisd/b.cmd, chrisd/p-x64.cmd, chrisd/p-arm.cmd
Build & usage
Configure
cmake -B build -Donnxruntime_BUILD_WINML_STANDALONE_PERF_TEST=ON -Donnxruntime_BUILD_SHARED_LIB=ON -Donnxruntime_USE_DML=OFF -Donnxruntime_USE_WINML=OFF ... -A x64 -G "Visual Studio 17 2022"
Build
cmake --build build --config RelWithDebInfo --target winml_standalone_perf_test
Run
winml_standalone_perf_test.exe --required_device_type npu -e vitisai -m duration -t 30 -I "C:\Users\Sumit\standalone-perf-test\abc.quant.onnx"
Testing
Vitis Test logs
QNN Test logs