Rel 1.24.6 winml standalone #2
Draft
chrisdMSFT wants to merge 2 commits into
chrisdMSFT added a commit that referenced this pull request on May 7, 2026
Implement the three "Suggested follow-ups (not blocking)" from the PR #1 review.

#1 (CI smoke build) -- .github/workflows/windows_winml_standalone.yml

Configure + build the winml_standalone_perf_test target (and the regular onnxruntime_perf_test target as a sanity check) on windows-latest, RelWithDebInfo, x64. Triggered on push / pull_request on main, rel-*, winml-standalone*, rel-*-winml-standalone, and any user/** branch, scoped to the perftest source tree, the cmake files that drive the target, and the workflow file itself. No test-run step (would require NPU/GPU EP devices not present on hosted runners).

Uses a GitHub-hosted runner instead of the 1ES self-hosted pools that the rest of the windows_*.yml workflows use, because those pools are gated to Microsoft's CI infrastructure and would not run on a personal fork. ilammy/msvc-dev-cmd is used for vcvars setup since the in-repo locate-vcvarsall-and-setup-env composite action targets the self-hosted images.

#2 (RAII for catalog state) -- supersedes the round-1 M3 minimal-assert decision.

Replace the file-static g_ep_catalog and g_registered_providers globals plus the WinML_* free functions with a move-only WinMLStandaloneRegistration class in winml_standalone.{h,cc}. Construction opens the catalog, enumerates providers, registers each one (subject to the --winml_register_provider filter), and throws on WinMLEpCatalogCreate failure, WinMLEpCatalogEnumProviders failure, or any requested provider failing to register. Destruction unregisters in reverse order and releases the catalog.

Partial-construction safety: the catalog handle is taken into the member as soon as WinMLEpCatalogCreate succeeds, then the rest of the constructor runs inside a try/catch that calls Cleanup() before rethrowing. This keeps the destructor and the constructor's failure path sharing a single noexcept Cleanup() implementation.

Header keeps the catalog handle as a void* so winml_standalone.h does not pull <WinMLEpCatalog.h> (and therefore the WinML NuGet headers) into main.cc. main.cc now uses straight RAII -- the WinMLStandaloneRegistration object is declared right after the Ort::Env, replacing both the WinML_InitializeAndRegisterAllProviders call and the gsl::finally([&]{ WinML_Uninitialize(env); }) block. The <gsl/util> include is hoisted out of the BUILD_WINML_STANDALONE_PERF_TEST guard because main.cc still uses gsl::finally for plugin EP unregister.

#3 (deployment README) -- onnxruntime/test/perftest/WINML_STANDALONE.md

Short, focused doc that answers exactly the reviewer's question: which DLLs land next to the EXE and where does each one come from. Includes a table mapping each file to its source NuGet/build artifact and the cmake mechanism that copies it, an explanation of why onnxruntime.dll resolution is intentionally EXE-dir-only, the minimum redeployment payload, and the ORT API version contract. Cross-links to the comprehensive winml_standalone_perf_test.md at the repo root for build/run details.

Verified: configure was already done previously; cmake --build build --config RelWithDebInfo --target winml_standalone_perf_test and --target onnxruntime_perf_test both succeed. The standalone target's compile list contains winml_standalone.cc; the regular target's does not (the prefix-match glob exclusion in cmake/onnxruntime_unittests.cmake still keeps them disjoint).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
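
The partial-construction pattern described under #2 can be sketched in portable C++ as follows. This is only an illustration under stated assumptions: FakeCatalog, Registration, and RunFailureDemo are hypothetical names, and the fake catalog merely records calls in a log. It is not the real WinML EP catalog API or the actual WinMLStandaloneRegistration code.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the EP catalog: it only records calls so the
// ordering can be inspected. None of this is the real WinML API.
struct FakeCatalog {
  std::vector<std::string> log;
  std::string fail_on;  // provider whose registration should fail
};

class Registration {
 public:
  Registration(FakeCatalog& catalog, const std::vector<std::string>& providers)
      : catalog_(&catalog) {  // take the handle into the member first
    catalog_->log.push_back("open");
    try {
      for (const auto& name : providers) {
        if (name == catalog_->fail_on) {
          throw std::runtime_error("failed to register " + name);
        }
        catalog_->log.push_back("register " + name);
        registered_.push_back(name);
      }
    } catch (...) {
      Cleanup();  // the destructor never runs for a half-built object
      throw;
    }
  }

  Registration(const Registration&) = delete;
  Registration& operator=(const Registration&) = delete;

  Registration(Registration&& other) noexcept
      : catalog_(std::exchange(other.catalog_, nullptr)),
        registered_(std::move(other.registered_)) {
    other.registered_.clear();
  }

  Registration& operator=(Registration&& other) noexcept {
    if (this != &other) {
      Cleanup();
      catalog_ = std::exchange(other.catalog_, nullptr);
      registered_ = std::move(other.registered_);
      other.registered_.clear();
    }
    return *this;
  }

  ~Registration() { Cleanup(); }

 private:
  // Shared by the destructor and the constructor's failure path.
  void Cleanup() noexcept {
    if (catalog_ == nullptr) return;  // moved-from or already cleaned up
    for (auto it = registered_.rbegin(); it != registered_.rend(); ++it) {
      catalog_->log.push_back("unregister " + *it);  // reverse order
    }
    registered_.clear();
    catalog_->log.push_back("close");
    catalog_ = nullptr;
  }

  FakeCatalog* catalog_;
  std::vector<std::string> registered_;
};

// Failure path: provider "B" fails, so only "A" must be unregistered.
std::vector<std::string> RunFailureDemo() {
  FakeCatalog catalog;
  catalog.fail_on = "B";
  bool threw = false;
  try {
    Registration reg(catalog, {"A", "B", "C"});
  } catch (const std::runtime_error&) {
    threw = true;
  }
  assert(threw);
  return catalog.log;
}
```

In this sketch a failed construction leaves the log as open, register A, unregister A, close, which is the same sequence the destructor would produce for a fully built object holding only "A".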
Adds a new build target, winml_standalone_perf_test, that runs the
existing onnxruntime perf test harness against a standalone WinML
deployment. Instead of linking directly to ORT, the EXE loads
onnxruntime.dll from its own directory via the WinML EP catalog, then
registers the providers exposed by the bundled WinML NuGet package.
Source layout
-------------
- onnxruntime/test/perftest/windows/winml_standalone.{h,cc}: move-only
RAII WinMLStandaloneRegistration class. Construction opens the WinML
EP catalog, enumerates providers, and registers each one (subject to
the --winml_register_provider filter), throwing on
WinMLEpCatalogCreate failure, WinMLEpCatalogEnumProviders failure, or
any requested provider failing to register. Destruction unregisters
in reverse order and releases the catalog. The header keeps the
catalog handle as a void* so it does not pull WinMLEpCatalog.h into
main.cc.
- onnxruntime/test/perftest/main.cc: integrate the registration object
via straight RAII (declared right after Ort::Env). Hard-fail if the
runtime DLL does not support ORT_API_VERSION rather than silently
falling back to an older API version (a struct laid out for an older
version would be dereferenced past its actual size on any newer-API
call).
- onnxruntime/test/perftest/command_args_parser.cc,
test_configuration.h, ort_test_session.cc, performance_runner.cc,
windows/app.manifest: command-line plumbing for
--winml_register_provider plus the manifest required for the
standalone EXE.
- cmake/winml_standalone_perf_test.cmake: defines the new target and
the post-build copy of the WinML NuGet runtime payload next to the
EXE. Emits FATAL_ERROR if WINML_BINARY_DIR is empty so a NuGet
layout drift fails at configure time rather than producing a broken
EXE that is missing onnxruntime.dll at runtime.
- cmake/CMakeLists.txt: hook in the new optional target via
BUILD_WINML_STANDALONE_PERF_TEST.
- cmake/onnxruntime_unittests.cmake: prefix-match glob exclusion so
any future winml_standalone_*.cc/h is automatically excluded from
the regular onnxruntime_perf_test target without a manual update.
- .github/workflows/windows_winml_standalone.yml: hosted-runner CI
smoke build of both the standalone target and the regular
onnxruntime_perf_test target. No test-run step (would require
NPU/GPU EP devices not available on hosted runners).
- onnxruntime/test/perftest/WINML_STANDALONE.md and
winml_standalone_perf_test.md: deployment and build/run docs
covering the EXE-dir-only onnxruntime.dll resolution rationale, the
ORT_API_VERSION contract, the minimum redeployment payload, and
configure/build instructions.
- .gitignore: ignore _deps/ but keep the new app.manifest tracked.
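
The hard-fail behavior described for main.cc can be illustrated with a small stand-alone sketch. Everything here is hypothetical: FakeApi, GetApiFromRuntime, RequireApi, and kCompiledApiVersion are stand-ins invented for the example (the value 24 is illustrative only), and the lookup merely assumes that a runtime returns a null pointer when asked for an API version newer than it supports.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical API table; the real one is a struct of function pointers.
struct FakeApi {
  std::uint32_t version;
};

// Hypothetical lookup: returns nullptr when the runtime is older than the
// version the caller was compiled against.
const FakeApi* GetApiFromRuntime(std::uint32_t runtime_max_version,
                                 std::uint32_t requested_version) {
  static FakeApi api{};
  if (requested_version > runtime_max_version) return nullptr;
  api.version = requested_version;
  return &api;
}

// Illustrative value standing in for the compiled-in API version.
constexpr std::uint32_t kCompiledApiVersion = 24;

// Hard-fail instead of retrying with a lower version: a table laid out for
// an older version is shorter, so calling a newer entry would read past it.
const FakeApi& RequireApi(std::uint32_t runtime_max_version) {
  const FakeApi* api =
      GetApiFromRuntime(runtime_max_version, kCompiledApiVersion);
  if (api == nullptr) {
    throw std::runtime_error(
        "runtime supports API version " + std::to_string(runtime_max_version) +
        " but this EXE was built against " +
        std::to_string(kCompiledApiVersion));
  }
  assert(api->version == kCompiledApiVersion);
  return *api;
}
```

The design choice is that the caller never sees a table for any version other than the one it was compiled against; an older runtime produces an error instead.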
UTF-8 safety: the EP library path returned by WinML is converted via
MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, ...) instead of
zero-extending each byte to wchar_t, so multi-byte UTF-8 sequences
from localized user folders are handled correctly.
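
To show why zero-extending bytes is wrong for non-ASCII paths, here is a portable sketch. It does not call MultiByteToWideChar; DecodeUtf8Strict is a hypothetical, minimal strict decoder standing in for it so the example runs on any platform, and ZeroExtend reproduces the naive per-byte widening.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Naive widening: every UTF-8 byte becomes one UTF-16 code unit.
// Correct only for pure ASCII input.
std::u16string ZeroExtend(const std::string& utf8) {
  std::u16string out;
  for (unsigned char byte : utf8) out.push_back(static_cast<char16_t>(byte));
  return out;
}

// Minimal strict UTF-8 to UTF-16 decoder (hypothetical helper). It rejects
// invalid input instead of silently producing garbage.
std::u16string DecodeUtf8Strict(const std::string& utf8) {
  static const char32_t kMinCodePoint[] = {0x0, 0x80, 0x800, 0x10000};
  std::u16string out;
  std::size_t i = 0;
  while (i < utf8.size()) {
    const unsigned char lead = static_cast<unsigned char>(utf8[i]);
    char32_t cp = 0;
    std::size_t extra = 0;  // number of continuation bytes expected
    if (lead < 0x80) {
      cp = lead;
    } else if ((lead & 0xE0) == 0xC0) {
      cp = lead & 0x1F;
      extra = 1;
    } else if ((lead & 0xF0) == 0xE0) {
      cp = lead & 0x0F;
      extra = 2;
    } else if ((lead & 0xF8) == 0xF0) {
      cp = lead & 0x07;
      extra = 3;
    } else {
      throw std::invalid_argument("invalid UTF-8 lead byte");
    }
    if (utf8.size() - i <= extra) {
      throw std::invalid_argument("truncated UTF-8 sequence");
    }
    for (std::size_t k = 1; k <= extra; ++k) {
      const unsigned char cont = static_cast<unsigned char>(utf8[i + k]);
      if ((cont & 0xC0) != 0x80) {
        throw std::invalid_argument("invalid UTF-8 continuation byte");
      }
      cp = (cp << 6) | (cont & 0x3F);
    }
    // Reject overlong encodings, surrogates, and out-of-range values.
    if (cp < kMinCodePoint[extra] || cp > 0x10FFFF ||
        (cp >= 0xD800 && cp <= 0xDFFF)) {
      throw std::invalid_argument("invalid UTF-8 code point");
    }
    if (cp < 0x10000) {
      out.push_back(static_cast<char16_t>(cp));
    } else {
      const char32_t v = cp - 0x10000;  // encode as a surrogate pair
      out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
    }
    i += extra + 1;
  }
  assert(i == utf8.size());
  return out;
}
```

For a folder name such as "Jos\xC3\xA9" (the UTF-8 bytes of "José"), ZeroExtend yields five code units ending in U+00C3 U+00A9, while the strict decoder yields four code units ending in U+00E9.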
Verified: cmake --build build --config RelWithDebInfo --target
winml_standalone_perf_test and --target onnxruntime_perf_test both
succeed. The standalone target's compile list contains
winml_standalone.cc; the regular target's does not.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2a8fa7a to
39af498
Adds personal helper scripts under chrisd/ for the build/perftest workflow (p-x64.cmd, p-arm.cmd, b.cmd, go.cmd, go-all.cmd, go-cmake-logs.cmd, go-nvidia-tests.cmd, go-openvino.cmd, go-qnn.cmd, copy-perf-test.cmd, simple-intel-test-ape.cmd).

Tweaks .vscode/settings.json to exclude build_*/, build/, cmake/external/, node_modules/, and .git/ from C++ parsing and from search, and sets the Python env manager defaults to conda.

Also fixes a few inaccuracies in winml_standalone_perf_test.md and a matching stale code comment in onnxruntime/test/perftest/main.cc: correct ORT_API_VERSION (24, not 25), rewrite the Run-per-EP table to describe what each chrisd/go-*.cmd actually does, and fix the chrisd/go-cmake-logs.cmd description.

These chrisd/ files are personal scratch / convenience aids only -- they are not referenced by any production build or test target.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
chrisdMSFT pushed a commit that referenced this pull request on Jun 11, 2026
…ft#28503)

### Description

Add an internal session config entry, `"session.compile_only"`, set by `CompileModel()` before session initialization. The NvTensorRTRTX EP reads it in `NvExecutionProviderInfo::FromProviderOptions()` and, when set, skips `deserializeCudaEngine()` / `createExecutionContext()` in `CreateNodeComputeInfoFromGraph()`. The EP context node is still saved; that path uses the serialized engine buffer directly and does not depend on the deserialized engine. A stub compute function is registered to satisfy the framework; it returns `NOT_IMPLEMENTED` if called, which cannot happen in practice because compile-only sessions are destroyed without inference.

### Motivation and Context

`OrtCompileAPI::CompileModel()` creates an `InferenceSession` solely to drive `EP::Compile()` and write out the EPContext model, then destroys it without running inference. During that session, the NvTensorRTRTX EP was performing a full `deserializeCudaEngine()` and `createExecutionContext()`, uploading engine weights to the GPU and JIT-ing the engine, only to free everything when the session was destroyed. When the user then loads the EPContext model in a real session, the same JIT and upload happen again.

Net effect on the typical "compile, then load and run" flow:

```
ONNX model
  → CompileModel() [JIT + GPU upload #1 -- discarded]
  → EP context model saved to disk
  → Session from EP context model [JIT + GPU upload #2 -- necessary]
  → Inference
```

JIT and GPU upload run twice.
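
The compile-only short-circuit can be sketched roughly as below. This is a simplified model, not the NvTensorRTRTX EP code: CompiledNode, IsCompileOnly, CompileNode, and the Status enum are hypothetical, the config is modeled as a plain string map, and treating the value "1" as "set" is an assumption made for the example.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

enum class Status { kOk, kNotImplemented };

// Hypothetical result of compiling one fused node.
struct CompiledNode {
  bool engine_deserialized = false;  // stands in for the GPU upload + JIT
  std::string ep_context_blob;       // serialized engine saved to the model
  std::function<Status()> compute;   // compute function the framework calls
};

// Assumption: the flag counts as set when its value is "1".
bool IsCompileOnly(const std::map<std::string, std::string>& session_config) {
  const auto it = session_config.find("session.compile_only");
  return it != session_config.end() && it->second == "1";
}

CompiledNode CompileNode(
    const std::string& serialized_engine,
    const std::map<std::string, std::string>& session_config) {
  assert(!serialized_engine.empty());
  CompiledNode node;
  // The EP context node only needs the serialized buffer, so it is saved
  // in both modes.
  node.ep_context_blob = serialized_engine;
  if (IsCompileOnly(session_config)) {
    // Skip deserialization; register a stub that reports NOT_IMPLEMENTED
    // if it is ever invoked.
    node.compute = [] { return Status::kNotImplemented; };
    return node;
  }
  node.engine_deserialized = true;  // the expensive step in a real session
  node.compute = [] { return Status::kOk; };
  return node;
}
```

In this model the expensive step runs once, in the session that actually performs inference, and the saved blob is identical in both modes.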