Rel 1.24.6 winml standalone by chrisdMSFT · Pull Request #2 · chrisdMSFT/onnxruntime

chrisdMSFT · 2026-05-07T01:46:15Z

Description

Motivation and Context

Implement the three "Suggested follow-ups (not blocking)" from the PR #1 review.

#1 (CI smoke build) -- .github/workflows/windows_winml_standalone.yml

Configure + build the winml_standalone_perf_test target (and the regular onnxruntime_perf_test target as a sanity check) on windows-latest, RelWithDebInfo, x64. Triggered on push / pull_request on main, rel-*, winml-standalone*, rel-*-winml-standalone, and any user/** branch, scoped to the perftest source tree, the cmake files that drive the target, and the workflow file itself. No test-run step (would require NPU/GPU EP devices not present on hosted runners).

Uses a GitHub-hosted runner instead of the 1ES self-hosted pools that the rest of the windows_*.yml workflows use, because those pools are gated to Microsoft's CI infrastructure and would not run on a personal fork. ilammy/msvc-dev-cmd is used for vcvars setup since the in-repo locate-vcvarsall-and-setup-env composite action targets the self-hosted images.

#2 (RAII for catalog state) -- supersedes the round-1 M3 minimal-assert decision.

Replace the file-static g_ep_catalog and g_registered_providers globals plus the WinML_* free functions with a move-only WinMLStandaloneRegistration class in winml_standalone.{h,cc}. Construction opens the catalog, enumerates providers, registers each one (subject to the --winml_register_provider filter), and throws on WinMLEpCatalogCreate failure, WinMLEpCatalogEnumProviders failure, or any requested provider failing to register. Destruction unregisters in reverse order and releases the catalog.

Partial-construction safety: the catalog handle is taken into the member as soon as WinMLEpCatalogCreate succeeds, then the rest of the constructor runs inside a try/catch that calls Cleanup() before rethrowing. This lets the destructor and the constructor's failure path share a single noexcept Cleanup() implementation.

Header keeps the catalog handle as a void* so winml_standalone.h does not pull <WinMLEpCatalog.h> (and therefore the WinML NuGet headers) into main.cc.

main.cc now uses straight RAII -- the WinMLStandaloneRegistration object is declared right after the Ort::Env, replacing both the WinML_InitializeAndRegisterAllProviders call and the gsl::finally([&]{ WinML_Uninitialize(env); }) block. The <gsl/util> include is hoisted out of the BUILD_WINML_STANDALONE_PERF_TEST guard because main.cc still uses gsl::finally for plugin EP unregister.

#3 (deployment README) -- onnxruntime/test/perftest/WINML_STANDALONE.md

Short, focused doc that answers exactly the reviewer's question: which DLLs land next to the EXE and where does each one come from. Includes a table mapping each file to its source NuGet/build artifact and the cmake mechanism that copies it, an explanation of why onnxruntime.dll resolution is intentionally EXE-dir-only, the minimum redeployment payload, and the ORT API version contract. Cross-links to the comprehensive winml_standalone_perf_test.md at the repo root for build/run details.

Verified: configure was already done previously; cmake --build build --config RelWithDebInfo --target winml_standalone_perf_test and --target onnxruntime_perf_test both succeed. The standalone target's compile list contains winml_standalone.cc; the regular target's does not (the prefix-match glob exclusion in cmake/onnxruntime_unittests.cmake still keeps them disjoint).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
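The partial-construction pattern described for #2 can be sketched roughly as follows. This is a minimal illustration, not the code in winml_standalone.{h,cc}: FakeCatalog, FakeCatalogCreate, FakeRegister, FakeUnregister, FakeCatalogRelease, and the Registration class are hypothetical stand-ins, and the real WinML catalog calls are deliberately not reproduced. It only shows the shape of the idea: take ownership of the handle first, run the rest of the constructor inside try/catch, and share one noexcept Cleanup() between the failure path and the destructor.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the EP catalog; it only records calls in order.
struct FakeCatalog {
  std::vector<std::string> log;
};

inline void* FakeCatalogCreate(FakeCatalog& backing) { return &backing; }
inline void FakeCatalogRelease(void* /*handle*/) noexcept {}
inline bool FakeRegister(void* handle, const std::string& name) {
  if (name == "bad") return false;  // simulate a provider that fails to register
  static_cast<FakeCatalog*>(handle)->log.push_back("register:" + name);
  return true;
}
inline void FakeUnregister(void* handle, const std::string& name) {
  static_cast<FakeCatalog*>(handle)->log.push_back("unregister:" + name);
}

class Registration {
 public:
  Registration(FakeCatalog& backing, const std::vector<std::string>& providers)
      : catalog_(FakeCatalogCreate(backing)) {  // the handle is owned from here on
    try {
      for (const auto& name : providers) {
        if (!FakeRegister(catalog_, name)) {
          throw std::runtime_error("failed to register provider: " + name);
        }
        registered_.push_back(name);
      }
    } catch (...) {
      Cleanup();  // the destructor does not run for a partially constructed object
      throw;
    }
  }

  ~Registration() { Cleanup(); }

  // Move-only: copying would unregister the same providers twice.
  Registration(const Registration&) = delete;
  Registration& operator=(const Registration&) = delete;

  Registration(Registration&& other) noexcept
      : catalog_(std::exchange(other.catalog_, nullptr)),
        registered_(std::move(other.registered_)) {
    other.registered_.clear();
  }

  Registration& operator=(Registration&& other) noexcept {
    if (this != &other) {
      Cleanup();
      catalog_ = std::exchange(other.catalog_, nullptr);
      registered_ = std::move(other.registered_);
      other.registered_.clear();
    }
    return *this;
  }

 private:
  // Shared by the destructor and the constructor's failure path.
  // In this sketch an allocation failure inside the fake log would terminate.
  void Cleanup() noexcept {
    if (catalog_ == nullptr) return;
    for (auto it = registered_.rbegin(); it != registered_.rend(); ++it) {
      FakeUnregister(catalog_, *it);  // reverse registration order
    }
    registered_.clear();
    FakeCatalogRelease(catalog_);
    catalog_ = nullptr;
  }

  void* catalog_ = nullptr;  // opaque handle, as in the void* header design
  std::vector<std::string> registered_;
};

// Small self-check: a failing provider must roll back the earlier registration.
inline void RegistrationSelfCheck() {
  FakeCatalog catalog;
  const std::vector<std::string> providers = {"a", "bad"};
  bool threw = false;
  try {
    Registration reg(catalog, providers);
  } catch (const std::runtime_error&) {
    threw = true;
  }
  assert(threw);
  const std::vector<std::string> expected = {"register:a", "unregister:a"};
  assert(catalog.log == expected);
}
```

Calling RegistrationSelfCheck() from a main() exercises the failure path. The moved-from object holds a null handle, so its destructor is a no-op and cleanup still happens exactly once.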

Adds a new build target, winml_standalone_perf_test, that runs the existing onnxruntime perf test harness against a standalone WinML deployment. Instead of linking directly to ORT, the EXE loads onnxruntime.dll from its own directory via the WinML EP catalog, then registers the providers exposed by the bundled WinML NuGet package.

Source layout
-------------

- onnxruntime/test/perftest/windows/winml_standalone.{h,cc}: move-only RAII WinMLStandaloneRegistration class. Construction opens the WinML EP catalog, enumerates providers, and registers each one (subject to the --winml_register_provider filter), throwing on WinMLEpCatalogCreate failure, WinMLEpCatalogEnumProviders failure, or any requested provider failing to register. Destruction unregisters in reverse order and releases the catalog. The header keeps the catalog handle as a void* so it does not pull WinMLEpCatalog.h into main.cc.
- onnxruntime/test/perftest/main.cc: integrate the registration object via straight RAII (declared right after Ort::Env). Hard-fail if the runtime DLL does not support ORT_API_VERSION rather than silently falling back to an older API version (a struct laid out for an older version would be dereferenced past its actual size on any newer-API call).
- onnxruntime/test/perftest/command_args_parser.cc, test_configuration.h, ort_test_session.cc, performance_runner.cc, windows/app.manifest: command-line plumbing for --winml_register_provider plus the manifest required for the standalone EXE.
- cmake/winml_standalone_perf_test.cmake: defines the new target and the post-build copy of the WinML NuGet runtime payload next to the EXE. Emits FATAL_ERROR if WINML_BINARY_DIR is empty so a NuGet layout drift fails at configure time rather than producing a broken EXE that is missing onnxruntime.dll at runtime.
- cmake/CMakeLists.txt: hook in the new optional target via BUILD_WINML_STANDALONE_PERF_TEST.
- cmake/onnxruntime_unittests.cmake: prefix-match glob exclusion so any future winml_standalone_*.cc/h is automatically excluded from the regular onnxruntime_perf_test target without a manual update.
- .github/workflows/windows_winml_standalone.yml: hosted-runner CI smoke build of both the standalone target and the regular onnxruntime_perf_test target. No test-run step (would require NPU/GPU EP devices not available on hosted runners).
- onnxruntime/test/perftest/WINML_STANDALONE.md and winml_standalone_perf_test.md: deployment and build/run docs covering the EXE-dir-only onnxruntime.dll resolution rationale, the ORT_API_VERSION contract, the minimum redeployment payload, and configure/build instructions.
- .gitignore: ignore _deps/ but keep the new app.manifest tracked.

UTF-8 safety: the EP library path returned by WinML is converted via MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, ...) instead of zero-extending each byte to wchar_t, so multi-byte UTF-8 sequences from localized user folders are handled correctly.

Verified: cmake --build build --config RelWithDebInfo --target winml_standalone_perf_test and --target onnxruntime_perf_test both succeed. The standalone target's compile list contains winml_standalone.cc; the regular target's does not.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
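To illustrate the UTF-8 safety note, the sketch below contrasts zero-extending each byte with actually decoding the sequence. It is a portable illustration only and is not the PR's code: the PR uses the Win32 MultiByteToWideChar call, while ZeroExtendBytes and DecodeUtf8Strict here are hypothetical helpers that decode to UTF-32 code points so the example compiles off Windows. Rejecting malformed input is meant to mirror the spirit of MB_ERR_INVALID_CHARS, not its exact behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The approach the PR moves away from: widen every byte independently.
inline std::u32string ZeroExtendBytes(const std::string& bytes) {
  std::u32string out;
  for (unsigned char b : bytes) out.push_back(static_cast<char32_t>(b));
  return out;
}

// Strict UTF-8 decode into code points. Returns false on malformed input
// (bad lead byte, truncation, bad continuation, overlong form, surrogate).
inline bool DecodeUtf8Strict(const std::string& bytes, std::u32string& out) {
  out.clear();
  std::size_t i = 0;
  while (i < bytes.size()) {
    const unsigned char lead = static_cast<unsigned char>(bytes[i]);
    std::size_t len = 0;
    char32_t cp = 0;
    char32_t min_cp = 0;  // smallest code point allowed for this length
    if (lead < 0x80) {
      len = 1; cp = lead; min_cp = 0;
    } else if ((lead & 0xE0) == 0xC0) {
      len = 2; cp = lead & 0x1F; min_cp = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
      len = 3; cp = lead & 0x0F; min_cp = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
      len = 4; cp = lead & 0x07; min_cp = 0x10000;
    } else {
      return false;  // stray continuation byte or invalid lead byte
    }
    if (i + len > bytes.size()) return false;  // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
      if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
      cp = (cp << 6) | (c & 0x3F);
    }
    if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
      return false;  // overlong encoding, out of range, or surrogate
    }
    out.push_back(cp);
    i += len;
  }
  return true;
}

// Small self-check using U+00E9 (e with acute accent), encoded as C3 A9.
inline void Utf8SelfCheck() {
  const std::string e_acute = "\xC3\xA9";
  const std::u32string widened = ZeroExtendBytes(e_acute);
  assert(widened.size() == 2);  // two bogus characters instead of one

  std::u32string decoded;
  const bool ok = DecodeUtf8Strict(e_acute, decoded);
  assert(ok && decoded.size() == 1 && decoded[0] == 0xE9);

  std::u32string rejected;
  const bool truncated_ok = DecodeUtf8Strict("\xC3", rejected);
  assert(!truncated_ok);  // truncated input is an error, not silently widened
}
```

A path under a user folder with an accented name would therefore come out with the wrong characters under zero-extension, which is the failure mode the commit message describes.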

Adds personal helper scripts under chrisd/ for the build/perftest workflow (p-x64.cmd, p-arm.cmd, b.cmd, go.cmd, go-all.cmd, go-cmake-logs.cmd, go-nvidia-tests.cmd, go-openvino.cmd, go-qnn.cmd, copy-perf-test.cmd, simple-intel-test-ape.cmd). Tweaks .vscode/settings.json to exclude build_*/, build/, cmake/external/, node_modules/, and .git/ from C++ parsing and from search, and sets the Python env manager defaults to conda.

Also fixes a few inaccuracies in winml_standalone_perf_test.md and a matching stale code comment in onnxruntime/test/perftest/main.cc: correct ORT_API_VERSION (24, not 25), rewrite the Run-per-EP table to describe what each chrisd/go-*.cmd actually does, and fix the chrisd/go-cmake-logs.cmd description.

These chrisd/ files are personal scratch / convenience aids only -- they are not referenced by any production build or test target.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ft#28503)

### Description

Add an internal session config entry, `"session.compile_only"`, set by `CompileModel()` before session initialization. The NvTensorRTRTX EP reads it in `NvExecutionProviderInfo::FromProviderOptions()` and, when set, skips `deserializeCudaEngine()` / `createExecutionContext()` in `CreateNodeComputeInfoFromGraph()`.

The EP context node is still saved -- that path uses the serialized engine buffer directly and does not depend on the deserialized engine. A stub compute function is registered to satisfy the framework; it returns `NOT_IMPLEMENTED` if called, which cannot happen in practice because compile-only sessions are destroyed without inference.

### Motivation and Context

`OrtCompileAPI::CompileModel()` creates an `InferenceSession` solely to drive `EP::Compile()` and write out the EPContext model, then destroys it without running inference. During that session, the NvTensorRTRTX EP was performing a full `deserializeCudaEngine()` and `createExecutionContext()` -- uploading engine weights to the GPU and JIT-ing the engine, only to free everything when the session was destroyed. When the user then loads the EPContext model in a real session, the same JIT and upload happen again.

Net effect on the typical "compile, then load and run" flow:

```
ONNX model
  → CompileModel() [JIT + GPU upload #1 -- discarded]
  → EP context model saved to disk
  → Session from EP context model [JIT + GPU upload #2 -- necessary]
  → Inference
```

JIT and GPU upload run twice.
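The compile-only branch described above might look roughly like the following sketch. Everything here is a stand-in rather than the EP's actual code: SessionConfig, NodeComputeInfo, IsCompileOnly, and CreateNodeComputeInfo are hypothetical names, the flag value "1" is an assumption (the description does not say how the entry is encoded), and the TensorRT deserialize / execution-context step is represented by a boolean.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

enum class Status { kOk, kNotImplemented };

// Hypothetical stand-in for session config entries (key -> value strings).
using SessionConfig = std::map<std::string, std::string>;

struct NodeComputeInfo {
  bool engine_deserialized = false;  // did the expensive deserialize/context step run?
  std::string ep_context_blob;       // payload that would go into the EP context node
  std::function<Status()> compute;   // invoked only if inference is run
};

// Assumption: the entry is considered set when its value is "1".
inline bool IsCompileOnly(const SessionConfig& config) {
  const auto it = config.find("session.compile_only");
  return it != config.end() && it->second == "1";
}

inline NodeComputeInfo CreateNodeComputeInfo(const SessionConfig& config,
                                             const std::string& serialized_engine) {
  NodeComputeInfo info;
  // The EP context payload comes from the serialized buffer in both modes.
  info.ep_context_blob = serialized_engine;

  if (IsCompileOnly(config)) {
    // Skip deserialization; register a stub so the framework has a compute function.
    info.compute = [] { return Status::kNotImplemented; };
    return info;
  }

  info.engine_deserialized = true;  // stands in for deserialize + create context
  info.compute = [] { return Status::kOk; };
  return info;
}

// Small self-check covering both modes.
inline void CompileOnlySelfCheck() {
  const SessionConfig compile_only = {{"session.compile_only", "1"}};
  const NodeComputeInfo a = CreateNodeComputeInfo(compile_only, "engine-bytes");
  assert(!a.engine_deserialized);
  assert(a.ep_context_blob == "engine-bytes");
  assert(a.compute() == Status::kNotImplemented);

  const NodeComputeInfo b = CreateNodeComputeInfo(SessionConfig{}, "engine-bytes");
  assert(b.engine_deserialized);
  assert(b.compute() == Status::kOk);
}
```

The point of the sketch is that the saved payload is identical in both modes, so skipping the deserialize step in a compile-only session loses nothing that the later inference session does not redo anyway.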

chrisdMSFT force-pushed the rel-1.24.6-winml-standalone branch 3 times, most recently from 2a8fa7a to 39af498, on May 7, 2026 17:14

chrisdMSFT wants to merge 2 commits into chrisd/rel-1.24.6 from rel-1.24.6-winml-standalone

chrisdMSFT commented May 7, 2026
