Rel 1.24.6 winml standalone #2

Draft
chrisdMSFT wants to merge 2 commits into chrisd/rel-1.24.6 from rel-1.24.6-winml-standalone

Conversation

@chrisdMSFT

Owner

Description

Motivation and Context

chrisdMSFT added a commit that referenced this pull request May 7, 2026
Implement the three "Suggested follow-ups (not blocking)" from the PR
#1 review.

#1 (CI smoke build) -- .github/workflows/windows_winml_standalone.yml
    Configure + build the winml_standalone_perf_test target (and the
    regular onnxruntime_perf_test target as a sanity check) on
    windows-latest, RelWithDebInfo, x64. Triggered on push / pull_request
    on main, rel-*, winml-standalone*, rel-*-winml-standalone, and any
    user/** branch, scoped to the perftest source tree, the cmake files
    that drive the target, and the workflow file itself. No test-run
    step (would require NPU/GPU EP devices not present on hosted
    runners).

    Uses a GitHub-hosted runner instead of the 1ES self-hosted pools
    that the rest of the windows_*.yml workflows use, because those
    pools are gated to Microsoft's CI infrastructure and would not run
    on a personal fork. ilammy/msvc-dev-cmd is used for vcvars setup
    since the in-repo locate-vcvarsall-and-setup-env composite action
    targets the self-hosted images.

#2 (RAII for catalog state) -- supersedes the round-1 M3 minimal-assert
    decision. Replace the file-static g_ep_catalog and
    g_registered_providers globals plus the WinML_* free functions
    with a move-only WinMLStandaloneRegistration class
    in winml_standalone.{h,cc}. Construction opens the catalog,
    enumerates providers, registers each one (subject to the
    --winml_register_provider filter), and throws on
    WinMLEpCatalogCreate failure, WinMLEpCatalogEnumProviders failure,
    or any requested provider failing to register. Destruction
    unregisters in reverse order and releases the catalog.

    Partial-construction safety: the catalog handle is taken into the
    member as soon as WinMLEpCatalogCreate succeeds, then the rest of
    the constructor runs inside a try/catch that calls Cleanup() before
    rethrowing. This keeps the destructor and the constructor's failure
    path sharing a single noexcept Cleanup() implementation.

    Header keeps the catalog handle as a void* so winml_standalone.h
    does not pull <WinMLEpCatalog.h> (and therefore the WinML NuGet
    headers) into main.cc.

    main.cc now uses straight RAII -- the WinMLStandaloneRegistration
    object is declared right after the Ort::Env, replacing both the
    WinML_InitializeAndRegisterAllProviders call and the
    gsl::finally([&]{ WinML_Uninitialize(env); }) block. The
    <gsl/util> include is hoisted out of the BUILD_WINML_STANDALONE_PERF_TEST
    guard because main.cc still uses gsl::finally for plugin EP unregister.
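
The partial-construction pattern described in #2 can be sketched as follows. This is a minimal illustration, not the code from winml_standalone.cc: OpenCatalog, CloseCatalog, RegisterProvider, UnregisterProvider, Events, and the StandaloneRegistration class are hypothetical stand-ins, and the real WinML catalog signatures are not reproduced here. The sketch only supports move construction.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Records the order of operations so the example can check it.
inline std::vector<std::string>& Events() {
  static std::vector<std::string> events;
  return events;
}

// Hypothetical stand-ins for the catalog calls (NOT the real WinML signatures).
inline void* OpenCatalog() {
  Events().push_back("open");
  return new int(0);
}
inline void CloseCatalog(void* catalog) {
  delete static_cast<int*>(catalog);
  Events().push_back("close");
}
inline void RegisterProvider(const std::string& name) {
  if (name == "bad") throw std::runtime_error("failed to register " + name);
  Events().push_back("register " + name);
}
inline void UnregisterProvider(const std::string& name) {
  Events().push_back("unregister " + name);
}

class StandaloneRegistration {
 public:
  explicit StandaloneRegistration(const std::vector<std::string>& providers)
      : catalog_(OpenCatalog()) {  // the handle is owned from here on
    try {
      registered_.reserve(providers.size());
      for (const auto& name : providers) {
        RegisterProvider(name);
        registered_.push_back(name);
      }
    } catch (...) {
      Cleanup();  // the destructor does not run when the constructor throws
      throw;
    }
  }
  ~StandaloneRegistration() { Cleanup(); }

  StandaloneRegistration(const StandaloneRegistration&) = delete;
  StandaloneRegistration& operator=(const StandaloneRegistration&) = delete;
  StandaloneRegistration(StandaloneRegistration&& other) noexcept
      : catalog_(std::exchange(other.catalog_, nullptr)),
        registered_(std::exchange(other.registered_, {})) {}

 private:
  // Shared by the destructor and the constructor's failure path.
  void Cleanup() noexcept {
    for (auto it = registered_.rbegin(); it != registered_.rend(); ++it) {
      UnregisterProvider(*it);  // reverse registration order
    }
    registered_.clear();
    if (catalog_ != nullptr) {
      CloseCatalog(catalog_);
      catalog_ = nullptr;
    }
  }

  void* catalog_ = nullptr;  // opaque handle, as in the header described above
  std::vector<std::string> registered_;
};

// Usage check for the success path: register in order, unregister in reverse.
inline void ExampleRegistrationLifetime() {
  Events().clear();
  const std::vector<std::string> providers = {"cpu", "npu"};
  { StandaloneRegistration registration(providers); }
  const std::vector<std::string> expected = {
      "open", "register cpu", "register npu",
      "unregister npu", "unregister cpu", "close"};
  assert(Events() == expected);
}
```

If a provider fails to register, the catch block unwinds the providers registered so far and closes the catalog before the exception leaves the constructor, so no caller-side cleanup is needed.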

#3 (deployment README) -- onnxruntime/test/perftest/WINML_STANDALONE.md
    Short, focused doc that answers exactly the reviewer's question:
    which DLLs land next to the EXE and where does each one come from.
    Includes a table mapping each file to its source NuGet/build
    artifact and the cmake mechanism that copies it, an explanation of
    why onnxruntime.dll resolution is intentionally EXE-dir-only, the
    minimum redeployment payload, and the ORT API version contract.
    Cross-links to the comprehensive winml_standalone_perf_test.md at
    the repo root for build/run details.

Verified: configure was already done previously; cmake --build build
--config RelWithDebInfo --target winml_standalone_perf_test and
--target onnxruntime_perf_test both succeed. The standalone target's
compile list contains winml_standalone.cc; the regular target's does
not (the prefix-match glob exclusion in cmake/onnxruntime_unittests.cmake
still keeps them disjoint).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds a new build target, winml_standalone_perf_test, that runs the
existing onnxruntime perf test harness against a standalone WinML
deployment. Instead of linking directly to ORT, the EXE loads
onnxruntime.dll from its own directory via the WinML EP catalog, then
registers the providers exposed by the bundled WinML NuGet package.

Source layout
-------------
- onnxruntime/test/perftest/windows/winml_standalone.{h,cc}: move-only
  RAII WinMLStandaloneRegistration class. Construction opens the WinML
  EP catalog, enumerates providers, and registers each one (subject to
  the --winml_register_provider filter), throwing on
  WinMLEpCatalogCreate failure, WinMLEpCatalogEnumProviders failure, or
  any requested provider failing to register. Destruction unregisters
  in reverse order and releases the catalog. The header keeps the
  catalog handle as a void* so it does not pull WinMLEpCatalog.h into
  main.cc.
- onnxruntime/test/perftest/main.cc: integrate the registration object
  via straight RAII (declared right after Ort::Env). Hard-fail if the
  runtime DLL does not support ORT_API_VERSION rather than silently
  falling back to an older API version (a struct laid out for an older
  version would be dereferenced past its actual size on any newer-API
  call).
- onnxruntime/test/perftest/command_args_parser.cc,
  test_configuration.h, ort_test_session.cc, performance_runner.cc,
  windows/app.manifest: command-line plumbing for
  --winml_register_provider plus the manifest required for the
  standalone EXE.
- cmake/winml_standalone_perf_test.cmake: defines the new target and
  the post-build copy of the WinML NuGet runtime payload next to the
  EXE. Emits FATAL_ERROR if WINML_BINARY_DIR is empty so a NuGet
  layout drift fails at configure time rather than producing a broken
  EXE that is missing onnxruntime.dll at runtime.
- cmake/CMakeLists.txt: hook in the new optional target via
  BUILD_WINML_STANDALONE_PERF_TEST.
- cmake/onnxruntime_unittests.cmake: prefix-match glob exclusion so
  any future winml_standalone_*.cc/h is automatically excluded from
  the regular onnxruntime_perf_test target without a manual update.
- .github/workflows/windows_winml_standalone.yml: hosted-runner CI
  smoke build of both the standalone target and the regular
  onnxruntime_perf_test target. No test-run step (would require
  NPU/GPU EP devices not available on hosted runners).
- onnxruntime/test/perftest/WINML_STANDALONE.md and
  winml_standalone_perf_test.md: deployment and build/run docs
  covering the EXE-dir-only onnxruntime.dll resolution rationale, the
  ORT_API_VERSION contract, the minimum redeployment payload, and
  configure/build instructions.
- .gitignore: ignore _deps/ but keep the new app.manifest tracked.
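
The hard-fail policy for ORT_API_VERSION in the main.cc entry can be sketched as below. FakeRuntime, FakeApi, RequireApi, and kCompiledApiVersion are hypothetical names used only for illustration; the sketch models a runtime whose lookup returns a null pointer for an unsupported version and is not the ONNX Runtime API itself.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Illustrative constant; stands in for the version the EXE was compiled against.
constexpr std::uint32_t kCompiledApiVersion = 24;

// Hypothetical stand-in for the function table a runtime DLL hands out.
struct FakeApi {
  std::uint32_t version;
};

// Hypothetical runtime that supports API versions 1..max_version.
class FakeRuntime {
 public:
  explicit FakeRuntime(std::uint32_t max_version)
      : max_version_(max_version), api_{max_version} {}

  // Returns nullptr when the requested version is not supported.
  const FakeApi* GetApi(std::uint32_t requested) const {
    return (requested >= 1 && requested <= max_version_) ? &api_ : nullptr;
  }

 private:
  std::uint32_t max_version_;
  FakeApi api_;
};

// Request exactly the compiled-against version and throw instead of retrying
// with a lower one: code compiled for the newer layout would otherwise read
// past the end of the older, smaller table.
inline const FakeApi& RequireApi(const FakeRuntime& runtime, std::uint32_t compiled_version) {
  const FakeApi* api = runtime.GetApi(compiled_version);
  if (api == nullptr) {
    throw std::runtime_error("runtime does not support API version " +
                             std::to_string(compiled_version));
  }
  return *api;
}

// Usage check: a runtime that is new enough satisfies the request.
inline void ExampleVersionCheck() {
  const FakeRuntime runtime(kCompiledApiVersion);
  assert(RequireApi(runtime, kCompiledApiVersion).version == kCompiledApiVersion);
}
```

An older runtime (for example one built with a maximum version of 23 in this sketch) makes RequireApi throw, which is the behavior the entry above asks for in place of a silent fallback.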

UTF-8 safety: the EP library path returned by WinML is converted via
MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, ...) instead of
zero-extending each byte to wchar_t, so multi-byte UTF-8 sequences
from localized user folders are handled correctly.
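
To show why zero-extending bytes breaks on non-ASCII paths, here is a portable sketch. It deliberately does not call MultiByteToWideChar, which is Windows-only; DecodeUtf8 is a hand-written strict decoder used as a stand-in, and ZeroExtend reproduces the behavior being replaced. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// The replaced behavior: every byte becomes one UTF-16 unit (ASCII-only correct).
inline std::u16string ZeroExtend(const std::string& bytes) {
  std::u16string out;
  for (unsigned char b : bytes) out.push_back(static_cast<char16_t>(b));
  return out;
}

// Strict UTF-8 to UTF-16 decoding; throws on malformed input, similar in
// spirit to requesting an error on invalid characters.
inline std::u16string DecodeUtf8(const std::string& bytes) {
  std::u16string out;
  std::size_t i = 0;
  while (i < bytes.size()) {
    const unsigned char lead = static_cast<unsigned char>(bytes[i]);
    std::uint32_t cp = 0;
    std::uint32_t min_cp = 0;  // smallest code point allowed for this length
    std::size_t len = 0;
    if (lead < 0x80) { cp = lead; len = 1; min_cp = 0; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; min_cp = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; min_cp = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; min_cp = 0x10000; }
    else throw std::invalid_argument("invalid UTF-8 lead byte");

    if (i + len > bytes.size()) throw std::invalid_argument("truncated UTF-8 sequence");
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
      if ((c & 0xC0) != 0x80) throw std::invalid_argument("invalid continuation byte");
      cp = (cp << 6) | (c & 0x3F);
    }
    // Reject overlong encodings, surrogate code points, and values past U+10FFFF.
    if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
      throw std::invalid_argument("overlong or out-of-range UTF-8 sequence");
    }
    if (cp < 0x10000) {
      out.push_back(static_cast<char16_t>(cp));
    } else {  // encode as a surrogate pair
      cp -= 0x10000;
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    i += len;
  }
  return out;
}

// Usage check with a folder name containing one two-byte character.
inline void ExampleLocalizedName() {
  const std::string utf8 = "Jos\xC3\xA9";     // "José" as UTF-8 bytes
  assert(ZeroExtend(utf8).size() == 5);       // two bytes become two wrong units
  assert(DecodeUtf8(utf8) == u"Jos\u00E9");   // four UTF-16 units, as intended
}
```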

Verified: cmake --build build --config RelWithDebInfo --target
winml_standalone_perf_test and --target onnxruntime_perf_test both
succeed. The standalone target's compile list contains
winml_standalone.cc; the regular target's does not.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
chrisdMSFT force-pushed the rel-1.24.6-winml-standalone branch 3 times, most recently from 2a8fa7a to 39af498, May 7, 2026 17:14
Adds personal helper scripts under chrisd/ for the build/perftest
workflow (p-x64.cmd, p-arm.cmd, b.cmd, go.cmd, go-all.cmd,
go-cmake-logs.cmd, go-nvidia-tests.cmd, go-openvino.cmd, go-qnn.cmd,
copy-perf-test.cmd, simple-intel-test-ape.cmd).

Tweaks .vscode/settings.json to exclude build_*/, build/,
cmake/external/, node_modules/, and .git/ from C++ parsing and from
search, and sets the Python env manager defaults to conda.

Also fixes a few inaccuracies in winml_standalone_perf_test.md and a
matching stale code comment in onnxruntime/test/perftest/main.cc:
correct ORT_API_VERSION (24, not 25), rewrite the Run-per-EP table to
describe what each chrisd/go-*.cmd actually does, and fix the
chrisd/go-cmake-logs.cmd description.

These chrisd/ files are personal scratch / convenience aids only --
they are not referenced by any production build or test target.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
chrisdMSFT pushed a commit that referenced this pull request Jun 11, 2026
…ft#28503)

### Description

Add an internal session config entry, `"session.compile_only"`, set by
`CompileModel()` before session initialization. The NvTensorRTRTX EP reads it
in `NvExecutionProviderInfo::FromProviderOptions()` and, when set, skips
`deserializeCudaEngine()` / `createExecutionContext()` in
`CreateNodeComputeInfoFromGraph()`.

The EP context node is still saved: that path uses the serialized engine
buffer directly and does not depend on the deserialized engine. A stub compute
function is registered to satisfy the framework; it returns `NOT_IMPLEMENTED`
if called, which cannot happen in practice because compile-only sessions are
destroyed without inference.
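
The control flow described above can be sketched roughly as follows. Every type and function here (`Status`, `EpInfo`, `NodeComputeInfo`, `CreateNodeComputeInfo`) is a hypothetical stand-in rather than the EP's real code, and treating the value `"1"` as "set" is an assumption made for the example.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class Status { OK, NOT_IMPLEMENTED };

// Hypothetical provider info parsed from string options.
struct EpInfo {
  bool compile_only = false;

  static EpInfo FromOptions(const std::map<std::string, std::string>& options) {
    EpInfo info;
    const auto it = options.find("session.compile_only");
    // Assumption: the flag counts as set when its value is "1".
    info.compile_only = (it != options.end() && it->second == "1");
    return info;
  }
};

// Hypothetical result of building the compute info for one fused node.
struct NodeComputeInfo {
  std::function<Status()> compute;
  std::vector<char> ep_context_blob;  // serialized engine saved in the EP context node
  bool engine_deserialized = false;   // true only when the expensive path ran
};

inline NodeComputeInfo CreateNodeComputeInfo(const EpInfo& info,
                                             std::vector<char> serialized_engine) {
  NodeComputeInfo node;
  // The EP context node only needs the serialized bytes, so it is filled in
  // on both paths.
  node.ep_context_blob = std::move(serialized_engine);

  if (info.compile_only) {
    // Skip engine deserialization and context creation; register a stub.
    node.compute = [] { return Status::NOT_IMPLEMENTED; };
    return node;
  }

  // Stands in for the engine deserialization and execution-context creation.
  node.engine_deserialized = true;
  node.compute = [] { return Status::OK; };
  return node;
}

// Usage check: the compile-only path keeps the blob but never touches the engine.
inline void ExampleCompileOnly() {
  const EpInfo info = EpInfo::FromOptions({{"session.compile_only", "1"}});
  const NodeComputeInfo node = CreateNodeComputeInfo(info, {'e', 'n', 'g'});
  assert(!node.engine_deserialized);
  assert(node.ep_context_blob.size() == 3);
  assert(node.compute() == Status::NOT_IMPLEMENTED);
}
```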


### Motivation and Context

`OrtCompileAPI::CompileModel()` creates an `InferenceSession` solely to drive
`EP::Compile()` and write out the EPContext model, then destroys it without
running inference. During that session, the NvTensorRTRTX EP was performing a
full `deserializeCudaEngine()` and `createExecutionContext()`, uploading engine
weights to the GPU and JIT-ing the engine, only to free everything when the
session was destroyed.

When the user then loads the EPContext model in a real session, the same JIT
and upload happen again. Net effect on the typical "compile, then load and
run" flow:

  ```
  ONNX model
      → CompileModel()         [JIT + GPU upload #1 — discarded]
      → EP context model saved to disk
      → Session from EP context model
                               [JIT + GPU upload #2 — necessary]
      → Inference
  ```

  JIT and GPU upload run twice.