From 88cd66c27fdb4df368cc01483f8fc0a97d42f1e2 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Wed, 18 Feb 2026 13:02:27 -0600 Subject: [PATCH 01/61] Add POSIX telemetry --- ThirdPartyNotices.txt | 222 +++++++ build.bat | 2 +- build.sh | 2 +- cmake/CMakeLists.txt | 3 + cmake/deps.txt | 2 + .../external/onnxruntime_external_deps.cmake | 27 +- cmake/onnxruntime_1ds_telemetry.cmake | 35 + cmake/onnxruntime_common.cmake | 40 +- onnxruntime/core/platform/posix/env.cc | 8 + onnxruntime/core/platform/posix/telemetry.cc | 601 ++++++++++++++++++ onnxruntime/core/platform/posix/telemetry.h | 145 +++++ .../core/platform/windows/telemetry.cc | 4 +- tools/ci_build/build.py | 6 +- tools/ci_build/build_args.py | 3 +- 14 files changed, 1091 insertions(+), 9 deletions(-) create mode 100644 cmake/onnxruntime_1ds_telemetry.cmake create mode 100644 onnxruntime/core/platform/posix/telemetry.cc create mode 100644 onnxruntime/core/platform/posix/telemetry.h diff --git a/ThirdPartyNotices.txt b/ThirdPartyNotices.txt index fbd9f9a95f601..5a193f3f16888 100644 --- a/ThirdPartyNotices.txt +++ b/ThirdPartyNotices.txt @@ -6119,3 +6119,225 @@ Redistribution and use in source and binary forms, with or without modification, 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +Copyright (c) 2026 KleidiAi. + +Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. + +3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +_____ + +microsoft/cpp_client_telemetry, https://github.com/microsoft/cpp_client_telemetry/ + +Apache License + +Copyright (c) Microsoft Corporation. All rights reserved. + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). 
+ + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. 
Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative 
Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. 
Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. 
+ + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright 2026 Microsoft Corporation + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
\ No newline at end of file diff --git a/build.bat b/build.bat index d0c6cbcddd669..b05a4a0b28210 100644 --- a/build.bat +++ b/build.bat @@ -7,4 +7,4 @@ setlocal set PATH=C:\Program Files\Git\usr\bin;%PATH% rem Requires a Python install to be available in your PATH -python "%~dp0\tools\ci_build\build.py" --build_dir "%~dp0\build\Windows" %* +python "%~dp0\tools\ci_build\build.py" --build_dir "%~dp0\build\Windows" --use_telemetry %* diff --git a/build.sh b/build.sh index bf799ac8b7211..2778cb5c53ef3 100755 --- a/build.sh +++ b/build.sh @@ -18,4 +18,4 @@ elif [[ "$*" == *"--android"* ]]; then DIR_OS="Android" fi -python3 $DIR/tools/ci_build/build.py --build_dir $DIR/build/$DIR_OS "$@" +python3 $DIR/tools/ci_build/build.py --build_dir $DIR/build/$DIR_OS --use_telemetry "$@" diff --git a/cmake/CMakeLists.txt b/cmake/CMakeLists.txt index f2009c8575b51..aa9717251beda 100644 --- a/cmake/CMakeLists.txt +++ b/cmake/CMakeLists.txt @@ -560,6 +560,9 @@ set(ONNXRUNTIME_INCLUDE_DIR ${REPO_ROOT}/include/onnxruntime) list(APPEND CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/external) include(external/onnxruntime_external_deps.cmake) +# 1DS telemetry integration for non-Windows platforms (must come after external deps) +include(onnxruntime_1ds_telemetry.cmake) + set(ORT_WARNING_FLAGS) if (WIN32) # class needs to have dll-interface to be used by clients diff --git a/cmake/deps.txt b/cmake/deps.txt index 578dd8fd23d09..ffe613412dfa8 100644 --- a/cmake/deps.txt +++ b/cmake/deps.txt @@ -61,3 +61,5 @@ kleidiai;https://github.com/ARM-software/kleidiai/archive/refs/tags/v1.20.0.tar. # this entry will be updated to use refs/tags/ instead of the raw commit hash. 
kleidiai-qmx;https://github.com/qualcomm/kleidiai/archive/2f10c9a8d32f81ffeeb6d4885a29cc35d2b0da87.zip;5e855730a2d69057a569f43dd7532db3b2d2a05c duktape;https://github.com/svaarala/duktape/releases/download/v2.7.0/duktape-2.7.0.tar.xz;8200c8e417dbab7adcc12c4dbdef7651cfc55794 +# cpp_client_telemetry (1DS SDK) for cross-platform telemetry on non-Windows platforms +cpp_client_telemetry;https://github.com/microsoft/cpp_client_telemetry/archive/refs/tags/v3.10.40.1.zip;ee2ded25e539f64052c9d8635bef4ea62c30e014 diff --git a/cmake/external/onnxruntime_external_deps.cmake b/cmake/external/onnxruntime_external_deps.cmake index c1701af45d523..f440102f9a34b 100644 --- a/cmake/external/onnxruntime_external_deps.cmake +++ b/cmake/external/onnxruntime_external_deps.cmake @@ -7,8 +7,9 @@ include(external/helper_functions.cmake) file(STRINGS deps.txt ONNXRUNTIME_DEPS_LIST) foreach(ONNXRUNTIME_DEP IN LISTS ONNXRUNTIME_DEPS_LIST) - # Lines start with "#" are comments - if(NOT ONNXRUNTIME_DEP MATCHES "^#") + # Lines starting with "#" are comments, so skip them. + # cpp_client_telemetry is only needed for telemetry on non-Windows platforms, so skip it when telemetry is disabled or when building for Windows.
+ if((NOT ONNXRUNTIME_DEP MATCHES "^#") AND ((NOT ONNXRUNTIME_DEP MATCHES "^cpp_client_telemetry") OR (onnxruntime_USE_TELEMETRY AND NOT WIN32))) # The first column is name list(POP_FRONT ONNXRUNTIME_DEP ONNXRUNTIME_DEP_NAME) # The second column is URL @@ -874,6 +875,28 @@ if(onnxruntime_USE_SNPE) list(APPEND onnxruntime_EXTERNAL_LIBRARIES ${SNPE_NN_LIBS}) endif() +# 1DS SDK (cpp_client_telemetry) for cross-platform telemetry on non-Windows platforms +if(onnxruntime_USE_TELEMETRY AND NOT WIN32) + set(BUILD_UNIT_TESTS_SAVED "${BUILD_UNIT_TESTS}") + set(BUILD_FUNC_TESTS_SAVED "${BUILD_FUNC_TESTS}") + set(BUILD_SAMPLES_SAVED "${BUILD_SAMPLES}") + set(BUILD_UNIT_TESTS OFF CACHE BOOL "Disable 1DS SDK unit tests" FORCE) + set(BUILD_FUNC_TESTS OFF CACHE BOOL "Disable 1DS SDK functional tests" FORCE) + set(BUILD_SAMPLES OFF CACHE BOOL "Disable 1DS SDK samples" FORCE) + + onnxruntime_fetchcontent_declare( + cpp_client_telemetry + URL ${DEP_URL_cpp_client_telemetry} + URL_HASH SHA1=${DEP_SHA1_cpp_client_telemetry} + EXCLUDE_FROM_ALL + ) + onnxruntime_fetchcontent_makeavailable(cpp_client_telemetry) + + set(BUILD_UNIT_TESTS "${BUILD_UNIT_TESTS_SAVED}" CACHE BOOL "" FORCE) + set(BUILD_FUNC_TESTS "${BUILD_FUNC_TESTS_SAVED}" CACHE BOOL "" FORCE) + set(BUILD_SAMPLES "${BUILD_SAMPLES_SAVED}" CACHE BOOL "" FORCE) +endif() + FILE(TO_NATIVE_PATH ${CMAKE_BINARY_DIR} ORT_BINARY_DIR) FILE(TO_NATIVE_PATH ${PROJECT_SOURCE_DIR} ORT_SOURCE_DIR) diff --git a/cmake/onnxruntime_1ds_telemetry.cmake b/cmake/onnxruntime_1ds_telemetry.cmake new file mode 100644 index 0000000000000..20a73f2139b09 --- /dev/null +++ b/cmake/onnxruntime_1ds_telemetry.cmake @@ -0,0 +1,35 @@ +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. + +# This file handles telemetry integration for non-Windows platforms +# (macOS, Linux, Android, iOS) using the 1DS SDK (cpp_client_telemetry). +# The SDK is fetched via FetchContent in onnxruntime_external_deps.cmake. 
+ +if(onnxruntime_USE_TELEMETRY AND NOT WIN32) + if(NOT TARGET mat) + message(FATAL_ERROR "Telemetry enabled for non-Windows but 'mat' target not found. " + "Ensure cpp_client_telemetry is fetched in onnxruntime_external_deps.cmake.") + endif() + + message(STATUS "Enabling 1DS telemetry for non-Windows platforms") + + # Add compile definition so C++ code can detect 1DS telemetry at compile time + add_compile_definitions(USE_1DS_TELEMETRY) + + # Platform-specific status messages + if(APPLE) + if(CMAKE_SYSTEM_NAME STREQUAL "iOS") + message(STATUS " Platform: iOS") + else() + message(STATUS " Platform: macOS") + endif() + elseif(ANDROID) + message(STATUS " Platform: Android") + elseif(UNIX) + message(STATUS " Platform: Linux") + endif() +else() + if(NOT onnxruntime_USE_TELEMETRY) + message(STATUS "Telemetry is disabled (use -Donnxruntime_USE_TELEMETRY=ON to enable)") + endif() +endif() diff --git a/cmake/onnxruntime_common.cmake b/cmake/onnxruntime_common.cmake index 0218994e537a0..b3630178b051b 100644 --- a/cmake/onnxruntime_common.cmake +++ b/cmake/onnxruntime_common.cmake @@ -55,6 +55,14 @@ else() "${ONNXRUNTIME_ROOT}/core/platform/posix/stacktrace.cc" ) + # Telemetry for non-Windows platforms (enabled by USE_TELEMETRY) + if (onnxruntime_USE_TELEMETRY) + list(APPEND onnxruntime_common_src_patterns + "${ONNXRUNTIME_ROOT}/core/platform/posix/telemetry.h" + "${ONNXRUNTIME_ROOT}/core/platform/posix/telemetry.cc" + ) + endif() + # logging files if (onnxruntime_USE_SYSLOG) list(APPEND onnxruntime_common_src_patterns @@ -138,7 +146,11 @@ if(NOT WIN32 AND NOT APPLE AND NOT ANDROID AND CMAKE_SYSTEM_PROCESSOR MATCHES "x endif() if (onnxruntime_USE_TELEMETRY) - set_target_properties(onnxruntime_common PROPERTIES COMPILE_FLAGS "/FI${ONNXRUNTIME_INCLUDE_DIR}/core/platform/windows/TraceLoggingConfigPrivate.h") + if(WIN32) + set_target_properties(onnxruntime_common PROPERTIES COMPILE_FLAGS "/FI${ONNXRUNTIME_INCLUDE_DIR}/core/platform/windows/TraceLoggingConfigPrivate.h") + else() 
+ target_compile_definitions(onnxruntime_common PRIVATE USE_1DS_TELEMETRY) + endif() endif() if (onnxruntime_USE_MIMALLOC) list(APPEND onnxruntime_EXTERNAL_LIBRARIES mimalloc-static) @@ -200,6 +212,32 @@ if(CPUINFO_SUPPORTED) list(APPEND onnxruntime_EXTERNAL_LIBRARIES cpuinfo::cpuinfo) endif() +# Link telemetry library (1DS SDK) for non-Windows platforms +if(onnxruntime_USE_TELEMETRY AND NOT WIN32) + if(TARGET mat) + target_link_libraries(onnxruntime_common PRIVATE mat) + # Platform-specific system libraries required by the 1DS SDK + if(APPLE) + target_link_libraries(onnxruntime_common PRIVATE + "-framework CoreFoundation" + "-framework Security" + z + sqlite3 + ) + elseif(ANDROID) + target_link_libraries(onnxruntime_common PRIVATE z log) + elseif(UNIX) + target_link_libraries(onnxruntime_common PRIVATE + curl + z + sqlite3 + ) + endif() + else() + message(WARNING "Telemetry enabled but 'mat' library target not found") + endif() +endif() + if (NOT onnxruntime_BUILD_SHARED_LIB) install(DIRECTORY ${PROJECT_SOURCE_DIR}/../include/onnxruntime/core/common DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/onnxruntime/core) install(TARGETS onnxruntime_common EXPORT ${PROJECT_NAME}Targets diff --git a/onnxruntime/core/platform/posix/env.cc b/onnxruntime/core/platform/posix/env.cc index aeddef0c5188f..6efbe5acd3d37 100644 --- a/onnxruntime/core/platform/posix/env.cc +++ b/onnxruntime/core/platform/posix/env.cc @@ -16,6 +16,10 @@ limitations under the License. 
#include "core/platform/env.h" +#ifdef USE_1DS_TELEMETRY +#include "core/platform/posix/telemetry.h" +#endif + #include #include #include @@ -613,7 +617,11 @@ class PosixEnv : public Env { } private: +#ifdef USE_1DS_TELEMETRY + PosixTelemetry telemetry_provider_; +#else Telemetry telemetry_provider_; +#endif #ifdef ORT_USE_CPUINFO PosixEnv() { cpuinfo_available_ = cpuinfo_initialize(); diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc new file mode 100644 index 0000000000000..eb3a50fefaa6b --- /dev/null +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -0,0 +1,601 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +#ifndef _WIN32 // Only for non-Windows platforms + +#include "core/platform/posix/telemetry.h" + +// 1DS SDK includes +#include +#include +#include + +#include +#include +#include +#include +#include + +#include "core/common/logging/logging.h" +#include "core/common/status.h" +#include "onnxruntime_config.h" + +#ifdef __APPLE__ +#include +#endif + +using namespace Microsoft::Applications::Events; + +namespace onnxruntime { + +// Static member initialization +std::atomic PosixTelemetry::global_register_count_{0}; +std::mutex PosixTelemetry::global_mutex_; + +// Tenant token for 1DS telemetry ingestion +constexpr const char* TENANT_TOKEN = "5ad963bd4b3a4118a481401cc0211875-da8e8657-47d4-4ed7-ab39-7886e136f53b-6988"; + +// Event priority mapping (1DS priorities) +enum class EventPriority { + NORMAL = EventLatency_Normal, // Most events + HIGH = EventLatency_RealTime, // RuntimeError + CRITICAL = EventLatency_RealTime // ProcessInfo, SessionCreation +}; + +// Transmit profiles +constexpr const char* PROFILE_REAL_TIME = "REAL_TIME"; +constexpr const char* PROFILE_NEAR_REAL_TIME = "NEAR_REAL_TIME"; +constexpr const char* PROFILE_BEST_EFFORT = "BEST_EFFORT"; + +// Helper class to build events with common properties +class EventBuilder { + private: + 
EventProperties props_; + + public: + explicit EventBuilder(const char* event_name, EventPriority priority) + : props_(event_name) { + // Set latency/priority + props_.SetLatency(static_cast(priority)); + + // Set schema version for compatibility with Windows + props_.SetProperty("schemaVersion", static_cast(0)); + + // Privacy flags - no PII collection + props_.SetPIIKind(PiiKind_None); + } + + EventBuilder& AddString(const char* key, const std::string& value) { + if (!value.empty()) { + props_.SetProperty(key, value); + } + return *this; + } + + EventBuilder& AddInt32(const char* key, int32_t value) { + props_.SetProperty(key, static_cast(value)); + return *this; + } + + EventBuilder& AddInt64(const char* key, int64_t value) { + props_.SetProperty(key, value); + return *this; + } + + EventBuilder& AddBool(const char* key, bool value) { + props_.SetProperty(key, value); + return *this; + } + + EventBuilder& AddUInt32(const char* key, uint32_t value) { + props_.SetProperty(key, static_cast(value)); + return *this; + } + + EventBuilder& AddDouble(const char* key, double value) { + props_.SetProperty(key, value); + return *this; + } + + // Helper for vector to comma-separated string + EventBuilder& AddStringList(const char* key, const std::vector& vec) { + if (!vec.empty()) { + std::string result; + for (size_t i = 0; i < vec.size(); ++i) { + if (i > 0) result += ','; + result += vec[i]; + } + props_.SetProperty(key, result); + } + return *this; + } + + // Helper for map to key=value,key=value format + EventBuilder& AddIntMap(const char* key, const std::unordered_map& map) { + if (!map.empty()) { + std::string result; + bool first = true; + for (const auto& [k, v] : map) { + if (!first) result += ','; + result += k + '=' + std::to_string(v); + first = false; + } + props_.SetProperty(key, result); + } + return *this; + } + + // Helper for string map + EventBuilder& AddStringMap(const char* key, const std::unordered_map& map) { + if (!map.empty()) { + std::string 
result; + bool first = true; + for (const auto& [k, v] : map) { + if (!first) result += ','; + result += k + '=' + v; + first = false; + } + props_.SetProperty(key, result); + } + return *this; + } + + // Helper for batch size duration map + EventBuilder& AddBatchSizeDurations(const std::unordered_map& durations) { + for (const auto& [batch_size, duration] : durations) { + std::string key = "batchSize_" + std::to_string(batch_size); + props_.SetProperty(key, duration); + } + return *this; + } + + // Add common platform/device context + EventBuilder& AddCommonContext(const PosixTelemetry* telemetry) { + props_.SetProperty("platform", telemetry->GetPlatformInfo()); + props_.SetProperty("device", telemetry->GetDeviceInfo()); + props_.SetProperty("projection", static_cast(telemetry->projection_.load())); + return *this; + } + + EventProperties Build() { return std::move(props_); } +}; + +PosixTelemetry::PosixTelemetry() { + std::lock_guard lock(global_mutex_); + + if (global_register_count_ == 0) { + try { + Initialize(); + global_register_count_++; + } catch (const std::exception& ex) { + // Log error but don't fail construction + // Telemetry failures should not break application functionality + LOGS_DEFAULT(WARNING) << "Failed to initialize telemetry: " << ex.what(); + } + } +} + +PosixTelemetry::~PosixTelemetry() { + std::lock_guard lock(global_mutex_); + + if (global_register_count_ > 0) { + global_register_count_--; + if (global_register_count_ == 0) { + try { + Shutdown(); + } catch (const std::exception& ex) { + // Log error but don't throw from destructor + LOGS_DEFAULT(WARNING) << "Error during telemetry shutdown: " << ex.what(); + } + } + } +} + +// Safe async event logging with error handling +void PosixTelemetry::LogEventAsync(microsoft::applications::events::EventProperties&& props) const { + if (!enabled_ || !logger_) { + return; + } + + try { + // Use async LogEvent for non-blocking telemetry + logger_->LogEvent(std::move(props)); + } catch (const 
std::exception& ex) { + // Log telemetry failures to ORT logging system + LOGS_DEFAULT(WARNING) << "[Telemetry] Failed to log event: " << ex.what(); + } +} + +void PosixTelemetry::Initialize() { + std::lock_guard lock(mutex_); + + // Configure 1DS SDK for optimal async performance + LogConfiguration config; + config[CFG_STR_COLLECTOR_URL] = "https://mobile.events.data.microsoft.com/OneCollector/1.0"; + config[CFG_INT_TRACE_LEVEL_MASK] = 0; // Disable SDK internal logging + config[CFG_INT_SDK_MODE] = SdkModeTypes::SdkModeTypes_CS; // Common Schema 4.0 mode + config[CFG_INT_MAX_TEARDOWN_TIME] = 10; // 10 seconds max for shutdown + + // Configure cache for offline scenarios + config[CFG_STR_CACHE_FILE_PATH] = "/tmp/onnxruntime_telemetry_cache"; + + // Configure RAM queue for async batching + config[CFG_INT_RAM_QUEUE_SIZE] = 512 * 1024; // 512KB RAM queue + config[CFG_INT_RAM_QUEUE_BUFFERS] = 3; // Triple buffering for smooth async operation + + // Sampling configuration (percentage: 100 = 100%, 10 = 10%) + // Sample 100% of critical events, 10% of routine events for performance + config[CFG_STR_SAMPLING_PERCENTAGE] = "ProcessInfo=100,SessionCreation=100,SessionCreationStart=100," + "RuntimeError=100,EvaluationStart=10,EvaluationStop=10," + "RuntimePerf=10,CompileModelStart=50,CompileModelComplete=50," + "EpAutoSelection=50,ProviderOptions=10"; + + // Create logger instance + logger_ = LogManager::Initialize(TENANT_TOKEN, config); + + if (logger_) { + // Set privacy level - no PII collection + logger_->SetContext("PrivacyLevel", "o:0"); + + // Set platform information as context + logger_->SetContext("Platform", GetPlatformInfo()); + logger_->SetContext("Device", GetDeviceInfo()); + + // Set application information + logger_->SetContext("AppName", "ONNXRuntime"); + logger_->SetContext("AppVersion", ORT_VERSION); + + enabled_ = true; + } +} + +void PosixTelemetry::Shutdown() { + std::lock_guard lock(mutex_); + + if (logger_) { + // According to cpp_client_telemetry 
use-after-free docs: + // 1. Stop using ILogger before calling FlushAndTeardown + // 2. Reset shared_ptr to release reference before teardown + // 3. Call FlushAndTeardown only once when count reaches zero + + // Disable logging first to prevent new events + enabled_ = false; + + // Release our reference to the logger + logger_.reset(); + + // Now safely call FlushAndTeardown + // This will block until all pending events are sent or timeout + LogManager::FlushAndTeardown(); + } +} + +std::string PosixTelemetry::GetPlatformInfo() const { + struct utsname system_info; + if (uname(&system_info) == 0) { + std::ostringstream oss; + oss << system_info.sysname << " " << system_info.release; + return oss.str(); + } + return "Unknown"; +} + +std::string PosixTelemetry::GetDeviceInfo() const { +#ifdef __APPLE__ + #if TARGET_OS_IOS + return "iOS"; + #elif TARGET_OS_MAC + return "macOS"; + #endif +#elif defined(__ANDROID__) + return "Android"; +#elif defined(__linux__) + return "Linux"; +#else + return "Unknown"; +#endif +} + +void PosixTelemetry::EnableTelemetryEvents() const { + enabled_ = true; +} + +void PosixTelemetry::DisableTelemetryEvents() const { + enabled_ = false; +} + +void PosixTelemetry::SetLanguageProjection(uint32_t projection) const { + projection_ = projection; +} + +bool PosixTelemetry::IsEnabled() const { + return enabled_; +} + +unsigned char PosixTelemetry::Level() const { + return level_; +} + +uint64_t PosixTelemetry::Keyword() const { + return keyword_; +} + +void PosixTelemetry::LogProcessInfo() const { + if (!enabled_ || !logger_) { + return; + } + + // Log process info only once + if (process_info_logged_.exchange(true)) { + return; + } + + auto event = EventBuilder("ProcessInfo", EventPriority::CRITICAL) + .AddCommonContext(this) + .AddString("runtimeVersion", ORT_VERSION) + .AddInt32("processId", static_cast(getpid())) + .Build(); + + LogEventAsync(std::move(event)); +} + +void PosixTelemetry::LogSessionCreationStart(uint32_t session_id) const { 
+ if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("SessionCreationStart", EventPriority::CRITICAL) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .Build(); + + LogEventAsync(std::move(event)); +} + +void PosixTelemetry::LogEvaluationStop(uint32_t session_id) const { + if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("EvaluationStop", EventPriority::NORMAL) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .Build(); + + LogEventAsync(std::move(event)); +} + +void PosixTelemetry::LogEvaluationStart(uint32_t session_id) const { + if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("EvaluationStart", EventPriority::NORMAL) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .Build(); + + LogEventAsync(std::move(event)); +} + +void PosixTelemetry::LogSessionCreation( + uint32_t session_id, int64_t ir_version, + const std::string& model_producer_name, + const std::string& model_producer_version, + const std::string& model_domain, + const std::unordered_map& domain_to_version_map, + const std::string& model_file_name, + const std::string& model_graph_name, + const std::string& model_weight_type, + const std::string& model_graph_hash, + const std::string& model_weight_hash, + const std::unordered_map& model_metadata, + const std::string& loadedFrom, + const std::vector& execution_provider_ids, + bool use_fp16, bool captureState) const { + if (!enabled_ || !logger_) { + return; + } + + const char* event_name = captureState ? 
"SessionCreation_CaptureState" : "SessionCreation"; + + auto builder = EventBuilder(event_name, EventPriority::CRITICAL) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddInt64("irVersion", ir_version) + .AddString("modelProducerName", model_producer_name) + .AddString("modelProducerVersion", model_producer_version) + .AddString("modelDomain", model_domain) + .AddIntMap("domainToVersionMap", domain_to_version_map) + .AddString("modelFileName", model_file_name) + .AddString("modelGraphName", model_graph_name) + .AddString("modelWeightType", model_weight_type) + .AddString("modelGraphHash", model_graph_hash) + .AddString("modelWeightHash", model_weight_hash) + .AddStringMap("modelMetadata", model_metadata) + .AddString("loadedFrom", loadedFrom) + .AddStringList("executionProviderIds", execution_provider_ids) + .AddBool("useFp16", use_fp16); + + LogEventAsync(builder.Build()); +} + +void PosixTelemetry::LogCompileModelStart( + uint32_t session_id, + const std::string& input_source, + const std::string& output_target, + uint32_t flags, + int graph_optimization_level, + bool embed_ep_context, + bool has_external_initializers_file, + const std::vector& execution_provider_ids) const { + if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("CompileModelStart", EventPriority::NORMAL) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddString("inputSource", input_source) + .AddString("outputTarget", output_target) + .AddUInt32("flags", flags) + .AddInt32("graphOptimizationLevel", graph_optimization_level) + .AddBool("embedEpContext", embed_ep_context) + .AddBool("hasExternalInitializersFile", has_external_initializers_file) + .AddStringList("executionProviderIds", execution_provider_ids) + .Build(); + + LogEventAsync(std::move(event)); +} + +void PosixTelemetry::LogCompileModelComplete( + uint32_t session_id, + bool success, + uint32_t error_code, + uint32_t error_category, + const std::string& error_message) const { 
+ if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("CompileModelComplete", EventPriority::NORMAL) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddBool("success", success) + .AddUInt32("errorCode", error_code) + .AddUInt32("errorCategory", error_category) + .AddString("errorMessage", error_message) + .Build(); + + LogEventAsync(std::move(event)); +} + +void PosixTelemetry::LogRuntimeError( + uint32_t session_id, const common::Status& status, + const char* file, const char* function, uint32_t line) const { + if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("RuntimeError", EventPriority::HIGH) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddInt32("errorCode", static_cast(status.Code())) + .AddInt32("errorCategory", static_cast(status.Category())) + .AddString("errorMessage", status.ErrorMessage()) + .AddString("file", file ? file : "") + .AddString("function", function ? function : "") + .AddUInt32("line", line) + .Build(); + + LogEventAsync(std::move(event)); +} + +void PosixTelemetry::LogRuntimePerf( + uint32_t session_id, uint32_t total_runs_since_last, + int64_t total_run_duration_since_last, + std::unordered_map duration_per_batch_size) const { + if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("RuntimePerf", EventPriority::NORMAL) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddUInt32("totalRunsSinceLast", total_runs_since_last) + .AddInt64("totalRunDurationSinceLast", total_run_duration_since_last) + .AddBatchSizeDurations(duration_per_batch_size) + .Build(); + + LogEventAsync(std::move(event)); +} + +void PosixTelemetry::LogExecutionProviderEvent(LUID* adapterLuid) const { + // Not applicable for non-Windows platforms (LUID is Windows-specific) + (void)adapterLuid; +} + +void PosixTelemetry::LogDriverInfoEvent( + const std::string_view device_class, + const std::wstring_view& driver_names, + const std::wstring_view& 
driver_versions) const { + // Not applicable for non-Windows platforms + (void)device_class; + (void)driver_names; + (void)driver_versions; +} + +void PosixTelemetry::LogAutoEpSelection( + uint32_t session_id, const std::string& selection_policy, + const std::vector& requested_execution_provider_ids, + const std::vector& available_execution_provider_ids) const { + if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("EpAutoSelection", EventPriority::NORMAL) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddString("selectionPolicy", selection_policy) + .AddStringList("requestedExecutionProviderIds", requested_execution_provider_ids) + .AddStringList("availableExecutionProviderIds", available_execution_provider_ids) + .Build(); + + LogEventAsync(std::move(event)); +} + +void PosixTelemetry::LogProviderOptions( + const std::string& provider_id, + const std::string& provider_options_string, + bool captureState) const { + if (!enabled_ || !logger_) { + return; + } + + const char* event_name = captureState ? 
"ProviderOptions_CaptureState" : "ProviderOptions"; + + auto event = EventBuilder(event_name, EventPriority::NORMAL) + .AddCommonContext(this) + .AddString("providerId", provider_id) + .AddString("providerOptions", provider_options_string) + .Build(); + + LogEventAsync(std::move(event)); +} + +// Posix-specific: Log system resource metrics +void PosixTelemetry::LogPosixSystemMetrics(uint32_t session_id) const { + if (!enabled_ || !logger_) { + return; + } + + struct rusage usage; + if (getrusage(RUSAGE_SELF, &usage) == 0) { + // Note: ru_maxrss is in KB on Linux, bytes on macOS +#ifdef __APPLE__ + int64_t max_rss_kb = usage.ru_maxrss / 1024; +#else + int64_t max_rss_kb = usage.ru_maxrss; +#endif + + auto event = EventBuilder("PosixSystemMetrics", EventPriority::NORMAL) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddInt64("maxRssKb", max_rss_kb) + .AddInt64("userCpuTimeSec", usage.ru_utime.tv_sec) + .AddInt64("userCpuTimeUsec", usage.ru_utime.tv_usec) + .AddInt64("systemCpuTimeSec", usage.ru_stime.tv_sec) + .AddInt64("systemCpuTimeUsec", usage.ru_stime.tv_usec) + .AddInt64("minorPageFaults", usage.ru_minflt) + .AddInt64("majorPageFaults", usage.ru_majflt) + .AddInt64("voluntaryContextSwitches", usage.ru_nvcsw) + .AddInt64("involuntaryContextSwitches", usage.ru_nivcsw) + .Build(); + + LogEventAsync(std::move(event)); + } +} + +} // namespace onnxruntime + +#endif // !_WIN32 diff --git a/onnxruntime/core/platform/posix/telemetry.h b/onnxruntime/core/platform/posix/telemetry.h new file mode 100644 index 0000000000000..afd7a6c12b259 --- /dev/null +++ b/onnxruntime/core/platform/posix/telemetry.h @@ -0,0 +1,145 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. 
+ +#pragma once + +#ifndef _WIN32 // Only for non-Windows platforms + +#include "core/platform/telemetry.h" +#include +#include +#include +#include +#include + +// Forward declarations of 1DS SDK types (must be at global scope) +namespace Microsoft::Applications::Events { +class ILogger; +class ISemanticContext; +class EventProperties; +} // namespace Microsoft::Applications::Events + +namespace onnxruntime { + +/** +* @brief Telemetry implementation for non-Windows platforms. +* +* This class provides telemetry logging capabilities for macOS, Linux, Android, and iOS +* using the cpp_client_telemetry library (1DS SDK). It implements the same interface +* as WindowsTelemetry to provide consistent telemetry across all platforms. +* +* Configuration: +* - Telemetry is opt-in via build flags +*/ +class PosixTelemetry : public Telemetry { + public: + PosixTelemetry(); + ~PosixTelemetry() override; + + void EnableTelemetryEvents() const override; + void DisableTelemetryEvents() const override; + void SetLanguageProjection(uint32_t projection) const override; + + bool IsEnabled() const override; + unsigned char Level() const override; + uint64_t Keyword() const override; + + void LogProcessInfo() const override; + void LogSessionCreationStart(uint32_t session_id) const override; + void LogEvaluationStop(uint32_t session_id) const override; + void LogEvaluationStart(uint32_t session_id) const override; + + void LogSessionCreation(uint32_t session_id, int64_t ir_version, + const std::string& model_producer_name, + const std::string& model_producer_version, + const std::string& model_domain, + const std::unordered_map& domain_to_version_map, + const std::string& model_file_name, + const std::string& model_graph_name, + const std::string& model_weight_type, + const std::string& model_graph_hash, + const std::string& model_weight_hash, + const std::unordered_map& model_metadata, + const std::string& loadedFrom, + const std::vector& execution_provider_ids, + bool use_fp16, bool 
captureState) const override; + + void LogCompileModelStart(uint32_t session_id, + const std::string& input_source, + const std::string& output_target, + uint32_t flags, + int graph_optimization_level, + bool embed_ep_context, + bool has_external_initializers_file, + const std::vector& execution_provider_ids) const override; + + void LogCompileModelComplete(uint32_t session_id, + bool success, + uint32_t error_code, + uint32_t error_category, + const std::string& error_message) const override; + + void LogRuntimeError(uint32_t session_id, const common::Status& status, + const char* file, const char* function, uint32_t line) const override; + + void LogRuntimePerf(uint32_t session_id, uint32_t total_runs_since_last, + int64_t total_run_duration_since_last, + std::unordered_map duration_per_batch_size) const override; + + void LogExecutionProviderEvent(LUID* adapterLuid) const override; + void LogDriverInfoEvent(const std::string_view device_class, + const std::wstring_view& driver_names, + const std::wstring_view& driver_versions) const override; + + void LogAutoEpSelection(uint32_t session_id, const std::string& selection_policy, + const std::vector& requested_execution_provider_ids, + const std::vector& available_execution_provider_ids) const override; + + void LogProviderOptions(const std::string& provider_id, + const std::string& provider_options_string, + bool captureState) const override; + + private: + // Initialize telemetry SDK logger + void Initialize(); + + // Shutdown telemetry SDK logger + void Shutdown(); + + // Helper to get platform-specific information + std::string GetPlatformInfo() const; + std::string GetDeviceInfo() const; + + // Safe async event logging + void LogEventAsync(::Microsoft::Applications::Events::EventProperties&& props) const; + + // Posix-specific: Log system resource metrics + void LogPosixSystemMetrics(uint32_t session_id) const; + + // Mutex for thread-safe access + mutable std::mutex mutex_; + + // Telemetry SDK logger 
instance (1DS) + std::shared_ptr<::Microsoft::Applications::Events::ILogger> logger_; + + // State tracking + mutable std::atomic enabled_{true}; + mutable std::atomic projection_{0}; + mutable std::atomic level_{0}; + mutable std::atomic keyword_{0}; + + // Process info tracking + mutable std::atomic process_info_logged_{false}; + + // Global registration count for singleton behavior + static std::atomic global_register_count_; + static std::mutex global_mutex_; + + // Make EventBuilder a friend so it can access GetPlatformInfo/GetDeviceInfo + friend class EventBuilder; +}; + +} // namespace onnxruntime + +#endif // !_WIN32 + diff --git a/onnxruntime/core/platform/windows/telemetry.cc b/onnxruntime/core/platform/windows/telemetry.cc index 6d5a400be703b..d4ded4b25774b 100644 --- a/onnxruntime/core/platform/windows/telemetry.cc +++ b/onnxruntime/core/platform/windows/telemetry.cc @@ -2,11 +2,11 @@ // Licensed under the MIT License. #include "core/platform/windows/telemetry.h" +#include +#include #include #include #include -#include -#include #include "core/common/logging/logging.h" #include "onnxruntime_config.h" diff --git a/tools/ci_build/build.py b/tools/ci_build/build.py index a0712af35e455..37aac6c199325 100644 --- a/tools/ci_build/build.py +++ b/tools/ci_build/build.py @@ -375,11 +375,15 @@ def generate_build_tree( disable_float4_types = args.android or ("float4" in types_to_disable) disable_optional_type = "optional" in types_to_disable disable_sparse_tensors = "sparsetensor" in types_to_disable + # Telemetry: On Windows uses ETW, on non-Windows uses 1DS + cmake_args += [ + "-Donnxruntime_USE_TELEMETRY=" + ("ON" if args.use_telemetry else "OFF"), + ] + if is_windows(): cmake_args += [ "-Donnxruntime_USE_DML=" + ("ON" if args.use_dml else "OFF"), "-Donnxruntime_USE_WINML=" + ("ON" if args.use_winml else "OFF"), - "-Donnxruntime_USE_TELEMETRY=" + ("ON" if args.use_telemetry else "OFF"), "-Donnxruntime_ENABLE_PIX_FOR_WEBGPU_EP=" + ("ON" if 
args.enable_pix_capture else "OFF"), ] diff --git a/tools/ci_build/build_args.py b/tools/ci_build/build_args.py index f32666f65cc38..0e1870ca0d316 100644 --- a/tools/ci_build/build_args.py +++ b/tools/ci_build/build_args.py @@ -429,7 +429,6 @@ def add_windows_specific_args(parser: argparse.ArgumentParser) -> None: parser.add_argument("--msvc_toolset", help="MSVC toolset version (e.g., 14.11). Must be >=14.40") parser.add_argument("--windows_sdk_version", help="Windows SDK version (e.g., 10.0.19041.0).") parser.add_argument("--enable_msvc_static_runtime", action="store_true", help="Statically link MSVC runtimes.") - parser.add_argument("--use_telemetry", action="store_true", help="Enable telemetry (official builds only).") parser.add_argument("--caller_framework", type=str, help="Name of the framework calling ONNX Runtime.") # Cross-compilation targets hosted on Windows @@ -841,6 +840,8 @@ def add_other_feature_args(parser: argparse.ArgumentParser) -> None: action="store_true", help="Build ORT shared lib with compatible bridge for primary EPs (TRT, OV, QNN, VitisAI), excludes tests.", ) + # Telemetry arguments (cross-platform) + parser.add_argument("--use_telemetry", action="store_true", help="Enable telemetry (ETW on Windows, 1DS on other platforms).") def is_cross_compiling(args: argparse.Namespace) -> bool: From 1dbcaf5d15663f621f41d8aa963fd74f4a9fa0f1 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Thu, 19 Feb 2026 10:04:00 -0600 Subject: [PATCH 02/61] Lint --- onnxruntime/core/platform/posix/telemetry.cc | 289 ++++++++++--------- onnxruntime/core/platform/posix/telemetry.h | 67 +++-- tools/ci_build/build.py | 2 +- tools/ci_build/build_args.py | 4 +- 4 files changed, 182 insertions(+), 180 deletions(-) diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index eb3a50fefaa6b..82f561df0f48e 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -37,9 
+37,9 @@ constexpr const char* TENANT_TOKEN = "5ad963bd4b3a4118a481401cc0211875-da8e8657- // Event priority mapping (1DS priorities) enum class EventPriority { - NORMAL = EventLatency_Normal, // Most events - HIGH = EventLatency_RealTime, // RuntimeError - CRITICAL = EventLatency_RealTime // ProcessInfo, SessionCreation + NORMAL = EventLatency_Normal, // Most events + HIGH = EventLatency_RealTime, // RuntimeError + CRITICAL = EventLatency_RealTime // ProcessInfo, SessionCreation }; // Transmit profiles @@ -51,52 +51,52 @@ constexpr const char* PROFILE_BEST_EFFORT = "BEST_EFFORT"; class EventBuilder { private: EventProperties props_; - + public: - explicit EventBuilder(const char* event_name, EventPriority priority) + explicit EventBuilder(const char* event_name, EventPriority priority) : props_(event_name) { // Set latency/priority props_.SetLatency(static_cast(priority)); - + // Set schema version for compatibility with Windows props_.SetProperty("schemaVersion", static_cast(0)); - + // Privacy flags - no PII collection props_.SetPIIKind(PiiKind_None); } - + EventBuilder& AddString(const char* key, const std::string& value) { if (!value.empty()) { props_.SetProperty(key, value); } return *this; } - + EventBuilder& AddInt32(const char* key, int32_t value) { props_.SetProperty(key, static_cast(value)); return *this; } - + EventBuilder& AddInt64(const char* key, int64_t value) { props_.SetProperty(key, value); return *this; } - + EventBuilder& AddBool(const char* key, bool value) { props_.SetProperty(key, value); return *this; } - + EventBuilder& AddUInt32(const char* key, uint32_t value) { props_.SetProperty(key, static_cast(value)); return *this; } - + EventBuilder& AddDouble(const char* key, double value) { props_.SetProperty(key, value); return *this; } - + // Helper for vector to comma-separated string EventBuilder& AddStringList(const char* key, const std::vector& vec) { if (!vec.empty()) { @@ -109,7 +109,7 @@ class EventBuilder { } return *this; } - + // 
Helper for map to key=value,key=value format EventBuilder& AddIntMap(const char* key, const std::unordered_map& map) { if (!map.empty()) { @@ -124,7 +124,7 @@ class EventBuilder { } return *this; } - + // Helper for string map EventBuilder& AddStringMap(const char* key, const std::unordered_map& map) { if (!map.empty()) { @@ -139,7 +139,7 @@ class EventBuilder { } return *this; } - + // Helper for batch size duration map EventBuilder& AddBatchSizeDurations(const std::unordered_map& durations) { for (const auto& [batch_size, duration] : durations) { @@ -148,7 +148,7 @@ class EventBuilder { } return *this; } - + // Add common platform/device context EventBuilder& AddCommonContext(const PosixTelemetry* telemetry) { props_.SetProperty("platform", telemetry->GetPlatformInfo()); @@ -156,13 +156,13 @@ class EventBuilder { props_.SetProperty("projection", static_cast(telemetry->projection_.load())); return *this; } - + EventProperties Build() { return std::move(props_); } }; PosixTelemetry::PosixTelemetry() { std::lock_guard lock(global_mutex_); - + if (global_register_count_ == 0) { try { Initialize(); @@ -177,7 +177,7 @@ PosixTelemetry::PosixTelemetry() { PosixTelemetry::~PosixTelemetry() { std::lock_guard lock(global_mutex_); - + if (global_register_count_ > 0) { global_register_count_--; if (global_register_count_ == 0) { @@ -196,7 +196,7 @@ void PosixTelemetry::LogEventAsync(microsoft::applications::events::EventPropert if (!enabled_ || !logger_) { return; } - + try { // Use async LogEvent for non-blocking telemetry logger_->LogEvent(std::move(props)); @@ -208,62 +208,63 @@ void PosixTelemetry::LogEventAsync(microsoft::applications::events::EventPropert void PosixTelemetry::Initialize() { std::lock_guard lock(mutex_); - + // Configure 1DS SDK for optimal async performance LogConfiguration config; config[CFG_STR_COLLECTOR_URL] = "https://mobile.events.data.microsoft.com/OneCollector/1.0"; - config[CFG_INT_TRACE_LEVEL_MASK] = 0; // Disable SDK internal logging + 
config[CFG_INT_TRACE_LEVEL_MASK] = 0; // Disable SDK internal logging config[CFG_INT_SDK_MODE] = SdkModeTypes::SdkModeTypes_CS; // Common Schema 4.0 mode - config[CFG_INT_MAX_TEARDOWN_TIME] = 10; // 10 seconds max for shutdown - + config[CFG_INT_MAX_TEARDOWN_TIME] = 10; // 10 seconds max for shutdown + // Configure cache for offline scenarios config[CFG_STR_CACHE_FILE_PATH] = "/tmp/onnxruntime_telemetry_cache"; - + // Configure RAM queue for async batching config[CFG_INT_RAM_QUEUE_SIZE] = 512 * 1024; // 512KB RAM queue - config[CFG_INT_RAM_QUEUE_BUFFERS] = 3; // Triple buffering for smooth async operation - + config[CFG_INT_RAM_QUEUE_BUFFERS] = 3; // Triple buffering for smooth async operation + // Sampling configuration (percentage: 100 = 100%, 10 = 10%) // Sample 100% of critical events, 10% of routine events for performance - config[CFG_STR_SAMPLING_PERCENTAGE] = "ProcessInfo=100,SessionCreation=100,SessionCreationStart=100," - "RuntimeError=100,EvaluationStart=10,EvaluationStop=10," - "RuntimePerf=10,CompileModelStart=50,CompileModelComplete=50," - "EpAutoSelection=50,ProviderOptions=10"; - + config[CFG_STR_SAMPLING_PERCENTAGE] = + "ProcessInfo=100,SessionCreation=100,SessionCreationStart=100," + "RuntimeError=100,EvaluationStart=10,EvaluationStop=10," + "RuntimePerf=10,CompileModelStart=50,CompileModelComplete=50," + "EpAutoSelection=50,ProviderOptions=10"; + // Create logger instance logger_ = LogManager::Initialize(TENANT_TOKEN, config); if (logger_) { // Set privacy level - no PII collection logger_->SetContext("PrivacyLevel", "o:0"); - + // Set platform information as context logger_->SetContext("Platform", GetPlatformInfo()); logger_->SetContext("Device", GetDeviceInfo()); - + // Set application information logger_->SetContext("AppName", "ONNXRuntime"); logger_->SetContext("AppVersion", ORT_VERSION); - + enabled_ = true; } } void PosixTelemetry::Shutdown() { std::lock_guard lock(mutex_); - + if (logger_) { // According to cpp_client_telemetry 
use-after-free docs: // 1. Stop using ILogger before calling FlushAndTeardown // 2. Reset shared_ptr to release reference before teardown // 3. Call FlushAndTeardown only once when count reaches zero - + // Disable logging first to prevent new events enabled_ = false; - + // Release our reference to the logger logger_.reset(); - + // Now safely call FlushAndTeardown // This will block until all pending events are sent or timeout LogManager::FlushAndTeardown(); @@ -282,11 +283,11 @@ std::string PosixTelemetry::GetPlatformInfo() const { std::string PosixTelemetry::GetDeviceInfo() const { #ifdef __APPLE__ - #if TARGET_OS_IOS - return "iOS"; - #elif TARGET_OS_MAC - return "macOS"; - #endif +#if TARGET_OS_IOS + return "iOS"; +#elif TARGET_OS_MAC + return "macOS"; +#endif #elif defined(__ANDROID__) return "Android"; #elif defined(__linux__) @@ -331,11 +332,11 @@ void PosixTelemetry::LogProcessInfo() const { } auto event = EventBuilder("ProcessInfo", EventPriority::CRITICAL) - .AddCommonContext(this) - .AddString("runtimeVersion", ORT_VERSION) - .AddInt32("processId", static_cast(getpid())) - .Build(); - + .AddCommonContext(this) + .AddString("runtimeVersion", ORT_VERSION) + .AddInt32("processId", static_cast(getpid())) + .Build(); + LogEventAsync(std::move(event)); } @@ -345,10 +346,10 @@ void PosixTelemetry::LogSessionCreationStart(uint32_t session_id) const { } auto event = EventBuilder("SessionCreationStart", EventPriority::CRITICAL) - .AddCommonContext(this) - .AddUInt32("sessionId", session_id) - .Build(); - + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .Build(); + LogEventAsync(std::move(event)); } @@ -358,10 +359,10 @@ void PosixTelemetry::LogEvaluationStop(uint32_t session_id) const { } auto event = EventBuilder("EvaluationStop", EventPriority::NORMAL) - .AddCommonContext(this) - .AddUInt32("sessionId", session_id) - .Build(); - + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .Build(); + LogEventAsync(std::move(event)); } @@ 
-371,10 +372,10 @@ void PosixTelemetry::LogEvaluationStart(uint32_t session_id) const { } auto event = EventBuilder("EvaluationStart", EventPriority::NORMAL) - .AddCommonContext(this) - .AddUInt32("sessionId", session_id) - .Build(); - + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .Build(); + LogEventAsync(std::move(event)); } @@ -398,25 +399,25 @@ void PosixTelemetry::LogSessionCreation( } const char* event_name = captureState ? "SessionCreation_CaptureState" : "SessionCreation"; - + auto builder = EventBuilder(event_name, EventPriority::CRITICAL) - .AddCommonContext(this) - .AddUInt32("sessionId", session_id) - .AddInt64("irVersion", ir_version) - .AddString("modelProducerName", model_producer_name) - .AddString("modelProducerVersion", model_producer_version) - .AddString("modelDomain", model_domain) - .AddIntMap("domainToVersionMap", domain_to_version_map) - .AddString("modelFileName", model_file_name) - .AddString("modelGraphName", model_graph_name) - .AddString("modelWeightType", model_weight_type) - .AddString("modelGraphHash", model_graph_hash) - .AddString("modelWeightHash", model_weight_hash) - .AddStringMap("modelMetadata", model_metadata) - .AddString("loadedFrom", loadedFrom) - .AddStringList("executionProviderIds", execution_provider_ids) - .AddBool("useFp16", use_fp16); - + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddInt64("irVersion", ir_version) + .AddString("modelProducerName", model_producer_name) + .AddString("modelProducerVersion", model_producer_version) + .AddString("modelDomain", model_domain) + .AddIntMap("domainToVersionMap", domain_to_version_map) + .AddString("modelFileName", model_file_name) + .AddString("modelGraphName", model_graph_name) + .AddString("modelWeightType", model_weight_type) + .AddString("modelGraphHash", model_graph_hash) + .AddString("modelWeightHash", model_weight_hash) + .AddStringMap("modelMetadata", model_metadata) + .AddString("loadedFrom", loadedFrom) + 
.AddStringList("executionProviderIds", execution_provider_ids) + .AddBool("useFp16", use_fp16); + LogEventAsync(builder.Build()); } @@ -434,17 +435,17 @@ void PosixTelemetry::LogCompileModelStart( } auto event = EventBuilder("CompileModelStart", EventPriority::NORMAL) - .AddCommonContext(this) - .AddUInt32("sessionId", session_id) - .AddString("inputSource", input_source) - .AddString("outputTarget", output_target) - .AddUInt32("flags", flags) - .AddInt32("graphOptimizationLevel", graph_optimization_level) - .AddBool("embedEpContext", embed_ep_context) - .AddBool("hasExternalInitializersFile", has_external_initializers_file) - .AddStringList("executionProviderIds", execution_provider_ids) - .Build(); - + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddString("inputSource", input_source) + .AddString("outputTarget", output_target) + .AddUInt32("flags", flags) + .AddInt32("graphOptimizationLevel", graph_optimization_level) + .AddBool("embedEpContext", embed_ep_context) + .AddBool("hasExternalInitializersFile", has_external_initializers_file) + .AddStringList("executionProviderIds", execution_provider_ids) + .Build(); + LogEventAsync(std::move(event)); } @@ -459,14 +460,14 @@ void PosixTelemetry::LogCompileModelComplete( } auto event = EventBuilder("CompileModelComplete", EventPriority::NORMAL) - .AddCommonContext(this) - .AddUInt32("sessionId", session_id) - .AddBool("success", success) - .AddUInt32("errorCode", error_code) - .AddUInt32("errorCategory", error_category) - .AddString("errorMessage", error_message) - .Build(); - + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddBool("success", success) + .AddUInt32("errorCode", error_code) + .AddUInt32("errorCategory", error_category) + .AddString("errorMessage", error_message) + .Build(); + LogEventAsync(std::move(event)); } @@ -478,16 +479,16 @@ void PosixTelemetry::LogRuntimeError( } auto event = EventBuilder("RuntimeError", EventPriority::HIGH) - .AddCommonContext(this) - 
.AddUInt32("sessionId", session_id) - .AddInt32("errorCode", static_cast(status.Code())) - .AddInt32("errorCategory", static_cast(status.Category())) - .AddString("errorMessage", status.ErrorMessage()) - .AddString("file", file ? file : "") - .AddString("function", function ? function : "") - .AddUInt32("line", line) - .Build(); - + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddInt32("errorCode", static_cast(status.Code())) + .AddInt32("errorCategory", static_cast(status.Category())) + .AddString("errorMessage", status.ErrorMessage()) + .AddString("file", file ? file : "") + .AddString("function", function ? function : "") + .AddUInt32("line", line) + .Build(); + LogEventAsync(std::move(event)); } @@ -500,13 +501,13 @@ void PosixTelemetry::LogRuntimePerf( } auto event = EventBuilder("RuntimePerf", EventPriority::NORMAL) - .AddCommonContext(this) - .AddUInt32("sessionId", session_id) - .AddUInt32("totalRunsSinceLast", total_runs_since_last) - .AddInt64("totalRunDurationSinceLast", total_run_duration_since_last) - .AddBatchSizeDurations(duration_per_batch_size) - .Build(); - + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddUInt32("totalRunsSinceLast", total_runs_since_last) + .AddInt64("totalRunDurationSinceLast", total_run_duration_since_last) + .AddBatchSizeDurations(duration_per_batch_size) + .Build(); + LogEventAsync(std::move(event)); } @@ -534,13 +535,13 @@ void PosixTelemetry::LogAutoEpSelection( } auto event = EventBuilder("EpAutoSelection", EventPriority::NORMAL) - .AddCommonContext(this) - .AddUInt32("sessionId", session_id) - .AddString("selectionPolicy", selection_policy) - .AddStringList("requestedExecutionProviderIds", requested_execution_provider_ids) - .AddStringList("availableExecutionProviderIds", available_execution_provider_ids) - .Build(); - + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddString("selectionPolicy", selection_policy) + .AddStringList("requestedExecutionProviderIds", 
requested_execution_provider_ids) + .AddStringList("availableExecutionProviderIds", available_execution_provider_ids) + .Build(); + LogEventAsync(std::move(event)); } @@ -553,13 +554,13 @@ void PosixTelemetry::LogProviderOptions( } const char* event_name = captureState ? "ProviderOptions_CaptureState" : "ProviderOptions"; - + auto event = EventBuilder(event_name, EventPriority::NORMAL) - .AddCommonContext(this) - .AddString("providerId", provider_id) - .AddString("providerOptions", provider_options_string) - .Build(); - + .AddCommonContext(this) + .AddString("providerId", provider_id) + .AddString("providerOptions", provider_options_string) + .Build(); + LogEventAsync(std::move(event)); } @@ -577,21 +578,21 @@ void PosixTelemetry::LogPosixSystemMetrics(uint32_t session_id) const { #else int64_t max_rss_kb = usage.ru_maxrss; #endif - + auto event = EventBuilder("PosixSystemMetrics", EventPriority::NORMAL) - .AddCommonContext(this) - .AddUInt32("sessionId", session_id) - .AddInt64("maxRssKb", max_rss_kb) - .AddInt64("userCpuTimeSec", usage.ru_utime.tv_sec) - .AddInt64("userCpuTimeUsec", usage.ru_utime.tv_usec) - .AddInt64("systemCpuTimeSec", usage.ru_stime.tv_sec) - .AddInt64("systemCpuTimeUsec", usage.ru_stime.tv_usec) - .AddInt64("minorPageFaults", usage.ru_minflt) - .AddInt64("majorPageFaults", usage.ru_majflt) - .AddInt64("voluntaryContextSwitches", usage.ru_nvcsw) - .AddInt64("involuntaryContextSwitches", usage.ru_nivcsw) - .Build(); - + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddInt64("maxRssKb", max_rss_kb) + .AddInt64("userCpuTimeSec", usage.ru_utime.tv_sec) + .AddInt64("userCpuTimeUsec", usage.ru_utime.tv_usec) + .AddInt64("systemCpuTimeSec", usage.ru_stime.tv_sec) + .AddInt64("systemCpuTimeUsec", usage.ru_stime.tv_usec) + .AddInt64("minorPageFaults", usage.ru_minflt) + .AddInt64("majorPageFaults", usage.ru_majflt) + .AddInt64("voluntaryContextSwitches", usage.ru_nvcsw) + .AddInt64("involuntaryContextSwitches", usage.ru_nivcsw) + 
.Build(); + LogEventAsync(std::move(event)); } } diff --git a/onnxruntime/core/platform/posix/telemetry.h b/onnxruntime/core/platform/posix/telemetry.h index afd7a6c12b259..615f65f836f6f 100644 --- a/onnxruntime/core/platform/posix/telemetry.h +++ b/onnxruntime/core/platform/posix/telemetry.h @@ -50,28 +50,28 @@ class PosixTelemetry : public Telemetry { void LogEvaluationStart(uint32_t session_id) const override; void LogSessionCreation(uint32_t session_id, int64_t ir_version, - const std::string& model_producer_name, - const std::string& model_producer_version, - const std::string& model_domain, - const std::unordered_map& domain_to_version_map, - const std::string& model_file_name, - const std::string& model_graph_name, - const std::string& model_weight_type, - const std::string& model_graph_hash, - const std::string& model_weight_hash, - const std::unordered_map& model_metadata, - const std::string& loadedFrom, - const std::vector& execution_provider_ids, - bool use_fp16, bool captureState) const override; + const std::string& model_producer_name, + const std::string& model_producer_version, + const std::string& model_domain, + const std::unordered_map& domain_to_version_map, + const std::string& model_file_name, + const std::string& model_graph_name, + const std::string& model_weight_type, + const std::string& model_graph_hash, + const std::string& model_weight_hash, + const std::unordered_map& model_metadata, + const std::string& loadedFrom, + const std::vector& execution_provider_ids, + bool use_fp16, bool captureState) const override; void LogCompileModelStart(uint32_t session_id, - const std::string& input_source, - const std::string& output_target, - uint32_t flags, - int graph_optimization_level, - bool embed_ep_context, - bool has_external_initializers_file, - const std::vector& execution_provider_ids) const override; + const std::string& input_source, + const std::string& output_target, + uint32_t flags, + int graph_optimization_level, + bool 
embed_ep_context, + bool has_external_initializers_file, + const std::vector& execution_provider_ids) const override; void LogCompileModelComplete(uint32_t session_id, bool success, @@ -80,39 +80,39 @@ class PosixTelemetry : public Telemetry { const std::string& error_message) const override; void LogRuntimeError(uint32_t session_id, const common::Status& status, - const char* file, const char* function, uint32_t line) const override; + const char* file, const char* function, uint32_t line) const override; void LogRuntimePerf(uint32_t session_id, uint32_t total_runs_since_last, - int64_t total_run_duration_since_last, - std::unordered_map duration_per_batch_size) const override; + int64_t total_run_duration_since_last, + std::unordered_map duration_per_batch_size) const override; void LogExecutionProviderEvent(LUID* adapterLuid) const override; void LogDriverInfoEvent(const std::string_view device_class, - const std::wstring_view& driver_names, - const std::wstring_view& driver_versions) const override; + const std::wstring_view& driver_names, + const std::wstring_view& driver_versions) const override; void LogAutoEpSelection(uint32_t session_id, const std::string& selection_policy, - const std::vector& requested_execution_provider_ids, - const std::vector& available_execution_provider_ids) const override; + const std::vector& requested_execution_provider_ids, + const std::vector& available_execution_provider_ids) const override; void LogProviderOptions(const std::string& provider_id, - const std::string& provider_options_string, - bool captureState) const override; + const std::string& provider_options_string, + bool captureState) const override; private: // Initialize telemetry SDK logger void Initialize(); - + // Shutdown telemetry SDK logger void Shutdown(); // Helper to get platform-specific information std::string GetPlatformInfo() const; std::string GetDeviceInfo() const; - + // Safe async event logging void 
LogEventAsync(::Microsoft::Applications::Events::EventProperties&& props) const; - + // Posix-specific: Log system resource metrics void LogPosixSystemMetrics(uint32_t session_id) const; @@ -134,7 +134,7 @@ class PosixTelemetry : public Telemetry { // Global registration count for singleton behavior static std::atomic global_register_count_; static std::mutex global_mutex_; - + // Make EventBuilder a friend so it can access GetPlatformInfo/GetDeviceInfo friend class EventBuilder; }; @@ -142,4 +142,3 @@ class PosixTelemetry : public Telemetry { } // namespace onnxruntime #endif // !_WIN32 - diff --git a/tools/ci_build/build.py b/tools/ci_build/build.py index 37aac6c199325..bfd1599aabc22 100644 --- a/tools/ci_build/build.py +++ b/tools/ci_build/build.py @@ -379,7 +379,7 @@ def generate_build_tree( cmake_args += [ "-Donnxruntime_USE_TELEMETRY=" + ("ON" if args.use_telemetry else "OFF"), ] - + if is_windows(): cmake_args += [ "-Donnxruntime_USE_DML=" + ("ON" if args.use_dml else "OFF"), diff --git a/tools/ci_build/build_args.py b/tools/ci_build/build_args.py index 0e1870ca0d316..01fd0bf151fb4 100644 --- a/tools/ci_build/build_args.py +++ b/tools/ci_build/build_args.py @@ -841,7 +841,9 @@ def add_other_feature_args(parser: argparse.ArgumentParser) -> None: help="Build ORT shared lib with compatible bridge for primary EPs (TRT, OV, QNN, VitisAI), excludes tests.", ) # Telemetry arguments (cross-platform) - parser.add_argument("--use_telemetry", action="store_true", help="Enable telemetry (ETW on Windows, 1DS on other platforms).") + parser.add_argument( + "--use_telemetry", action="store_true", help="Enable telemetry (ETW on Windows, 1DS on other platforms)." 
+ ) def is_cross_compiling(args: argparse.Namespace) -> bool: From b03b21d46d6b82283f9746d16922963a335a03ef Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Wed, 4 Mar 2026 14:06:26 -0600 Subject: [PATCH 03/61] Fix POSIX telemetry build for macOS - Fix 1DS SDK FetchContent integration: - Set MAC_ARCH for macOS builds - Fix nlohmann/json.hpp include path (CMAKE_SOURCE_DIR mismatch) - Fix z/sqlite3 imported targets with IMPORTED_LOCATION - Disable ObjC/Swift wrappers, privacy guard, sanitizer modules - Build as static library - Fix telemetry.cc API compatibility with 1DS SDK v3.10.40.1: - Use ILogConfiguration& instead of LogConfiguration - Use raw ILogger* instead of shared_ptr (LogManager owns it) - Fix namespace casing (Microsoft vs microsoft) - Remove non-existent SetPIIKind/CFG_STR_SAMPLING_PERCENTAGE - Add LOGMANAGER_INSTANCE macro for template instantiation - Comment out unused transmit profile constants - Add required Apple frameworks: Foundation, SystemConfiguration, Network - Add 1DS SDK include directories for onnxruntime_common target Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../external/onnxruntime_external_deps.cmake | 45 +++++++++++++++++++ cmake/onnxruntime_common.cmake | 10 +++++ onnxruntime/core/platform/posix/telemetry.cc | 39 ++++++---------- onnxruntime/core/platform/posix/telemetry.h | 4 +- 4 files changed, 71 insertions(+), 27 deletions(-) diff --git a/cmake/external/onnxruntime_external_deps.cmake b/cmake/external/onnxruntime_external_deps.cmake index f440102f9a34b..b2c4f23e30b05 100644 --- a/cmake/external/onnxruntime_external_deps.cmake +++ b/cmake/external/onnxruntime_external_deps.cmake @@ -880,9 +880,28 @@ if(onnxruntime_USE_TELEMETRY AND NOT WIN32) set(BUILD_UNIT_TESTS_SAVED "${BUILD_UNIT_TESTS}") set(BUILD_FUNC_TESTS_SAVED "${BUILD_FUNC_TESTS}") set(BUILD_SAMPLES_SAVED "${BUILD_SAMPLES}") + set(BUILD_SHARED_LIBS_SAVED "${BUILD_SHARED_LIBS}") set(BUILD_UNIT_TESTS OFF CACHE BOOL "Disable 1DS SDK unit 
tests" FORCE) set(BUILD_FUNC_TESTS OFF CACHE BOOL "Disable 1DS SDK functional tests" FORCE) set(BUILD_SAMPLES OFF CACHE BOOL "Disable 1DS SDK samples" FORCE) + # Build 1DS SDK as static library + set(BUILD_SHARED_LIBS OFF CACHE BOOL "Build 1DS SDK as static library" FORCE) + # Disable optional 1DS modules that may not have source in the release archive + set(BUILD_PRIVACYGUARD OFF CACHE BOOL "Disable 1DS privacy guard module" FORCE) + set(BUILD_SANITIZER OFF CACHE BOOL "Disable 1DS sanitizer module" FORCE) + # Disable ObjC and Swift wrappers - we use the C++ API directly + set(BUILD_OBJC_WRAPPER OFF CACHE BOOL "Disable 1DS ObjC wrapper" FORCE) + set(BUILD_SWIFT_WRAPPER OFF CACHE BOOL "Disable 1DS Swift wrapper" FORCE) + + # The 1DS SDK CMakeLists.txt expects MAC_ARCH on macOS (non-iOS) + if(APPLE AND NOT CMAKE_SYSTEM_NAME STREQUAL "iOS") + if(NOT DEFINED MAC_ARCH) + set(MAC_ARCH "${CMAKE_OSX_ARCHITECTURES}" CACHE STRING "Architecture for 1DS SDK on macOS" FORCE) + if(NOT MAC_ARCH) + set(MAC_ARCH "${CMAKE_SYSTEM_PROCESSOR}" CACHE STRING "Architecture for 1DS SDK on macOS" FORCE) + endif() + endif() + endif() onnxruntime_fetchcontent_declare( cpp_client_telemetry @@ -892,9 +911,35 @@ if(onnxruntime_USE_TELEMETRY AND NOT WIN32) ) onnxruntime_fetchcontent_makeavailable(cpp_client_telemetry) + # The 1DS SDK creates imported targets for z and sqlite3 but doesn't set + # IMPORTED_LOCATION, causing z-NOTFOUND/sqlite3-NOTFOUND link errors. + # Fix by setting the correct library locations on these imported targets. 
+ if(TARGET z) + find_library(ZLIB_LIBRARY_ACTUAL z) + if(ZLIB_LIBRARY_ACTUAL) + set_target_properties(z PROPERTIES IMPORTED_LOCATION "${ZLIB_LIBRARY_ACTUAL}") + endif() + endif() + if(TARGET sqlite3 AND NOT TARGET sqlite3::sqlite3) + find_library(SQLITE3_LIBRARY_ACTUAL sqlite3) + if(SQLITE3_LIBRARY_ACTUAL) + set_target_properties(sqlite3 PROPERTIES IMPORTED_LOCATION "${SQLITE3_LIBRARY_ACTUAL}") + endif() + endif() + + # The 1DS SDK uses include_directories(${CMAKE_SOURCE_DIR}) to find its bundled + # nlohmann/json.hpp, sqlite, and zlib headers. When built via FetchContent, + # CMAKE_SOURCE_DIR points to ORT's root instead of the SDK source dir. + # Fix by adding the SDK source root to the mat target's include directories. + if(TARGET mat) + FetchContent_GetProperties(cpp_client_telemetry SOURCE_DIR CPP_CLIENT_TELEMETRY_SRC) + target_include_directories(mat PRIVATE "${CPP_CLIENT_TELEMETRY_SRC}") + endif() + set(BUILD_UNIT_TESTS "${BUILD_UNIT_TESTS_SAVED}" CACHE BOOL "" FORCE) set(BUILD_FUNC_TESTS "${BUILD_FUNC_TESTS_SAVED}" CACHE BOOL "" FORCE) set(BUILD_SAMPLES "${BUILD_SAMPLES_SAVED}" CACHE BOOL "" FORCE) + set(BUILD_SHARED_LIBS "${BUILD_SHARED_LIBS_SAVED}" CACHE BOOL "" FORCE) endif() FILE(TO_NATIVE_PATH ${CMAKE_BINARY_DIR} ORT_BINARY_DIR) diff --git a/cmake/onnxruntime_common.cmake b/cmake/onnxruntime_common.cmake index b3630178b051b..d010dee4046cc 100644 --- a/cmake/onnxruntime_common.cmake +++ b/cmake/onnxruntime_common.cmake @@ -216,11 +216,21 @@ endif() if(onnxruntime_USE_TELEMETRY AND NOT WIN32) if(TARGET mat) target_link_libraries(onnxruntime_common PRIVATE mat) + # Add 1DS SDK include directories for telemetry.cc + FetchContent_GetProperties(cpp_client_telemetry SOURCE_DIR CPP_CLIENT_TELEMETRY_SRC) + target_include_directories(onnxruntime_common PRIVATE + "${CPP_CLIENT_TELEMETRY_SRC}/lib/include/public" + "${CPP_CLIENT_TELEMETRY_SRC}/lib/include" + "${CPP_CLIENT_TELEMETRY_SRC}" + ) # Platform-specific system libraries required by the 1DS SDK if(APPLE) 
target_link_libraries(onnxruntime_common PRIVATE "-framework CoreFoundation" + "-framework Foundation" "-framework Security" + "-framework SystemConfiguration" + "-framework Network" z sqlite3 ) diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index 82f561df0f48e..47fc77ca8668b 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -26,6 +26,9 @@ using namespace Microsoft::Applications::Events; +// Instantiate the LogManager singleton template (required by the 1DS SDK) +LOGMANAGER_INSTANCE + namespace onnxruntime { // Static member initialization @@ -42,10 +45,10 @@ enum class EventPriority { CRITICAL = EventLatency_RealTime // ProcessInfo, SessionCreation }; -// Transmit profiles -constexpr const char* PROFILE_REAL_TIME = "REAL_TIME"; -constexpr const char* PROFILE_NEAR_REAL_TIME = "NEAR_REAL_TIME"; -constexpr const char* PROFILE_BEST_EFFORT = "BEST_EFFORT"; +// Transmit profiles (for future use) +// constexpr const char* PROFILE_REAL_TIME = "REAL_TIME"; +// constexpr const char* PROFILE_NEAR_REAL_TIME = "NEAR_REAL_TIME"; +// constexpr const char* PROFILE_BEST_EFFORT = "BEST_EFFORT"; // Helper class to build events with common properties class EventBuilder { @@ -61,8 +64,7 @@ class EventBuilder { // Set schema version for compatibility with Windows props_.SetProperty("schemaVersion", static_cast(0)); - // Privacy flags - no PII collection - props_.SetPIIKind(PiiKind_None); + // Privacy flags - no PII collection (PiiKind_None is the default for all properties) } EventBuilder& AddString(const char* key, const std::string& value) { @@ -192,7 +194,7 @@ PosixTelemetry::~PosixTelemetry() { } // Safe async event logging with error handling -void PosixTelemetry::LogEventAsync(microsoft::applications::events::EventProperties&& props) const { +void PosixTelemetry::LogEventAsync(Microsoft::Applications::Events::EventProperties&& props) const { if (!enabled_ || 
!logger_) { return; } @@ -210,7 +212,7 @@ void PosixTelemetry::Initialize() { std::lock_guard lock(mutex_); // Configure 1DS SDK for optimal async performance - LogConfiguration config; + ILogConfiguration& config = LogManager::GetLogConfiguration(); config[CFG_STR_COLLECTOR_URL] = "https://mobile.events.data.microsoft.com/OneCollector/1.0"; config[CFG_INT_TRACE_LEVEL_MASK] = 0; // Disable SDK internal logging config[CFG_INT_SDK_MODE] = SdkModeTypes::SdkModeTypes_CS; // Common Schema 4.0 mode @@ -223,16 +225,8 @@ void PosixTelemetry::Initialize() { config[CFG_INT_RAM_QUEUE_SIZE] = 512 * 1024; // 512KB RAM queue config[CFG_INT_RAM_QUEUE_BUFFERS] = 3; // Triple buffering for smooth async operation - // Sampling configuration (percentage: 100 = 100%, 10 = 10%) - // Sample 100% of critical events, 10% of routine events for performance - config[CFG_STR_SAMPLING_PERCENTAGE] = - "ProcessInfo=100,SessionCreation=100,SessionCreationStart=100," - "RuntimeError=100,EvaluationStart=10,EvaluationStop=10," - "RuntimePerf=10,CompileModelStart=50,CompileModelComplete=50," - "EpAutoSelection=50,ProviderOptions=10"; - - // Create logger instance - logger_ = LogManager::Initialize(TENANT_TOKEN, config); + // Create logger instance (raw pointer owned by LogManager) + logger_ = LogManager::Initialize(TENANT_TOKEN); if (logger_) { // Set privacy level - no PII collection @@ -254,16 +248,11 @@ void PosixTelemetry::Shutdown() { std::lock_guard lock(mutex_); if (logger_) { - // According to cpp_client_telemetry use-after-free docs: - // 1. Stop using ILogger before calling FlushAndTeardown - // 2. Reset shared_ptr to release reference before teardown - // 3. 
Call FlushAndTeardown only once when count reaches zero - // Disable logging first to prevent new events enabled_ = false; - // Release our reference to the logger - logger_.reset(); + // Clear our pointer (owned by LogManager, not us) + logger_ = nullptr; // Now safely call FlushAndTeardown // This will block until all pending events are sent or timeout diff --git a/onnxruntime/core/platform/posix/telemetry.h b/onnxruntime/core/platform/posix/telemetry.h index 615f65f836f6f..c470c7048d1c0 100644 --- a/onnxruntime/core/platform/posix/telemetry.h +++ b/onnxruntime/core/platform/posix/telemetry.h @@ -119,8 +119,8 @@ class PosixTelemetry : public Telemetry { // Mutex for thread-safe access mutable std::mutex mutex_; - // Telemetry SDK logger instance (1DS) - std::shared_ptr<::Microsoft::Applications::Events::ILogger> logger_; + // Telemetry SDK logger instance (1DS) - raw pointer owned by LogManager + ::Microsoft::Applications::Events::ILogger* logger_{nullptr}; // State tracking mutable std::atomic enabled_{true}; From 310f99e3a7da0445201d419ccd4c9bcffd8318fa Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Thu, 5 Mar 2026 04:07:02 -0600 Subject: [PATCH 04/61] Modify device id and other events --- build.bat | 2 +- cgmanifests/cgmanifest.json | 10 + .../external/onnxruntime_external_deps.cmake | 7 + cmake/onnxruntime_common.cmake | 12 + onnxruntime/core/platform/posix/device_id.cc | 190 ++++++++++ onnxruntime/core/platform/posix/device_id.h | 71 ++++ onnxruntime/core/platform/posix/env.cc | 6 +- onnxruntime/core/platform/posix/telemetry.cc | 344 +++++++++++++----- onnxruntime/core/platform/posix/telemetry.h | 43 +-- 9 files changed, 576 insertions(+), 109 deletions(-) create mode 100644 onnxruntime/core/platform/posix/device_id.cc create mode 100644 onnxruntime/core/platform/posix/device_id.h diff --git a/build.bat b/build.bat index b05a4a0b28210..d0c6cbcddd669 100644 --- a/build.bat +++ b/build.bat @@ -7,4 +7,4 @@ setlocal set PATH=C:\Program 
Files\Git\usr\bin;%PATH% rem Requires a Python install to be available in your PATH -python "%~dp0\tools\ci_build\build.py" --build_dir "%~dp0\build\Windows" --use_telemetry %* +python "%~dp0\tools\ci_build\build.py" --build_dir "%~dp0\build\Windows" %* diff --git a/cgmanifests/cgmanifest.json b/cgmanifests/cgmanifest.json index bf889e9fb61a8..dfe6f0d4d1553 100644 --- a/cgmanifests/cgmanifest.json +++ b/cgmanifests/cgmanifest.json @@ -345,6 +345,16 @@ }, "comments": "python-pillow. Implementation logic for anti-aliasing copied by Resize CPU kernel." } + }, + { + "component": { + "type": "git", + "git": { + "commitHash": "ee2ded25e539f64052c9d8635bef4ea62c30e014", + "repositoryUrl": "https://github.com/microsoft/cpp_client_telemetry.git" + }, + "comments": "1DS SDK (cpp_client_telemetry) for cross-platform telemetry on non-Windows platforms (macOS, Linux, Android, iOS)." + } } ], "Version": 1 diff --git a/cmake/external/onnxruntime_external_deps.cmake b/cmake/external/onnxruntime_external_deps.cmake index f440102f9a34b..7b4c2d3b0da8a 100644 --- a/cmake/external/onnxruntime_external_deps.cmake +++ b/cmake/external/onnxruntime_external_deps.cmake @@ -892,6 +892,13 @@ if(onnxruntime_USE_TELEMETRY AND NOT WIN32) ) onnxruntime_fetchcontent_makeavailable(cpp_client_telemetry) + # cpp_client_telemetry's CMakeLists.txt uses include_directories(${CMAKE_SOURCE_DIR}) to find + # its bundled nlohmann/, sqlite/, and zlib/ headers. When built via FetchContent, CMAKE_SOURCE_DIR + # points to ORT's root instead. Fix by adding the actual source dir as an include path. 
+ if(TARGET mat) + target_include_directories(mat PRIVATE ${cpp_client_telemetry_SOURCE_DIR}) + endif() + set(BUILD_UNIT_TESTS "${BUILD_UNIT_TESTS_SAVED}" CACHE BOOL "" FORCE) set(BUILD_FUNC_TESTS "${BUILD_FUNC_TESTS_SAVED}" CACHE BOOL "" FORCE) set(BUILD_SAMPLES "${BUILD_SAMPLES_SAVED}" CACHE BOOL "" FORCE) diff --git a/cmake/onnxruntime_common.cmake b/cmake/onnxruntime_common.cmake index b3630178b051b..994e8d89c066c 100644 --- a/cmake/onnxruntime_common.cmake +++ b/cmake/onnxruntime_common.cmake @@ -58,6 +58,8 @@ else() # Telemetry for non-Windows platforms (enabled by USE_TELEMETRY) if (onnxruntime_USE_TELEMETRY) list(APPEND onnxruntime_common_src_patterns + "${ONNXRUNTIME_ROOT}/core/platform/posix/device_id.h" + "${ONNXRUNTIME_ROOT}/core/platform/posix/device_id.cc" "${ONNXRUNTIME_ROOT}/core/platform/posix/telemetry.h" "${ONNXRUNTIME_ROOT}/core/platform/posix/telemetry.cc" ) @@ -216,6 +218,16 @@ endif() if(onnxruntime_USE_TELEMETRY AND NOT WIN32) if(TARGET mat) target_link_libraries(onnxruntime_common PRIVATE mat) + # cpp_client_telemetry uses include_directories() (directory-scoped) rather than + # target_include_directories(), so include paths don't propagate via target_link_libraries. + # Add them explicitly for onnxruntime_common. + if(DEFINED cpp_client_telemetry_SOURCE_DIR) + target_include_directories(onnxruntime_common PRIVATE + ${cpp_client_telemetry_SOURCE_DIR}/lib/include/public + ${cpp_client_telemetry_SOURCE_DIR}/lib/include/mat + ${cpp_client_telemetry_SOURCE_DIR}/lib + ) + endif() # Platform-specific system libraries required by the 1DS SDK if(APPLE) target_link_libraries(onnxruntime_common PRIVATE diff --git a/onnxruntime/core/platform/posix/device_id.cc b/onnxruntime/core/platform/posix/device_id.cc new file mode 100644 index 0000000000000..d6fd592478109 --- /dev/null +++ b/onnxruntime/core/platform/posix/device_id.cc @@ -0,0 +1,190 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. 
+ +#include "core/platform/posix/device_id.h" + +#include +#include +#include +#include +#include + +#include +#include + +#ifdef __APPLE__ +#include +#endif + +namespace onnxruntime { + +DeviceId& DeviceId::Instance() { + static DeviceId instance; + return instance; +} + +std::string DeviceId::GetValue() { + std::lock_guard lock(mutex_); + InitializeInternal(); + return device_id_; +} + +DeviceIdStatus DeviceId::GetStatus() { + std::lock_guard lock(mutex_); + InitializeInternal(); + return status_; +} + +std::string DeviceId::GetStatusString() { + switch (GetStatus()) { + case DeviceIdStatus::New: + return "New"; + case DeviceIdStatus::Existing: + return "Existing"; + case DeviceIdStatus::Corrupted: + return "Corrupted"; + case DeviceIdStatus::Failed: + return "Failed"; + default: + return "Unknown"; + } +} + +std::string DeviceId::GenerateUUID() { + std::random_device rd; + std::mt19937 gen(rd()); + std::uniform_int_distribution dist(0, UINT32_MAX); + + uint32_t data1 = dist(gen); + uint16_t data2 = static_cast(dist(gen) & 0xFFFF); + uint16_t data3 = static_cast((dist(gen) & 0x0FFF) | 0x4000); // Version 4 + uint16_t data4 = static_cast((dist(gen) & 0x3FFF) | 0x8000); // Variant 1 + uint16_t data5a = static_cast(dist(gen) & 0xFFFF); + uint32_t data5b = dist(gen); + + std::ostringstream oss; + oss << std::hex << std::setfill('0') + << std::setw(8) << data1 << '-' + << std::setw(4) << data2 << '-' + << std::setw(4) << data3 << '-' + << std::setw(4) << data4 << '-' + << std::setw(4) << data5a + << std::setw(8) << data5b; + return oss.str(); +} + +bool DeviceId::IsValidGUID(const std::string& str) { + if (str.length() != 36) return false; + + for (size_t i = 0; i < str.length(); ++i) { + char c = str[i]; + if (i == 8 || i == 13 || i == 18 || i == 23) { + if (c != '-') return false; + } else { + if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F'))) { + return false; + } + } + } + return true; +} + +std::string 
DeviceId::GetStorageDirectory(bool mobile) { + const char* h = std::getenv("HOME"); + if (!h || !h[0]) return ""; + std::string home(h); + + if (mobile) { + return home + "/.onnxruntime"; + } + +#if defined(__APPLE__) + return home + "/Library/Application Support/" + kDeviceIdDir; +#else + return home + "/" + kDeviceIdDir; +#endif +} + +void DeviceId::CreateDirectoryTree(const std::string& path) { + if (path.empty()) return; + + size_t pos = path.find_last_of('/'); + if (pos != std::string::npos && pos > 0) { + CreateDirectoryTree(path.substr(0, pos)); + } + + mkdir(path.c_str(), 0755); +} + +void DeviceId::InitializeInternal() { + if (initialized_) return; + initialized_ = true; + + try { + // Use compile-time platform detection to select the appropriate storage path. + // This matches the mobile/desktop selection in posix/env.cc. +#if defined(__ANDROID__) || (defined(__APPLE__) && TARGET_OS_IOS) + constexpr bool is_mobile = true; +#else + constexpr bool is_mobile = false; +#endif + std::string dir_path = GetStorageDirectory(is_mobile); + if (dir_path.empty()) { + status_ = DeviceIdStatus::Failed; + return; + } + + std::string file_path = dir_path + "/" + kFileName; + + // Try to read existing device ID + { + std::ifstream infile(file_path); + if (infile.good()) { + infile.seekg(0, std::ios::end); + auto size = infile.tellg(); + infile.seekg(0, std::ios::beg); + + if (size > static_cast(kMaxFileSize)) { + status_ = DeviceIdStatus::Corrupted; + } else { + std::string content; + std::getline(infile, content); + + // Trim whitespace + while (!content.empty() && + (content.back() == '\n' || content.back() == '\r' || content.back() == ' ')) { + content.pop_back(); + } + + if (IsValidGUID(content)) { + device_id_ = content; + status_ = DeviceIdStatus::Existing; + return; + } + status_ = DeviceIdStatus::Corrupted; + } + } + } + + // Generate new device ID + device_id_ = GenerateUUID(); + + // Create directory tree + CreateDirectoryTree(dir_path); + + // Write to file + 
std::ofstream outfile(file_path); + if (outfile.good()) { + outfile << device_id_; + outfile.close(); + status_ = DeviceIdStatus::New; + } else { + status_ = DeviceIdStatus::Failed; + } + } catch (...) { + status_ = DeviceIdStatus::Failed; + // Keep device_id_ if generated — it's still valid for this session (in-memory only). + } +} + +} // namespace onnxruntime diff --git a/onnxruntime/core/platform/posix/device_id.h b/onnxruntime/core/platform/posix/device_id.h new file mode 100644 index 0000000000000..89cbd0945e045 --- /dev/null +++ b/onnxruntime/core/platform/posix/device_id.h @@ -0,0 +1,71 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +#pragma once + +#include +#include +#include "core/common/common.h" + +namespace onnxruntime { + +enum class DeviceIdStatus { + New, // Device ID was newly generated + Existing, // Device ID was loaded from persistent storage + Corrupted, // Stored device ID was invalid and regenerated + Failed // Failed to persist device ID (in-memory only) +}; + +/** + * Manages a persistent device identifier for telemetry purposes. + * The device ID is stored in a platform-appropriate location: + * - macOS: ~/Library/Application Support/Microsoft/DeveloperTools/.onnxruntime/deviceid + * - Linux: ~/Microsoft/DeveloperTools/.onnxruntime/deviceid + * - iOS/Android: ~/.onnxruntime/deviceid (shorter path, avoids iCloud backup on iOS) + * + * Thread-safe singleton - use DeviceId::Instance() to access. 
+ */ +class DeviceId { + public: + static DeviceId& Instance(); + + // Get the device ID value (generates/loads on first call) + std::string GetValue(); + + // Get the status of the device ID + DeviceIdStatus GetStatus(); + + // Get human-readable status string + std::string GetStatusString(); + + // Get the directory path for device ID / telemetry cache storage + // Desktop: ~/Microsoft/DeveloperTools/.onnxruntime (or platform equivalent) + // Mobile: ~/.onnxruntime + static std::string GetStorageDirectory(bool mobile = false); + + private: + DeviceId() = default; + ~DeviceId() = default; + ORT_DISALLOW_COPY_ASSIGNMENT_AND_MOVE(DeviceId); + + void InitializeInternal(); + + // Generate a random UUID v4 + static std::string GenerateUUID(); + + // Validate GUID format (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) + static bool IsValidGUID(const std::string& str); + + // Create directory tree recursively using platform APIs + static void CreateDirectoryTree(const std::string& path); + + static constexpr const char* kDeviceIdDir = "Microsoft/DeveloperTools/.onnxruntime"; + static constexpr const char* kFileName = "deviceid"; + static constexpr size_t kMaxFileSize = 256; + + std::string device_id_; + DeviceIdStatus status_ = DeviceIdStatus::New; + bool initialized_ = false; + std::mutex mutex_; +}; +} // namespace onnxruntime diff --git a/onnxruntime/core/platform/posix/env.cc b/onnxruntime/core/platform/posix/env.cc index 6efbe5acd3d37..03c11982f7612 100644 --- a/onnxruntime/core/platform/posix/env.cc +++ b/onnxruntime/core/platform/posix/env.cc @@ -16,6 +16,10 @@ limitations under the License. 
#include "core/platform/env.h" +#ifdef __APPLE__ +#include +#endif + #ifdef USE_1DS_TELEMETRY #include "core/platform/posix/telemetry.h" #endif @@ -619,8 +623,6 @@ class PosixEnv : public Env { private: #ifdef USE_1DS_TELEMETRY PosixTelemetry telemetry_provider_; -#else - Telemetry telemetry_provider_; #endif #ifdef ORT_USE_CPUINFO PosixEnv() { diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index 82f561df0f48e..8c91bb9467b1a 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -1,31 +1,39 @@ // Copyright (c) Microsoft Corporation. All rights reserved. // Licensed under the MIT License. -#ifndef _WIN32 // Only for non-Windows platforms - #include "core/platform/posix/telemetry.h" +#include "core/platform/posix/device_id.h" -// 1DS SDK includes +// 1DS SDK #include -#include -#include +#include -#include -#include #include -#include #include -#include "core/common/logging/logging.h" -#include "core/common/status.h" -#include "onnxruntime_config.h" - #ifdef __APPLE__ +#include #include #endif +#if defined(__linux__) || defined(__ANDROID__) +#include +#endif + +#include +#include +#include + +#include "core/common/logging/logging.h" +#include "core/common/status.h" +#include "onnxruntime_config.h" + using namespace Microsoft::Applications::Events; +// Instantiate the LogManager singleton (defines the static ILogManager* instance). +// Required because cpp_client_telemetry's LogManagerCX.cpp is not compiled in the FetchContent build. 
+LOGMANAGER_INSTANCE + namespace onnxruntime { // Static member initialization @@ -42,27 +50,19 @@ enum class EventPriority { CRITICAL = EventLatency_RealTime // ProcessInfo, SessionCreation }; -// Transmit profiles -constexpr const char* PROFILE_REAL_TIME = "REAL_TIME"; -constexpr const char* PROFILE_NEAR_REAL_TIME = "NEAR_REAL_TIME"; -constexpr const char* PROFILE_BEST_EFFORT = "BEST_EFFORT"; - // Helper class to build events with common properties class EventBuilder { private: EventProperties props_; public: - explicit EventBuilder(const char* event_name, EventPriority priority) - : props_(event_name) { + explicit EventBuilder(std::string event_name, EventPriority priority) + : props_(std::move(event_name)) { // Set latency/priority props_.SetLatency(static_cast(priority)); // Set schema version for compatibility with Windows props_.SetProperty("schemaVersion", static_cast(0)); - - // Privacy flags - no PII collection - props_.SetPIIKind(PiiKind_None); } EventBuilder& AddString(const char* key, const std::string& value) { @@ -151,8 +151,6 @@ class EventBuilder { // Add common platform/device context EventBuilder& AddCommonContext(const PosixTelemetry* telemetry) { - props_.SetProperty("platform", telemetry->GetPlatformInfo()); - props_.SetProperty("device", telemetry->GetDeviceInfo()); props_.SetProperty("projection", static_cast(telemetry->projection_.load())); return *this; } @@ -160,13 +158,24 @@ class EventBuilder { EventProperties Build() { return std::move(props_); } }; +// Hash a device ID string using std::hash and format as fixed-width hex. +// Ensures raw device identifiers are never sent over the wire. 
+static std::string HashDeviceId(const std::string& id) { + size_t hash = std::hash{}(id); + std::ostringstream oss; + oss << std::hex << std::setfill('0') << std::setw(sizeof(size_t) * 2) << hash; + return oss.str(); +} + PosixTelemetry::PosixTelemetry() { std::lock_guard lock(global_mutex_); - if (global_register_count_ == 0) { + // Always increment so destructor pairing is symmetric + global_register_count_++; + + if (global_register_count_ == 1) { try { Initialize(); - global_register_count_++; } catch (const std::exception& ex) { // Log error but don't fail construction // Telemetry failures should not break application functionality @@ -178,30 +187,21 @@ PosixTelemetry::PosixTelemetry() { PosixTelemetry::~PosixTelemetry() { std::lock_guard lock(global_mutex_); - if (global_register_count_ > 0) { - global_register_count_--; - if (global_register_count_ == 0) { - try { - Shutdown(); - } catch (const std::exception& ex) { - // Log error but don't throw from destructor - LOGS_DEFAULT(WARNING) << "Error during telemetry shutdown: " << ex.what(); - } + global_register_count_--; + if (global_register_count_ == 0) { + try { + Shutdown(); + } catch (const std::exception& ex) { + // Log error but don't throw from destructor + LOGS_DEFAULT(WARNING) << "Error during telemetry shutdown: " << ex.what(); } } } -// Safe async event logging with error handling -void PosixTelemetry::LogEventAsync(microsoft::applications::events::EventProperties&& props) const { - if (!enabled_ || !logger_) { - return; - } - +void PosixTelemetry::LogEventAsync(Microsoft::Applications::Events::EventProperties&& props) const { try { - // Use async LogEvent for non-blocking telemetry logger_->LogEvent(std::move(props)); } catch (const std::exception& ex) { - // Log telemetry failures to ORT logging system LOGS_DEFAULT(WARNING) << "[Telemetry] Failed to log event: " << ex.what(); } } @@ -210,29 +210,35 @@ void PosixTelemetry::Initialize() { std::lock_guard lock(mutex_); // Configure 1DS SDK for 
optimal async performance - LogConfiguration config; + ILogConfiguration config; config[CFG_STR_COLLECTOR_URL] = "https://mobile.events.data.microsoft.com/OneCollector/1.0"; config[CFG_INT_TRACE_LEVEL_MASK] = 0; // Disable SDK internal logging config[CFG_INT_SDK_MODE] = SdkModeTypes::SdkModeTypes_CS; // Common Schema 4.0 mode config[CFG_INT_MAX_TEARDOWN_TIME] = 10; // 10 seconds max for shutdown - // Configure cache for offline scenarios - config[CFG_STR_CACHE_FILE_PATH] = "/tmp/onnxruntime_telemetry_cache"; + // Configure cache for offline scenarios - use same directory as device ID storage + { +#if defined(__ANDROID__) || (defined(__APPLE__) && TARGET_OS_IOS) + constexpr bool is_mobile = true; +#else + constexpr bool is_mobile = false; +#endif + std::string cache_dir = DeviceId::GetStorageDirectory(is_mobile); + if (!cache_dir.empty()) { + std::string cache_path = cache_dir + "/telemetry_cache.db"; + config[CFG_STR_CACHE_FILE_PATH] = cache_path; + } + } // Configure RAM queue for async batching config[CFG_INT_RAM_QUEUE_SIZE] = 512 * 1024; // 512KB RAM queue config[CFG_INT_RAM_QUEUE_BUFFERS] = 3; // Triple buffering for smooth async operation - // Sampling configuration (percentage: 100 = 100%, 10 = 10%) - // Sample 100% of critical events, 10% of routine events for performance - config[CFG_STR_SAMPLING_PERCENTAGE] = - "ProcessInfo=100,SessionCreation=100,SessionCreationStart=100," - "RuntimeError=100,EvaluationStart=10,EvaluationStop=10," - "RuntimePerf=10,CompileModelStart=50,CompileModelComplete=50," - "EpAutoSelection=50,ProviderOptions=10"; - - // Create logger instance - logger_ = LogManager::Initialize(TENANT_TOKEN, config); + // Create logger instance (raw pointer, owned by LogManager) + auto* raw_logger = LogManager::Initialize(TENANT_TOKEN, config); + // Store as shared_ptr with no-op deleter since LogManager owns the lifetime + logger_ = std::shared_ptr<::Microsoft::Applications::Events::ILogger>( + raw_logger, 
[](::Microsoft::Applications::Events::ILogger*) {}); if (logger_) { // Set privacy level - no PII collection @@ -240,7 +246,29 @@ void PosixTelemetry::Initialize() { // Set platform information as context logger_->SetContext("Platform", GetPlatformInfo()); - logger_->SetContext("Device", GetDeviceInfo()); + + // Override the device ID with a hashed version for privacy. + // The "c:" prefix tells the backend it's a caller-supplied identifier. + auto* ctx = LogManager::GetSemanticContext(); + if (ctx) { + std::string raw_device_id; +#if defined(__ANDROID__) || (defined(__APPLE__) && TARGET_OS_IOS) + // Mobile: read the SDK's auto-generated platform device ID (e.g., identifierForVendor + // on iOS, ANDROID_ID on Android) and hash it before sending. + auto* provider = static_cast(ctx); + auto& fields = provider->GetCommonFields(); + auto it = fields.find(COMMONFIELDS_DEVICE_ID); + if (it != fields.end()) { + raw_device_id = it->second.to_string(); + } +#else + // Desktop: use our custom persistent UUID. 
+ raw_device_id = DeviceId::Instance().GetValue(); +#endif + if (!raw_device_id.empty()) { + ctx->SetDeviceId("c:" + HashDeviceId(raw_device_id)); + } + } // Set application information logger_->SetContext("AppName", "ONNXRuntime"); @@ -272,31 +300,165 @@ void PosixTelemetry::Shutdown() { } std::string PosixTelemetry::GetPlatformInfo() const { - struct utsname system_info; - if (uname(&system_info) == 0) { - std::ostringstream oss; - oss << system_info.sysname << " " << system_info.release; - return oss.str(); - } - return "Unknown"; -} - -std::string PosixTelemetry::GetDeviceInfo() const { -#ifdef __APPLE__ +#if defined(__APPLE__) #if TARGET_OS_IOS return "iOS"; #elif TARGET_OS_MAC return "macOS"; +#else + return "Apple"; +#endif +#elif defined(__ANDROID__) + return "Android"; +#elif defined(__linux__) + return "Linux"; +#else + return "Unknown"; #endif +} + +// --------------------------------------------------------------------------- +// Process / system info helpers for LogProcessInfo +// --------------------------------------------------------------------------- + +// Get detailed OS version string (e.g., "macOS 15.2", "Ubuntu 22.04 LTS") +std::string PosixTelemetry::GetOsDescription() const { +#if defined(__APPLE__) + char version[64] = {}; + size_t len = sizeof(version); + if (sysctlbyname("kern.osproductversion", version, &len, nullptr, 0) == 0) { +#if TARGET_OS_IOS + return std::string("iOS ") + version; +#else + return std::string("macOS ") + version; +#endif + } + return GetPlatformInfo(); + #elif defined(__ANDROID__) + // Read Android system properties via /system/build.prop + std::string release, sdk; + std::ifstream prop("/system/build.prop"); + if (prop.is_open()) { + std::string line; + while (std::getline(prop, line)) { + if (line.rfind("ro.build.version.release=", 0) == 0) + release = line.substr(25); + else if (line.rfind("ro.build.version.sdk=", 0) == 0) + sdk = line.substr(21); + } + } + if (!release.empty()) { + std::string result = "Android 
" + release; + if (!sdk.empty()) result += " (API " + sdk + ")"; + return result; + } return "Android"; + #elif defined(__linux__) + // Parse /etc/os-release for PRETTY_NAME (e.g., "Ubuntu 22.04.3 LTS") + std::ifstream os_release("/etc/os-release"); + if (os_release.is_open()) { + std::string line; + while (std::getline(os_release, line)) { + if (line.rfind("PRETTY_NAME=", 0) == 0) { + std::string value = line.substr(12); + if (value.size() >= 2 && value.front() == '"' && value.back() == '"') { + value = value.substr(1, value.size() - 2); + } + return value; + } + } + } return "Linux"; + #else return "Unknown"; #endif } +// Get the name of the current process +std::string PosixTelemetry::GetProcessName() const { +#if defined(__APPLE__) || defined(__FreeBSD__) + const char* name = getprogname(); + return name ? name : ""; + +#elif defined(__linux__) || defined(__ANDROID__) + // /proc/self/comm contains the process name (up to 15 chars) + std::ifstream comm("/proc/self/comm"); + if (comm.is_open()) { + std::string name; + std::getline(comm, name); + while (!name.empty() && (name.back() == '\n' || name.back() == '\r')) + name.pop_back(); + return name; + } + return ""; + +#else + return ""; +#endif +} + +// Get the CPU architecture the binary was compiled for +std::string PosixTelemetry::GetArchitecture() { +#if defined(__x86_64__) + return "x86_64"; +#elif defined(__i386__) + return "x86"; +#elif defined(__aarch64__) + return "arm64"; +#elif defined(__arm__) + return "arm"; +#elif defined(__riscv) + return "riscv"; +#elif defined(__wasm__) + return "wasm"; +#else + return "unknown"; +#endif +} + +// Get total physical memory in MB +int64_t PosixTelemetry::GetTotalMemoryMB() { +#if defined(__APPLE__) + int64_t mem = 0; + size_t len = sizeof(mem); + if (sysctlbyname("hw.memsize", &mem, &len, nullptr, 0) == 0) { + return mem / (1024 * 1024); + } + return -1; + +#elif defined(__linux__) || defined(__ANDROID__) + long pages = sysconf(_SC_PHYS_PAGES); + long page_size = 
sysconf(_SC_PAGE_SIZE); + if (pages > 0 && page_size > 0) { + return static_cast(pages) * page_size / (1024 * 1024); + } + return -1; + +#else + return -1; +#endif +} + +// Get system locale (e.g., "en-US", "ja-JP") +std::string PosixTelemetry::GetLocale() { + const char* lang = std::getenv("LANG"); + if (lang && lang[0]) { + std::string loc(lang); + // Strip encoding suffix (e.g., "en_US.UTF-8" → "en_US") + auto dot = loc.find('.'); + if (dot != std::string::npos) loc = loc.substr(0, dot); + // Normalize separator: "en_US" → "en-US" + for (auto& c : loc) { + if (c == '_') c = '-'; + } + return loc; + } + return ""; +} + void PosixTelemetry::EnableTelemetryEvents() const { enabled_ = true; } @@ -322,7 +484,8 @@ uint64_t PosixTelemetry::Keyword() const { } void PosixTelemetry::LogProcessInfo() const { - if (!enabled_ || !logger_) { + // LogProcessInfo only collects system metadata and always fires if we have a valid logger. + if (!logger_) { return; } @@ -331,13 +494,22 @@ void PosixTelemetry::LogProcessInfo() const { return; } - auto event = EventBuilder("ProcessInfo", EventPriority::CRITICAL) - .AddCommonContext(this) - .AddString("runtimeVersion", ORT_VERSION) - .AddInt32("processId", static_cast(getpid())) - .Build(); + auto builder = EventBuilder("ProcessInfo", EventPriority::CRITICAL) + .AddCommonContext(this) + .AddString("runtimeVersion", ORT_VERSION) +#if defined(__ANDROID__) || (defined(__APPLE__) && TARGET_OS_IOS) + .AddString("DeviceInfo.Status", "Mobile") +#else + .AddString("DeviceInfo.Status", DeviceId::Instance().GetStatusString()) +#endif + .AddString("osDescription", GetOsDescription()) + .AddString("processName", GetProcessName()) + .AddString("architecture", GetArchitecture()) + .AddInt32("cpuCount", static_cast(std::thread::hardware_concurrency())) + .AddInt64("totalMemoryMB", GetTotalMemoryMB()) + .AddString("locale", GetLocale()); - LogEventAsync(std::move(event)); + LogEventAsync(builder.Build()); } void 
PosixTelemetry::LogSessionCreationStart(uint32_t session_id) const { @@ -364,6 +536,9 @@ void PosixTelemetry::LogEvaluationStop(uint32_t session_id) const { .Build(); LogEventAsync(std::move(event)); + + // Capture system metrics after each inference run to observe impact + LogSystemMetrics(session_id); } void PosixTelemetry::LogEvaluationStart(uint32_t session_id) const { @@ -398,9 +573,11 @@ void PosixTelemetry::LogSessionCreation( return; } - const char* event_name = captureState ? "SessionCreation_CaptureState" : "SessionCreation"; + // captureState is currently only triggered on Windows via ETW's EVENT_CONTROL_CODE_CAPTURE_STATE callback + // (LogAllSessions). Kept here for future compatibility if a similar mechanism is added for POSIX. + std::string event_name = captureState ? "SessionCreation_CaptureState" : "SessionCreation"; - auto builder = EventBuilder(event_name, EventPriority::CRITICAL) + auto builder = EventBuilder(std::move(event_name), EventPriority::CRITICAL) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .AddInt64("irVersion", ir_version) @@ -553,9 +730,9 @@ void PosixTelemetry::LogProviderOptions( return; } - const char* event_name = captureState ? "ProviderOptions_CaptureState" : "ProviderOptions"; + std::string event_name = captureState ? 
"ProviderOptions_CaptureState" : "ProviderOptions"; - auto event = EventBuilder(event_name, EventPriority::NORMAL) + auto event = EventBuilder(std::move(event_name), EventPriority::NORMAL) .AddCommonContext(this) .AddString("providerId", provider_id) .AddString("providerOptions", provider_options_string) @@ -564,22 +741,21 @@ void PosixTelemetry::LogProviderOptions( LogEventAsync(std::move(event)); } -// Posix-specific: Log system resource metrics -void PosixTelemetry::LogPosixSystemMetrics(uint32_t session_id) const { +void PosixTelemetry::LogSystemMetrics(uint32_t session_id) const { if (!enabled_ || !logger_) { return; } struct rusage usage; if (getrusage(RUSAGE_SELF, &usage) == 0) { - // Note: ru_maxrss is in KB on Linux, bytes on macOS + // ru_maxrss is in KB on Linux, bytes on macOS #ifdef __APPLE__ int64_t max_rss_kb = usage.ru_maxrss / 1024; #else int64_t max_rss_kb = usage.ru_maxrss; #endif - auto event = EventBuilder("PosixSystemMetrics", EventPriority::NORMAL) + auto event = EventBuilder("SystemMetrics", EventPriority::NORMAL) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .AddInt64("maxRssKb", max_rss_kb) @@ -598,5 +774,3 @@ void PosixTelemetry::LogPosixSystemMetrics(uint32_t session_id) const { } } // namespace onnxruntime - -#endif // !_WIN32 diff --git a/onnxruntime/core/platform/posix/telemetry.h b/onnxruntime/core/platform/posix/telemetry.h index 615f65f836f6f..0f5c0a949b07c 100644 --- a/onnxruntime/core/platform/posix/telemetry.h +++ b/onnxruntime/core/platform/posix/telemetry.h @@ -3,8 +3,6 @@ #pragma once -#ifndef _WIN32 // Only for non-Windows platforms - #include "core/platform/telemetry.h" #include #include @@ -12,25 +10,24 @@ #include #include -// Forward declarations of 1DS SDK types (must be at global scope) +// Forward declarations of 1DS SDK types namespace Microsoft::Applications::Events { class ILogger; -class ISemanticContext; class EventProperties; } // namespace Microsoft::Applications::Events namespace onnxruntime { 
/** -* @brief Telemetry implementation for non-Windows platforms. -* -* This class provides telemetry logging capabilities for macOS, Linux, Android, and iOS -* using the cpp_client_telemetry library (1DS SDK). It implements the same interface -* as WindowsTelemetry to provide consistent telemetry across all platforms. -* -* Configuration: -* - Telemetry is opt-in via build flags -*/ + * @brief Cross-platform telemetry implementation using 1DS SDK (cpp_client_telemetry). + * + * This class provides telemetry logging capabilities for all platforms + * using the cpp_client_telemetry library (1DS SDK). It implements the same interface + * as the original WindowsTelemetry to provide consistent telemetry across all platforms. + * + * Configuration: + * - Telemetry is opt-in via build flags + */ class PosixTelemetry : public Telemetry { public: PosixTelemetry(); @@ -106,15 +103,21 @@ class PosixTelemetry : public Telemetry { // Shutdown telemetry SDK logger void Shutdown(); - // Helper to get platform-specific information + // Helper to get platform name std::string GetPlatformInfo() const; - std::string GetDeviceInfo() const; - // Safe async event logging + // Process/system info helpers for LogProcessInfo + std::string GetOsDescription() const; + std::string GetProcessName() const; + static std::string GetArchitecture(); + static int64_t GetTotalMemoryMB(); + static std::string GetLocale(); + + // Safe async event logging. 
void LogEventAsync(::Microsoft::Applications::Events::EventProperties&& props) const; - // Posix-specific: Log system resource metrics - void LogPosixSystemMetrics(uint32_t session_id) const; + // Log system resource metrics + void LogSystemMetrics(uint32_t session_id) const; // Mutex for thread-safe access mutable std::mutex mutex_; @@ -135,10 +138,8 @@ class PosixTelemetry : public Telemetry { static std::atomic global_register_count_; static std::mutex global_mutex_; - // Make EventBuilder a friend so it can access GetPlatformInfo/GetDeviceInfo + // Make EventBuilder a friend so it can access projection_ friend class EventBuilder; }; } // namespace onnxruntime - -#endif // !_WIN32 From ed5dc835ae470a660c233afc05509b95655120f9 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Thu, 2 Apr 2026 15:16:01 -0500 Subject: [PATCH 05/61] Service fallback Windows, fix Posix telemetry, add dep --- .../external/onnxruntime_external_deps.cmake | 36 ++++ onnxruntime/core/platform/posix/telemetry.cc | 182 +++++++++++------- onnxruntime/core/platform/posix/telemetry.h | 11 +- .../core/platform/windows/telemetry.cc | 40 +++- 4 files changed, 192 insertions(+), 77 deletions(-) diff --git a/cmake/external/onnxruntime_external_deps.cmake b/cmake/external/onnxruntime_external_deps.cmake index 7b4c2d3b0da8a..f842f15661fbd 100644 --- a/cmake/external/onnxruntime_external_deps.cmake +++ b/cmake/external/onnxruntime_external_deps.cmake @@ -897,6 +897,42 @@ if(onnxruntime_USE_TELEMETRY AND NOT WIN32) # points to ORT's root instead. Fix by adding the actual source dir as an include path. 
if(TARGET mat) target_include_directories(mat PRIVATE ${cpp_client_telemetry_SOURCE_DIR}) + # Also add subdirectories for bundled headers (sqlite3.h, zlib.h) that are included without path prefix + target_include_directories(mat PRIVATE ${cpp_client_telemetry_SOURCE_DIR}/sqlite) + target_include_directories(mat PRIVATE ${cpp_client_telemetry_SOURCE_DIR}/zlib) + # ORT enables -ffast-math globally, which conflicts with std::numeric_limits::infinity() + # in the 1DS SDK's bundled nlohmann/json.hpp. Re-enable finite math to fix. + # Also suppress warnings in the 1DS SDK code that are treated as errors. + target_compile_options(mat PRIVATE + -fno-finite-math-only + -Wno-unused-const-variable + $<$:-Wno-reorder> + $<$:-Wno-reorder-ctor> + ) + endif() + + # The 1DS SDK creates GLOBAL imported targets 'z' and 'sqlite3' without setting IMPORTED_LOCATION, + # which causes link errors on cross-compile. For Android, the 1DS cmake now builds from bundled source. + # For other platforms, resolve the imported targets if possible. 
+ if(NOT ANDROID) + if(TARGET z) + get_target_property(_z_loc z IMPORTED_LOCATION) + if(NOT _z_loc OR _z_loc STREQUAL "_z_loc-NOTFOUND") + find_package(ZLIB QUIET) + if(ZLIB_FOUND) + set_target_properties(z PROPERTIES IMPORTED_LOCATION "${ZLIB_LIBRARIES}") + endif() + endif() + endif() + if(TARGET sqlite3) + get_target_property(_sqlite3_loc sqlite3 IMPORTED_LOCATION) + if(NOT _sqlite3_loc OR _sqlite3_loc STREQUAL "_sqlite3_loc-NOTFOUND") + find_library(_sqlite3_lib sqlite3) + if(_sqlite3_lib) + set_target_properties(sqlite3 PROPERTIES IMPORTED_LOCATION "${_sqlite3_lib}") + endif() + endif() + endif() endif() set(BUILD_UNIT_TESTS "${BUILD_UNIT_TESTS_SAVED}" CACHE BOOL "" FORCE) diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index 8c91bb9467b1a..6eb5dfb11ddef 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -5,7 +5,8 @@ #include "core/platform/posix/device_id.h" // 1DS SDK -#include +#include +#include #include #include @@ -30,10 +31,6 @@ using namespace Microsoft::Applications::Events; -// Instantiate the LogManager singleton (defines the static ILogManager* instance). -// Required because cpp_client_telemetry's LogManagerCX.cpp is not compiled in the FetchContent build. 
-LOGMANAGER_INSTANCE - namespace onnxruntime { // Static member initialization @@ -56,13 +53,20 @@ class EventBuilder { EventProperties props_; public: - explicit EventBuilder(std::string event_name, EventPriority priority) + explicit EventBuilder(std::string event_name, EventPriority priority, + uint64_t privacy_tags = PDT_ProductAndServicePerformance) : props_(std::move(event_name)) { // Set latency/priority props_.SetLatency(static_cast(priority)); // Set schema version for compatibility with Windows props_.SetProperty("schemaVersion", static_cast(0)); + + // All ORT telemetry is required system metadata (no PII) + props_.SetLevel(DIAG_LEVEL_REQUIRED); + + // Privacy data tags for GDPR compliance classification + props_.SetProperty(COMMONFIELDS_EVENT_PRIVTAGS, static_cast(privacy_tags)); } EventBuilder& AddString(const char* key, const std::string& value) { @@ -144,7 +148,7 @@ class EventBuilder { EventBuilder& AddBatchSizeDurations(const std::unordered_map& durations) { for (const auto& [batch_size, duration] : durations) { std::string key = "batchSize_" + std::to_string(batch_size); - props_.SetProperty(key, duration); + props_.SetProperty(key, static_cast(duration)); } return *this; } @@ -209,14 +213,23 @@ void PosixTelemetry::LogEventAsync(Microsoft::Applications::Events::EventPropert void PosixTelemetry::Initialize() { std::lock_guard lock(mutex_); - // Configure 1DS SDK for optimal async performance - ILogConfiguration config; + // NOTE: On Android, the Java layer must be initialized before calling this: + // System.loadLibrary("maesdk"); + // new HttpClient(getApplicationContext()); + // OfflineRoom.connectContext(getApplicationContext()); // if using Room DB + // See cpp_client_telemetry/docs/cpp-start-android.md for details. + + // Create SDK configuration — stored as member because LogManagerImpl holds a reference + // and the configuration must remain valid for the lifetime of the log manager. 
+ config_ = std::make_unique(); + auto& config = *config_; + config[CFG_STR_COLLECTOR_URL] = "https://mobile.events.data.microsoft.com/OneCollector/1.0"; config[CFG_INT_TRACE_LEVEL_MASK] = 0; // Disable SDK internal logging config[CFG_INT_SDK_MODE] = SdkModeTypes::SdkModeTypes_CS; // Common Schema 4.0 mode config[CFG_INT_MAX_TEARDOWN_TIME] = 10; // 10 seconds max for shutdown - // Configure cache for offline scenarios - use same directory as device ID storage + // Configure cache for offline scenarios — use same directory as device ID storage { #if defined(__ANDROID__) || (defined(__APPLE__) && TARGET_OS_IOS) constexpr bool is_mobile = true; @@ -232,70 +245,79 @@ void PosixTelemetry::Initialize() { // Configure RAM queue for async batching config[CFG_INT_RAM_QUEUE_SIZE] = 512 * 1024; // 512KB RAM queue - config[CFG_INT_RAM_QUEUE_BUFFERS] = 3; // Triple buffering for smooth async operation - - // Create logger instance (raw pointer, owned by LogManager) - auto* raw_logger = LogManager::Initialize(TENANT_TOKEN, config); - // Store as shared_ptr with no-op deleter since LogManager owns the lifetime - logger_ = std::shared_ptr<::Microsoft::Applications::Events::ILogger>( - raw_logger, [](::Microsoft::Applications::Events::ILogger*) {}); - - if (logger_) { - // Set privacy level - no PII collection - logger_->SetContext("PrivacyLevel", "o:0"); - - // Set platform information as context - logger_->SetContext("Platform", GetPlatformInfo()); - - // Override the device ID with a hashed version for privacy. - // The "c:" prefix tells the backend it's a caller-supplied identifier. - auto* ctx = LogManager::GetSemanticContext(); - if (ctx) { - std::string raw_device_id; + + // Create log manager via LogManagerProvider (recommended for production use, + // per LogManager_Creation_and_Lifecycle_Management.md). 
+ status_t status; + log_manager_ = LogManagerProvider::CreateLogManager(*config_, status); + if (status != STATUS_SUCCESS || !log_manager_) { + LOGS_DEFAULT(WARNING) << "Failed to create telemetry LogManager, status: " << status; + config_.reset(); + return; + } + + // Get logger for our tenant + logger_ = log_manager_->GetLogger(TENANT_TOKEN); + if (!logger_) { + LOGS_DEFAULT(WARNING) << "Failed to get telemetry logger"; + LogManagerProvider::Release(*config_); + log_manager_ = nullptr; + config_.reset(); + return; + } + + // Use BEST_EFFORT transmit profile to minimize battery and network impact. + // Events are batched and uploaded at a lower cadence. + log_manager_->SetTransmitProfile(TransmitProfile_BestEffort); + + // Override device ID with hashed version for privacy. + // The "c:" prefix tells the backend it's a caller-supplied identifier. + auto& ctx = log_manager_->GetSemanticContext(); + std::string raw_device_id; #if defined(__ANDROID__) || (defined(__APPLE__) && TARGET_OS_IOS) - // Mobile: read the SDK's auto-generated platform device ID (e.g., identifierForVendor - // on iOS, ANDROID_ID on Android) and hash it before sending. - auto* provider = static_cast(ctx); - auto& fields = provider->GetCommonFields(); - auto it = fields.find(COMMONFIELDS_DEVICE_ID); - if (it != fields.end()) { - raw_device_id = it->second.to_string(); - } + // Mobile: read SDK's auto-generated platform device ID (e.g., identifierForVendor + // on iOS, ANDROID_ID on Android) and hash it before sending. + auto* provider = static_cast(&ctx); + auto& fields = provider->GetCommonFields(); + auto it = fields.find(COMMONFIELDS_DEVICE_ID); + if (it != fields.end()) { + raw_device_id = it->second.to_string(); + } #else - // Desktop: use our custom persistent UUID. - raw_device_id = DeviceId::Instance().GetValue(); + // Desktop: use our custom persistent UUID. 
+ raw_device_id = DeviceId::Instance().GetValue(); #endif - if (!raw_device_id.empty()) { - ctx->SetDeviceId("c:" + HashDeviceId(raw_device_id)); - } - } + if (!raw_device_id.empty()) { + ctx.SetDeviceId("c:" + HashDeviceId(raw_device_id)); + } - // Set application information - logger_->SetContext("AppName", "ONNXRuntime"); - logger_->SetContext("AppVersion", ORT_VERSION); + // Set application information as logger context (attached to all events) + logger_->SetContext("AppName", "ONNXRuntime"); + logger_->SetContext("AppVersion", ORT_VERSION); + logger_->SetContext("Platform", GetPlatformInfo()); - enabled_ = true; - } + enabled_ = true; } void PosixTelemetry::Shutdown() { std::lock_guard lock(mutex_); - if (logger_) { - // According to cpp_client_telemetry use-after-free docs: - // 1. Stop using ILogger before calling FlushAndTeardown - // 2. Reset shared_ptr to release reference before teardown - // 3. Call FlushAndTeardown only once when count reaches zero - - // Disable logging first to prevent new events - enabled_ = false; + // Disable logging first to prevent new events during shutdown + enabled_ = false; + logger_ = nullptr; // Owned by log_manager_, will be destroyed with it - // Release our reference to the logger - logger_.reset(); + if (log_manager_ && config_) { + // Per SDK use-after-free docs (use-after-free.md): + // Flush() must be called before FlushAndTeardown() to ensure all pending + // events are persisted to offline storage. FlushAndTeardown() internally + // calls PauseActivity() + WaitPause() to quiesce the SDK. 
+ log_manager_->Flush(); + log_manager_->FlushAndTeardown(); - // Now safely call FlushAndTeardown - // This will block until all pending events are sent or timeout - LogManager::FlushAndTeardown(); + // Release the log manager instance via LogManagerProvider + LogManagerProvider::Release(*config_); + log_manager_ = nullptr; + config_.reset(); } } @@ -494,7 +516,8 @@ void PosixTelemetry::LogProcessInfo() const { return; } - auto builder = EventBuilder("ProcessInfo", EventPriority::CRITICAL) + auto builder = EventBuilder("ProcessInfo", EventPriority::CRITICAL, + PDT_DeviceConnectivityAndConfiguration | PDT_SoftwareSetupAndInventory) .AddCommonContext(this) .AddString("runtimeVersion", ORT_VERSION) #if defined(__ANDROID__) || (defined(__APPLE__) && TARGET_OS_IOS) @@ -517,7 +540,8 @@ void PosixTelemetry::LogSessionCreationStart(uint32_t session_id) const { return; } - auto event = EventBuilder("SessionCreationStart", EventPriority::CRITICAL) + auto event = EventBuilder("SessionCreationStart", EventPriority::CRITICAL, + PDT_SoftwareSetupAndInventory | PDT_ProductAndServicePerformance) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .Build(); @@ -530,7 +554,8 @@ void PosixTelemetry::LogEvaluationStop(uint32_t session_id) const { return; } - auto event = EventBuilder("EvaluationStop", EventPriority::NORMAL) + auto event = EventBuilder("EvaluationStop", EventPriority::NORMAL, + PDT_ProductAndServicePerformance) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .Build(); @@ -546,7 +571,8 @@ void PosixTelemetry::LogEvaluationStart(uint32_t session_id) const { return; } - auto event = EventBuilder("EvaluationStart", EventPriority::NORMAL) + auto event = EventBuilder("EvaluationStart", EventPriority::NORMAL, + PDT_ProductAndServicePerformance) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .Build(); @@ -577,7 +603,8 @@ void PosixTelemetry::LogSessionCreation( // (LogAllSessions). 
Kept here for future compatibility if a similar mechanism is added for POSIX. std::string event_name = captureState ? "SessionCreation_CaptureState" : "SessionCreation"; - auto builder = EventBuilder(std::move(event_name), EventPriority::CRITICAL) + auto builder = EventBuilder(std::move(event_name), EventPriority::CRITICAL, + PDT_SoftwareSetupAndInventory | PDT_ProductAndServicePerformance) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .AddInt64("irVersion", ir_version) @@ -611,7 +638,8 @@ void PosixTelemetry::LogCompileModelStart( return; } - auto event = EventBuilder("CompileModelStart", EventPriority::NORMAL) + auto event = EventBuilder("CompileModelStart", EventPriority::NORMAL, + PDT_SoftwareSetupAndInventory | PDT_ProductAndServicePerformance) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .AddString("inputSource", input_source) @@ -636,7 +664,8 @@ void PosixTelemetry::LogCompileModelComplete( return; } - auto event = EventBuilder("CompileModelComplete", EventPriority::NORMAL) + auto event = EventBuilder("CompileModelComplete", EventPriority::NORMAL, + PDT_SoftwareSetupAndInventory | PDT_ProductAndServicePerformance) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .AddBool("success", success) @@ -655,7 +684,8 @@ void PosixTelemetry::LogRuntimeError( return; } - auto event = EventBuilder("RuntimeError", EventPriority::HIGH) + auto event = EventBuilder("RuntimeError", EventPriority::HIGH, + PDT_ProductAndServicePerformance) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .AddInt32("errorCode", static_cast(status.Code())) @@ -677,7 +707,8 @@ void PosixTelemetry::LogRuntimePerf( return; } - auto event = EventBuilder("RuntimePerf", EventPriority::NORMAL) + auto event = EventBuilder("RuntimePerf", EventPriority::NORMAL, + PDT_ProductAndServicePerformance) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .AddUInt32("totalRunsSinceLast", total_runs_since_last) @@ -711,7 +742,8 @@ void 
PosixTelemetry::LogAutoEpSelection( return; } - auto event = EventBuilder("EpAutoSelection", EventPriority::NORMAL) + auto event = EventBuilder("EpAutoSelection", EventPriority::NORMAL, + PDT_SoftwareSetupAndInventory) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .AddString("selectionPolicy", selection_policy) @@ -732,7 +764,8 @@ void PosixTelemetry::LogProviderOptions( std::string event_name = captureState ? "ProviderOptions_CaptureState" : "ProviderOptions"; - auto event = EventBuilder(std::move(event_name), EventPriority::NORMAL) + auto event = EventBuilder(std::move(event_name), EventPriority::NORMAL, + PDT_SoftwareSetupAndInventory) .AddCommonContext(this) .AddString("providerId", provider_id) .AddString("providerOptions", provider_options_string) @@ -755,7 +788,8 @@ void PosixTelemetry::LogSystemMetrics(uint32_t session_id) const { int64_t max_rss_kb = usage.ru_maxrss; #endif - auto event = EventBuilder("SystemMetrics", EventPriority::NORMAL) + auto event = EventBuilder("SystemMetrics", EventPriority::NORMAL, + PDT_ProductAndServicePerformance | PDT_DeviceConnectivityAndConfiguration) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .AddInt64("maxRssKb", max_rss_kb) diff --git a/onnxruntime/core/platform/posix/telemetry.h b/onnxruntime/core/platform/posix/telemetry.h index 0f5c0a949b07c..dde07ca0d53fc 100644 --- a/onnxruntime/core/platform/posix/telemetry.h +++ b/onnxruntime/core/platform/posix/telemetry.h @@ -13,6 +13,8 @@ // Forward declarations of 1DS SDK types namespace Microsoft::Applications::Events { class ILogger; +class ILogManager; +class ILogConfiguration; class EventProperties; } // namespace Microsoft::Applications::Events @@ -122,8 +124,13 @@ class PosixTelemetry : public Telemetry { // Mutex for thread-safe access mutable std::mutex mutex_; - // Telemetry SDK logger instance (1DS) - std::shared_ptr<::Microsoft::Applications::Events::ILogger> logger_; + // Telemetry SDK instances. 
+ // log_manager_ is owned by LogManagerProvider; logger_ is owned by log_manager_. + ::Microsoft::Applications::Events::ILogManager* log_manager_ = nullptr; + ::Microsoft::Applications::Events::ILogger* logger_ = nullptr; + + // SDK configuration — must outlive log_manager_ (LogManagerImpl holds a reference). + std::unique_ptr<::Microsoft::Applications::Events::ILogConfiguration> config_; // State tracking mutable std::atomic enabled_{true}; diff --git a/onnxruntime/core/platform/windows/telemetry.cc b/onnxruntime/core/platform/windows/telemetry.cc index d4ded4b25774b..c397e8870a569 100644 --- a/onnxruntime/core/platform/windows/telemetry.cc +++ b/onnxruntime/core/platform/windows/telemetry.cc @@ -3,6 +3,7 @@ #include "core/platform/windows/telemetry.h" #include +#include #include #include #include @@ -75,14 +76,45 @@ std::string ConvertWideStringToUtf8(const std::wstring& wide) { return utf8; } +// Parse the command line for -s (service name) and -k (service group) arguments. +// These are svchost.exe conventions and may not be present for all services. 
+std::string GetServiceNamesFromCommandLine() { + LPCWSTR cmd_line = ::GetCommandLineW(); + if (cmd_line == nullptr) + return {}; + + int argc = 0; + LPWSTR* argv = ::CommandLineToArgvW(cmd_line, &argc); + if (argv == nullptr) + return {}; + + std::wstring aggregated; + bool first = true; + for (int i = 0; i < argc - 1; ++i) { + if ((_wcsicmp(argv[i], L"-s") == 0 || _wcsicmp(argv[i], L"-k") == 0)) { + if (!first) { + aggregated.push_back(L','); + } + aggregated.append(argv[i + 1]); + first = false; + ++i; // skip the value we just consumed + } + } + + ::LocalFree(argv); + return ConvertWideStringToUtf8(aggregated); +} + std::string GetServiceNamesForCurrentProcess() { static std::once_flag once_flag; static std::string service_names; std::call_once(once_flag, [] { SC_HANDLE service_manager = ::OpenSCManagerW(nullptr, nullptr, SC_MANAGER_ENUMERATE_SERVICE); - if (service_manager == nullptr) + if (service_manager == nullptr) { + service_names = GetServiceNamesFromCommandLine(); return; + } DWORD bytes_needed = 0; DWORD services_returned = 0; @@ -91,11 +123,13 @@ std::string GetServiceNamesForCurrentProcess() { &services_returned, &resume_handle, nullptr) && ::GetLastError() != ERROR_MORE_DATA) { ::CloseServiceHandle(service_manager); + service_names = GetServiceNamesFromCommandLine(); return; } if (bytes_needed == 0) { ::CloseServiceHandle(service_manager); + service_names = GetServiceNamesFromCommandLine(); return; } @@ -106,6 +140,7 @@ std::string GetServiceNamesForCurrentProcess() { if (!::EnumServicesStatusExW(service_manager, SC_ENUM_PROCESS_INFO, SERVICE_WIN32, SERVICE_ACTIVE, reinterpret_cast(services), bytes_needed, &bytes_needed, &services_returned, &resume_handle, nullptr)) { ::CloseServiceHandle(service_manager); + service_names = GetServiceNamesFromCommandLine(); return; } @@ -125,6 +160,9 @@ std::string GetServiceNamesForCurrentProcess() { ::CloseServiceHandle(service_manager); service_names = ConvertWideStringToUtf8(aggregated); + if 
(service_names.empty()) { + service_names = GetServiceNamesFromCommandLine(); + } }); return service_names; From 08349121f4065f8c1b57ffdce8a8499e74441632 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Wed, 15 Apr 2026 18:19:46 -0500 Subject: [PATCH 06/61] Create .onnxruntime directory before opening telemetry cache On iOS (and other POSIX platforms), GetStorageDirectory() returns the path but doesn't create it. SQLite can create the .db file but not the parent directory, causing 'No such file or directory' errors and preventing telemetry caching. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/telemetry.cc | 2 ++ 1 file changed, 2 insertions(+) diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index 63e70dd897c26..2cf8476d09bea 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -10,6 +10,7 @@ #include #include +#include #ifdef __APPLE__ #include @@ -226,6 +227,7 @@ void PosixTelemetry::Initialize() { #endif std::string cache_dir = DeviceId::GetStorageDirectory(is_mobile); if (!cache_dir.empty()) { + mkdir(cache_dir.c_str(), 0755); // Ensure directory exists for telemetry cache std::string cache_path = cache_dir + "/telemetry_cache.db"; config[CFG_STR_CACHE_FILE_PATH] = cache_path; } From 8b7a4cd812e1afb73163d27b1be3b7266775ef76 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Thu, 16 Apr 2026 20:47:15 -0500 Subject: [PATCH 07/61] Fix iOS POSIX telemetry rebuild and cache path Restore the iOS cache path fix, make the 1DS SDK lifetime handling robust, and link telemetry's bundled zlib in ORT's FetchContent iOS build so device dylib/framework rebuilds succeed again. 
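
For reviewers, the cache path fix comes down to creating the storage directory and any missing parents before SQLite opens the cache file. The following is a minimal standalone sketch of that idea, not the code in this patch: the name EnsureDirectoryTree and its bool return value are illustrative assumptions, and it additionally checks that an existing path is really a directory.

```cpp
#include <cerrno>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>

// Hypothetical helper (illustrative name): create `path` and any missing
// parent directories, similar to `mkdir -p`. Returns true if `path` exists
// as a directory when the call finishes.
bool EnsureDirectoryTree(const std::string& path) {
  if (path.empty()) {
    return false;
  }

  // Create the parent first. pos == 0 means the parent is "/", which exists.
  const std::string::size_type pos = path.find_last_of('/');
  if (pos != std::string::npos && pos > 0) {
    if (!EnsureDirectoryTree(path.substr(0, pos))) {
      return false;
    }
  }

  if (mkdir(path.c_str(), 0755) == 0) {
    return true;  // Newly created.
  }
  if (errno != EEXIST) {
    return false;  // Permission problem, read-only file system, etc.
  }

  // EEXIST is also reported for regular files, so confirm it is a directory.
  struct stat st {};
  return stat(path.c_str(), &st) == 0 && S_ISDIR(st.st_mode);
}
```

A caller would run this on the directory returned by GetStorageDirectory() before appending the cache file name, and could skip configuring the offline cache when it returns false.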
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../external/onnxruntime_external_deps.cmake | 84 ++++++-- onnxruntime/core/platform/posix/device_id.cc | 6 + onnxruntime/core/platform/posix/device_id.h | 6 +- onnxruntime/core/platform/posix/telemetry.cc | 202 ++++++++++++------ onnxruntime/core/platform/posix/telemetry.h | 13 +- 5 files changed, 219 insertions(+), 92 deletions(-) diff --git a/cmake/external/onnxruntime_external_deps.cmake b/cmake/external/onnxruntime_external_deps.cmake index b1b52e0212d55..7654d896d341e 100644 --- a/cmake/external/onnxruntime_external_deps.cmake +++ b/cmake/external/onnxruntime_external_deps.cmake @@ -938,27 +938,54 @@ if(onnxruntime_USE_TELEMETRY AND NOT WIN32) ) onnxruntime_fetchcontent_makeavailable(cpp_client_telemetry) - # The 1DS SDK creates imported targets for z and sqlite3 but doesn't set - # IMPORTED_LOCATION, causing z-NOTFOUND/sqlite3-NOTFOUND link errors. - # Fix by setting the correct library locations on these imported targets. - if(TARGET z) - find_library(ZLIB_LIBRARY_ACTUAL z) - if(ZLIB_LIBRARY_ACTUAL) - set_target_properties(z PROPERTIES IMPORTED_LOCATION "${ZLIB_LIBRARY_ACTUAL}") - endif() - endif() - if(TARGET sqlite3 AND NOT TARGET sqlite3::sqlite3) - find_library(SQLITE3_LIBRARY_ACTUAL sqlite3) - if(SQLITE3_LIBRARY_ACTUAL) - set_target_properties(sqlite3 PROPERTIES IMPORTED_LOCATION "${SQLITE3_LIBRARY_ACTUAL}") - endif() - endif() - # cpp_client_telemetry's CMakeLists.txt uses include_directories(${CMAKE_SOURCE_DIR}) to find # its bundled nlohmann/, sqlite/, and zlib/ headers. When built via FetchContent, CMAKE_SOURCE_DIR # points to ORT's root instead. Fix by adding the actual source dir as an include path. if(TARGET mat) target_include_directories(mat PRIVATE ${cpp_client_telemetry_SOURCE_DIR}) + # Also add subdirectories for bundled headers (sqlite3.h, zlib.h) that are included without + # a path prefix in the 1DS SDK sources. 
+ target_include_directories(mat PRIVATE ${cpp_client_telemetry_SOURCE_DIR}/sqlite) + target_include_directories(mat PRIVATE ${cpp_client_telemetry_SOURCE_DIR}/zlib) + # ORT enables -ffast-math globally, which conflicts with + # std::numeric_limits::infinity() in the 1DS SDK's bundled nlohmann/json.hpp. + # Also suppress warnings in the 1DS SDK code that ORT treats as errors. + target_compile_options(mat PRIVATE + -fno-finite-math-only + -Wno-unused-const-variable + $<$:-Wno-reorder> + $<$:-Wno-reorder-ctor> + ) + # The vendored zlib headers always prefix exported symbols via names.h (act_z_*), + # so iOS cannot link mat against the system zlib. Mirror the SDK's Android build + # and provide a bundled zlib target for ORT's FetchContent build. + if(CMAKE_SYSTEM_NAME STREQUAL "iOS" AND NOT TARGET onnxruntime_mat_zlib_bundled) + add_library(onnxruntime_mat_zlib_bundled STATIC + "${cpp_client_telemetry_SOURCE_DIR}/zlib/adler32.c" + "${cpp_client_telemetry_SOURCE_DIR}/zlib/compress.c" + "${cpp_client_telemetry_SOURCE_DIR}/zlib/crc32.c" + "${cpp_client_telemetry_SOURCE_DIR}/zlib/deflate.c" + "${cpp_client_telemetry_SOURCE_DIR}/zlib/gzclose.c" + "${cpp_client_telemetry_SOURCE_DIR}/zlib/gzlib.c" + "${cpp_client_telemetry_SOURCE_DIR}/zlib/gzread.c" + "${cpp_client_telemetry_SOURCE_DIR}/zlib/gzwrite.c" + "${cpp_client_telemetry_SOURCE_DIR}/zlib/infback.c" + "${cpp_client_telemetry_SOURCE_DIR}/zlib/inffast.c" + "${cpp_client_telemetry_SOURCE_DIR}/zlib/inflate.c" + "${cpp_client_telemetry_SOURCE_DIR}/zlib/inftrees.c" + "${cpp_client_telemetry_SOURCE_DIR}/zlib/trees.c" + "${cpp_client_telemetry_SOURCE_DIR}/zlib/uncompr.c" + "${cpp_client_telemetry_SOURCE_DIR}/zlib/zutil.c" + "${cpp_client_telemetry_SOURCE_DIR}/zlib/simd_stub.c" + ) + target_include_directories(onnxruntime_mat_zlib_bundled PUBLIC "${cpp_client_telemetry_SOURCE_DIR}/zlib") + target_compile_options(onnxruntime_mat_zlib_bundled PRIVATE + -Wno-strict-prototypes + -Wno-deprecated-non-prototype + 
-Wno-implicit-function-declaration + ) + target_link_libraries(mat PUBLIC onnxruntime_mat_zlib_bundled) + endif() # The 1DS SDK's iOS path calls xcodebuild to find the sysroot, which can # fail (license not accepted, missing tools) and leave CMAKE_OSX_SYSROOT # empty in its scope. Force the correct sysroot via compile options. @@ -967,6 +994,31 @@ if(onnxruntime_USE_TELEMETRY AND NOT WIN32) endif() endif() + # The 1DS SDK creates GLOBAL imported targets 'z' and 'sqlite3' without setting + # IMPORTED_LOCATION, which causes link errors on cross-compile. For Android, + # the 1DS CMake now builds from bundled source. For other platforms, resolve + # the imported targets if possible. + if(NOT ANDROID) + if(TARGET z) + get_target_property(_z_loc z IMPORTED_LOCATION) + if(NOT _z_loc OR _z_loc STREQUAL "_z_loc-NOTFOUND") + find_library(_z_lib z) + if(_z_lib) + set_target_properties(z PROPERTIES IMPORTED_LOCATION "${_z_lib}") + endif() + endif() + endif() + if(TARGET sqlite3) + get_target_property(_sqlite3_loc sqlite3 IMPORTED_LOCATION) + if(NOT _sqlite3_loc OR _sqlite3_loc STREQUAL "_sqlite3_loc-NOTFOUND") + find_library(_sqlite3_lib sqlite3) + if(_sqlite3_lib) + set_target_properties(sqlite3 PROPERTIES IMPORTED_LOCATION "${_sqlite3_lib}") + endif() + endif() + endif() + endif() + set(BUILD_UNIT_TESTS "${BUILD_UNIT_TESTS_SAVED}" CACHE BOOL "" FORCE) set(BUILD_FUNC_TESTS "${BUILD_FUNC_TESTS_SAVED}" CACHE BOOL "" FORCE) set(BUILD_SAMPLES "${BUILD_SAMPLES_SAVED}" CACHE BOOL "" FORCE) diff --git a/onnxruntime/core/platform/posix/device_id.cc b/onnxruntime/core/platform/posix/device_id.cc index d6fd592478109..4d3578285056a 100644 --- a/onnxruntime/core/platform/posix/device_id.cc +++ b/onnxruntime/core/platform/posix/device_id.cc @@ -94,9 +94,15 @@ std::string DeviceId::GetStorageDirectory(bool mobile) { if (!h || !h[0]) return ""; std::string home(h); +#if defined(__APPLE__) && TARGET_OS_IOS + if (mobile) { + return home + "/Library/Application Support/.onnxruntime"; + } 
+#else if (mobile) { return home + "/.onnxruntime"; } +#endif #if defined(__APPLE__) return home + "/Library/Application Support/" + kDeviceIdDir; diff --git a/onnxruntime/core/platform/posix/device_id.h b/onnxruntime/core/platform/posix/device_id.h index 89cbd0945e045..f54b24ba0572b 100644 --- a/onnxruntime/core/platform/posix/device_id.h +++ b/onnxruntime/core/platform/posix/device_id.h @@ -21,7 +21,8 @@ enum class DeviceIdStatus { * The device ID is stored in a platform-appropriate location: * - macOS: ~/Library/Application Support/Microsoft/DeveloperTools/.onnxruntime/deviceid * - Linux: ~/Microsoft/DeveloperTools/.onnxruntime/deviceid - * - iOS/Android: ~/.onnxruntime/deviceid (shorter path, avoids iCloud backup on iOS) + * - iOS: ~/Library/Application Support/.onnxruntime/deviceid + * - Android: ~/.onnxruntime/deviceid * * Thread-safe singleton - use DeviceId::Instance() to access. */ @@ -40,7 +41,8 @@ class DeviceId { // Get the directory path for device ID / telemetry cache storage // Desktop: ~/Microsoft/DeveloperTools/.onnxruntime (or platform equivalent) - // Mobile: ~/.onnxruntime + // iOS: ~/Library/Application Support/.onnxruntime + // Android: ~/.onnxruntime static std::string GetStorageDirectory(bool mobile = false); private: diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index 2cf8476d09bea..79f145225c4c2 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -5,7 +5,8 @@ #include "core/platform/posix/device_id.h" // 1DS SDK -#include +#include +#include #include #include @@ -24,6 +25,7 @@ #include #include #include +#include #include "core/common/logging/logging.h" #include "core/common/status.h" @@ -31,16 +33,32 @@ using namespace Microsoft::Applications::Events; -// Instantiate the LogManager singleton (defines the static ILogManager* instance). 
-// Required because cpp_client_telemetry's LogManagerCX.cpp is not compiled in the FetchContent build. -LOGMANAGER_INSTANCE - namespace onnxruntime { // Static member initialization std::atomic PosixTelemetry::global_register_count_{0}; std::mutex PosixTelemetry::global_mutex_; +namespace { + +void CreateDirectoryTree(const std::string& path) { + if (path.empty()) { + return; + } + + size_t pos = path.find_last_of('/'); + if (pos != std::string::npos && pos > 0) { + CreateDirectoryTree(path.substr(0, pos)); + } + + if (mkdir(path.c_str(), 0755) != 0 && errno != EEXIST) { + LOGS_DEFAULT(WARNING) << "Failed to create telemetry cache directory '" << path + << "': errno=" << errno; + } +} + +} // namespace + // Tenant token for 1DS telemetry ingestion constexpr const char* TENANT_TOKEN = "5ad963bd4b3a4118a481401cc0211875-da8e8657-47d4-4ed7-ab39-7886e136f53b-6988"; @@ -58,13 +76,20 @@ class EventBuilder { EventProperties props_; public: - explicit EventBuilder(std::string event_name, EventPriority priority) + explicit EventBuilder(std::string event_name, EventPriority priority, + uint64_t privacy_tags = PDT_ProductAndServicePerformance) : props_(std::move(event_name)) { // Set latency/priority props_.SetLatency(static_cast(priority)); // Set schema version for compatibility with Windows props_.SetProperty("schemaVersion", static_cast(0)); + + // All ORT telemetry is required system metadata (no PII) + props_.SetLevel(DIAG_LEVEL_REQUIRED); + + // Privacy data tags for GDPR compliance classification + props_.SetProperty(COMMONFIELDS_EVENT_PRIVTAGS, static_cast(privacy_tags)); } EventBuilder& AddString(const char* key, const std::string& value) { @@ -146,7 +171,7 @@ class EventBuilder { EventBuilder& AddBatchSizeDurations(const std::unordered_map& durations) { for (const auto& [batch_size, duration] : durations) { std::string key = "batchSize_" + std::to_string(batch_size); - props_.SetProperty(key, duration); + props_.SetProperty(key, static_cast(duration)); } return 
*this; } @@ -211,14 +236,23 @@ void PosixTelemetry::LogEventAsync(Microsoft::Applications::Events::EventPropert void PosixTelemetry::Initialize() { std::lock_guard lock(mutex_); - // Configure 1DS SDK for optimal async performance - ILogConfiguration config; + // NOTE: On Android, the Java layer must be initialized before calling this: + // System.loadLibrary("maesdk"); + // new HttpClient(getApplicationContext()); + // OfflineRoom.connectContext(getApplicationContext()); // if using Room DB + // See cpp_client_telemetry/docs/cpp-start-android.md for details. + + // Create SDK configuration — stored as member because LogManagerImpl holds a reference + // and the configuration must remain valid for the lifetime of the log manager. + config_ = std::make_unique(); + auto& config = *config_; + config[CFG_STR_COLLECTOR_URL] = "https://mobile.events.data.microsoft.com/OneCollector/1.0"; config[CFG_INT_TRACE_LEVEL_MASK] = 0; // Disable SDK internal logging config[CFG_INT_SDK_MODE] = SdkModeTypes::SdkModeTypes_CS; // Common Schema 4.0 mode config[CFG_INT_MAX_TEARDOWN_TIME] = 10; // 10 seconds max for shutdown - // Configure cache for offline scenarios - use same directory as device ID storage + // Configure cache for offline scenarios — use same directory as device ID storage { #if defined(__ANDROID__) || (defined(__APPLE__) && TARGET_OS_IOS) constexpr bool is_mobile = true; @@ -227,7 +261,7 @@ void PosixTelemetry::Initialize() { #endif std::string cache_dir = DeviceId::GetStorageDirectory(is_mobile); if (!cache_dir.empty()) { - mkdir(cache_dir.c_str(), 0755); // Ensure directory exists for telemetry cache + CreateDirectoryTree(cache_dir); std::string cache_path = cache_dir + "/telemetry_cache.db"; config[CFG_STR_CACHE_FILE_PATH] = cache_path; } @@ -235,65 +269,79 @@ void PosixTelemetry::Initialize() { // Configure RAM queue for async batching config[CFG_INT_RAM_QUEUE_SIZE] = 512 * 1024; // 512KB RAM queue - config[CFG_INT_RAM_QUEUE_BUFFERS] = 3; // Triple buffering for 
smooth async operation - - // Create logger instance (raw pointer, owned by LogManager) - auto* raw_logger = LogManager::Initialize(TENANT_TOKEN, config); - // Store as shared_ptr with no-op deleter since LogManager owns the lifetime - logger_ = std::shared_ptr<::Microsoft::Applications::Events::ILogger>( - raw_logger, [](::Microsoft::Applications::Events::ILogger*) {}); - - if (logger_) { - // Set privacy level - no PII collection - logger_->SetContext("PrivacyLevel", "o:0"); - - // Set platform information as context - logger_->SetContext("Platform", GetPlatformInfo()); - - // Override the device ID with a hashed version for privacy. - // The "c:" prefix tells the backend it's a caller-supplied identifier. - auto* ctx = LogManager::GetSemanticContext(); - if (ctx) { - std::string raw_device_id; + + // Create log manager via LogManagerProvider (recommended for production use, + // per LogManager_Creation_and_Lifecycle_Management.md). + status_t status; + log_manager_ = LogManagerProvider::CreateLogManager(*config_, status); + if (status != STATUS_SUCCESS || !log_manager_) { + LOGS_DEFAULT(WARNING) << "Failed to create telemetry LogManager, status: " << status; + config_.reset(); + return; + } + + // Get logger for our tenant + logger_ = log_manager_->GetLogger(TENANT_TOKEN); + if (!logger_) { + LOGS_DEFAULT(WARNING) << "Failed to get telemetry logger"; + LogManagerProvider::Release(*config_); + log_manager_ = nullptr; + config_.reset(); + return; + } + + // Use BEST_EFFORT transmit profile to minimize battery and network impact. + // Events are batched and uploaded at a lower cadence. + log_manager_->SetTransmitProfile(TransmitProfile_BestEffort); + + // Override device ID with hashed version for privacy. + // The "c:" prefix tells the backend it's a caller-supplied identifier. 
+ auto& ctx = log_manager_->GetSemanticContext(); + std::string raw_device_id; #if defined(__ANDROID__) || (defined(__APPLE__) && TARGET_OS_IOS) - // Mobile: read the SDK's auto-generated platform device ID (e.g., identifierForVendor - // on iOS, ANDROID_ID on Android) and hash it before sending. - auto* provider = static_cast(ctx); - auto& fields = provider->GetCommonFields(); - auto it = fields.find(COMMONFIELDS_DEVICE_ID); - if (it != fields.end()) { - raw_device_id = it->second.to_string(); - } + // Mobile: read SDK's auto-generated platform device ID (e.g., identifierForVendor + // on iOS, ANDROID_ID on Android) and hash it before sending. + auto* provider = static_cast(&ctx); + auto& fields = provider->GetCommonFields(); + auto it = fields.find(COMMONFIELDS_DEVICE_ID); + if (it != fields.end()) { + raw_device_id = it->second.to_string(); + } #else - // Desktop: use our custom persistent UUID. - raw_device_id = DeviceId::Instance().GetValue(); + // Desktop: use our custom persistent UUID. 
+ raw_device_id = DeviceId::Instance().GetValue(); #endif - if (!raw_device_id.empty()) { - ctx->SetDeviceId("c:" + HashDeviceId(raw_device_id)); - } - } + if (!raw_device_id.empty()) { + ctx.SetDeviceId("c:" + HashDeviceId(raw_device_id)); + } - // Set application information - logger_->SetContext("AppName", "ONNXRuntime"); - logger_->SetContext("AppVersion", ORT_VERSION); + // Set application information as logger context (attached to all events) + logger_->SetContext("AppName", "ONNXRuntime"); + logger_->SetContext("AppVersion", ORT_VERSION); + logger_->SetContext("Platform", GetPlatformInfo()); - enabled_ = true; - } + enabled_ = true; } void PosixTelemetry::Shutdown() { std::lock_guard lock(mutex_); - if (logger_) { - // Disable logging first to prevent new events - enabled_ = false; + // Disable logging first to prevent new events during shutdown + enabled_ = false; + logger_ = nullptr; // Owned by log_manager_, will be destroyed with it - // Clear our pointer (owned by LogManager, not us) - logger_ = nullptr; + if (log_manager_ && config_) { + // Per SDK use-after-free docs (use-after-free.md): + // Flush() must be called before FlushAndTeardown() to ensure all pending + // events are persisted to offline storage. FlushAndTeardown() internally + // calls PauseActivity() + WaitPause() to quiesce the SDK. 
+ log_manager_->Flush(); + log_manager_->FlushAndTeardown(); - // Now safely call FlushAndTeardown - // This will block until all pending events are sent or timeout - LogManager::FlushAndTeardown(); + // Release the log manager instance via LogManagerProvider + LogManagerProvider::Release(*config_); + log_manager_ = nullptr; + config_.reset(); } } @@ -492,7 +540,8 @@ void PosixTelemetry::LogProcessInfo() const { return; } - auto builder = EventBuilder("ProcessInfo", EventPriority::CRITICAL) + auto builder = EventBuilder("ProcessInfo", EventPriority::CRITICAL, + PDT_DeviceConnectivityAndConfiguration | PDT_SoftwareSetupAndInventory) .AddCommonContext(this) .AddString("runtimeVersion", ORT_VERSION) #if defined(__ANDROID__) || (defined(__APPLE__) && TARGET_OS_IOS) @@ -515,7 +564,8 @@ void PosixTelemetry::LogSessionCreationStart(uint32_t session_id) const { return; } - auto event = EventBuilder("SessionCreationStart", EventPriority::CRITICAL) + auto event = EventBuilder("SessionCreationStart", EventPriority::CRITICAL, + PDT_SoftwareSetupAndInventory | PDT_ProductAndServicePerformance) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .Build(); @@ -528,7 +578,8 @@ void PosixTelemetry::LogEvaluationStop(uint32_t session_id) const { return; } - auto event = EventBuilder("EvaluationStop", EventPriority::NORMAL) + auto event = EventBuilder("EvaluationStop", EventPriority::NORMAL, + PDT_ProductAndServicePerformance) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .Build(); @@ -544,7 +595,8 @@ void PosixTelemetry::LogEvaluationStart(uint32_t session_id) const { return; } - auto event = EventBuilder("EvaluationStart", EventPriority::NORMAL) + auto event = EventBuilder("EvaluationStart", EventPriority::NORMAL, + PDT_ProductAndServicePerformance) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .Build(); @@ -575,7 +627,8 @@ void PosixTelemetry::LogSessionCreation( // (LogAllSessions). 
Kept here for future compatibility if a similar mechanism is added for POSIX. std::string event_name = captureState ? "SessionCreation_CaptureState" : "SessionCreation"; - auto builder = EventBuilder(std::move(event_name), EventPriority::CRITICAL) + auto builder = EventBuilder(std::move(event_name), EventPriority::CRITICAL, + PDT_SoftwareSetupAndInventory | PDT_ProductAndServicePerformance) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .AddInt64("irVersion", ir_version) @@ -609,7 +662,8 @@ void PosixTelemetry::LogCompileModelStart( return; } - auto event = EventBuilder("CompileModelStart", EventPriority::NORMAL) + auto event = EventBuilder("CompileModelStart", EventPriority::NORMAL, + PDT_SoftwareSetupAndInventory | PDT_ProductAndServicePerformance) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .AddString("inputSource", input_source) @@ -634,7 +688,8 @@ void PosixTelemetry::LogCompileModelComplete( return; } - auto event = EventBuilder("CompileModelComplete", EventPriority::NORMAL) + auto event = EventBuilder("CompileModelComplete", EventPriority::NORMAL, + PDT_SoftwareSetupAndInventory | PDT_ProductAndServicePerformance) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .AddBool("success", success) @@ -653,7 +708,8 @@ void PosixTelemetry::LogRuntimeError( return; } - auto event = EventBuilder("RuntimeError", EventPriority::HIGH) + auto event = EventBuilder("RuntimeError", EventPriority::HIGH, + PDT_ProductAndServicePerformance) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .AddInt32("errorCode", static_cast(status.Code())) @@ -670,12 +726,13 @@ void PosixTelemetry::LogRuntimeError( void PosixTelemetry::LogRuntimePerf( uint32_t session_id, uint32_t total_runs_since_last, int64_t total_run_duration_since_last, - std::unordered_map duration_per_batch_size) const { + const std::unordered_map& duration_per_batch_size) const { if (!enabled_ || !logger_) { return; } - auto event = EventBuilder("RuntimePerf", 
EventPriority::NORMAL) + auto event = EventBuilder("RuntimePerf", EventPriority::NORMAL, + PDT_ProductAndServicePerformance) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .AddUInt32("totalRunsSinceLast", total_runs_since_last) @@ -709,7 +766,8 @@ void PosixTelemetry::LogAutoEpSelection( return; } - auto event = EventBuilder("EpAutoSelection", EventPriority::NORMAL) + auto event = EventBuilder("EpAutoSelection", EventPriority::NORMAL, + PDT_SoftwareSetupAndInventory) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .AddString("selectionPolicy", selection_policy) @@ -730,7 +788,8 @@ void PosixTelemetry::LogProviderOptions( std::string event_name = captureState ? "ProviderOptions_CaptureState" : "ProviderOptions"; - auto event = EventBuilder(std::move(event_name), EventPriority::NORMAL) + auto event = EventBuilder(std::move(event_name), EventPriority::NORMAL, + PDT_SoftwareSetupAndInventory) .AddCommonContext(this) .AddString("providerId", provider_id) .AddString("providerOptions", provider_options_string) @@ -753,7 +812,8 @@ void PosixTelemetry::LogSystemMetrics(uint32_t session_id) const { int64_t max_rss_kb = usage.ru_maxrss; #endif - auto event = EventBuilder("SystemMetrics", EventPriority::NORMAL) + auto event = EventBuilder("SystemMetrics", EventPriority::NORMAL, + PDT_ProductAndServicePerformance | PDT_DeviceConnectivityAndConfiguration) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .AddInt64("maxRssKb", max_rss_kb) diff --git a/onnxruntime/core/platform/posix/telemetry.h b/onnxruntime/core/platform/posix/telemetry.h index 0f5c0a949b07c..4c5f7cf6b87e3 100644 --- a/onnxruntime/core/platform/posix/telemetry.h +++ b/onnxruntime/core/platform/posix/telemetry.h @@ -13,6 +13,8 @@ // Forward declarations of 1DS SDK types namespace Microsoft::Applications::Events { class ILogger; +class ILogManager; +class ILogConfiguration; class EventProperties; } // namespace Microsoft::Applications::Events @@ -81,7 +83,7 @@ class 
PosixTelemetry : public Telemetry { void LogRuntimePerf(uint32_t session_id, uint32_t total_runs_since_last, int64_t total_run_duration_since_last, - std::unordered_map duration_per_batch_size) const override; + const std::unordered_map& duration_per_batch_size) const override; void LogExecutionProviderEvent(LUID* adapterLuid) const override; void LogDriverInfoEvent(const std::string_view device_class, @@ -122,8 +124,13 @@ class PosixTelemetry : public Telemetry { // Mutex for thread-safe access mutable std::mutex mutex_; - // Telemetry SDK logger instance (1DS) - std::shared_ptr<::Microsoft::Applications::Events::ILogger> logger_; + // Telemetry SDK instances. + // log_manager_ is owned by LogManagerProvider; logger_ is owned by log_manager_. + ::Microsoft::Applications::Events::ILogManager* log_manager_ = nullptr; + ::Microsoft::Applications::Events::ILogger* logger_ = nullptr; + + // SDK configuration — must outlive log_manager_ (LogManagerImpl holds a reference). + std::unique_ptr<::Microsoft::Applications::Events::ILogConfiguration> config_; // State tracking mutable std::atomic enabled_{true}; From 415ca61effd726dde28f3aeb3d7ec5f4c0d4fcd0 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Wed, 10 Jun 2026 23:39:39 -0500 Subject: [PATCH 08/61] Add POSIX (1DS) telemetry support Implement PosixTelemetry using the Microsoft 1DS SDK (cpp_client_telemetry) for non-Windows platforms (Linux/macOS/Android/iOS), add a posix device_id helper, wire PosixTelemetry into PosixEnv, and unify the --use_telemetry build flag across platforms (ETW on Windows, 1DS elsewhere). Also adds a Windows svchost -s/-k service-name fallback in windows/telemetry.cc. Squashed from bhamehta/posix-telemetry (commits: Add POSIX telemetry; Lint; Modify device id and other events; Service fallback Windows, fix Posix telemetry, add dep) and rebased onto current origin/main. 
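
The svchost -s/-k fallback mentioned above can be illustrated without the Windows API. The sketch below is a portable approximation under stated assumptions: it takes an already-split argument vector instead of calling GetCommandLineW/CommandLineToArgvW, works on narrow strings, and the names CollectServiceNames and EqualsIgnoreCase are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: ASCII case-insensitive string comparison.
bool EqualsIgnoreCase(const std::string& a, const std::string& b) {
  if (a.size() != b.size()) {
    return false;
  }
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (std::tolower(static_cast<unsigned char>(a[i])) !=
        std::tolower(static_cast<unsigned char>(b[i]))) {
      return false;
    }
  }
  return true;
}

// Hypothetical helper: collect the values that follow "-s" or "-k" and join
// them with commas. A trailing flag with no value is ignored.
std::string CollectServiceNames(const std::vector<std::string>& args) {
  std::string joined;
  bool first = true;
  for (std::size_t i = 0; i + 1 < args.size(); ++i) {
    if (EqualsIgnoreCase(args[i], "-s") || EqualsIgnoreCase(args[i], "-k")) {
      if (!first) {
        joined.push_back(',');
      }
      joined += args[i + 1];
      first = false;
      ++i;  // Skip the value that was just consumed.
    }
  }
  return joined;
}
```

With the arguments svchost.exe -k netsvcs -p -s Schedule this yields "netsvcs,Schedule"; processes that were not started by svchost yield an empty string, so the fallback simply reports no service names.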
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- ThirdPartyNotices.txt | 222 +++++ build.sh | 2 +- cgmanifests/cgmanifest.json | 10 + cmake/CMakeLists.txt | 3 + cmake/deps.txt | 2 + .../external/onnxruntime_external_deps.cmake | 70 +- cmake/onnxruntime_1ds_telemetry.cmake | 35 + cmake/onnxruntime_common.cmake | 52 +- onnxruntime/core/platform/posix/device_id.cc | 190 ++++ onnxruntime/core/platform/posix/device_id.h | 71 ++ onnxruntime/core/platform/posix/env.cc | 12 +- onnxruntime/core/platform/posix/telemetry.cc | 810 ++++++++++++++++++ onnxruntime/core/platform/posix/telemetry.h | 152 ++++ .../core/platform/windows/telemetry.cc | 44 +- tools/ci_build/build.py | 6 +- tools/ci_build/build_args.py | 5 +- 16 files changed, 1676 insertions(+), 10 deletions(-) create mode 100644 cmake/onnxruntime_1ds_telemetry.cmake create mode 100644 onnxruntime/core/platform/posix/device_id.cc create mode 100644 onnxruntime/core/platform/posix/device_id.h create mode 100644 onnxruntime/core/platform/posix/telemetry.cc create mode 100644 onnxruntime/core/platform/posix/telemetry.h diff --git a/ThirdPartyNotices.txt b/ThirdPartyNotices.txt index fbd9f9a95f601..5a193f3f16888 100644 --- a/ThirdPartyNotices.txt +++ b/ThirdPartyNotices.txt @@ -6119,3 +6119,225 @@ Redistribution and use in source and binary forms, with or without modification, 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +Copyright (c) 2026 KleidiAi. + +Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. + +3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +_____ + +microsoft/cpp_client_telemetry, https://github.com/microsoft/cpp_client_telemetry/ + +Apache License + +Copyright (c) Microsoft Corporation. All rights reserved. + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). 
+ + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. 
Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative 
Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. 
Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. 
+ + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright 2026 Microsoft Corporation + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. \ No newline at end of file diff --git a/build.sh b/build.sh index bf799ac8b7211..2778cb5c53ef3 100755 --- a/build.sh +++ b/build.sh @@ -18,4 +18,4 @@ elif [[ "$*" == *"--android"* ]]; then DIR_OS="Android" fi -python3 $DIR/tools/ci_build/build.py --build_dir $DIR/build/$DIR_OS "$@" +python3 $DIR/tools/ci_build/build.py --build_dir $DIR/build/$DIR_OS --use_telemetry "$@" diff --git a/cgmanifests/cgmanifest.json b/cgmanifests/cgmanifest.json index bf889e9fb61a8..dfe6f0d4d1553 100644 --- a/cgmanifests/cgmanifest.json +++ b/cgmanifests/cgmanifest.json @@ -345,6 +345,16 @@ }, "comments": "python-pillow. Implementation logic for anti-aliasing copied by Resize CPU kernel." 
} + }, + { + "component": { + "type": "git", + "git": { + "commitHash": "ee2ded25e539f64052c9d8635bef4ea62c30e014", + "repositoryUrl": "https://github.com/microsoft/cpp_client_telemetry.git" + }, + "comments": "1DS SDK (cpp_client_telemetry) for cross-platform telemetry on non-Windows platforms (macOS, Linux, Android, iOS)." + } } ], "Version": 1 diff --git a/cmake/CMakeLists.txt b/cmake/CMakeLists.txt index 28d6aa83c5343..8523b45141ec8 100644 --- a/cmake/CMakeLists.txt +++ b/cmake/CMakeLists.txt @@ -589,6 +589,9 @@ set(ONNXRUNTIME_INCLUDE_DIR ${REPO_ROOT}/include/onnxruntime) list(APPEND CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/external) include(external/onnxruntime_external_deps.cmake) +# 1DS telemetry integration for non-Windows platforms (must come after external deps) +include(onnxruntime_1ds_telemetry.cmake) + set(ORT_WARNING_FLAGS) if (WIN32) # class needs to have dll-interface to be used by clients diff --git a/cmake/deps.txt b/cmake/deps.txt index e303ccd9f8a98..942ca9f1b2217 100644 --- a/cmake/deps.txt +++ b/cmake/deps.txt @@ -64,3 +64,5 @@ kleidiai;https://github.com/ARM-software/kleidiai/archive/refs/tags/v1.20.0.tar. # this entry will be updated to use refs/tags/ instead of the raw commit hash. 
kleidiai-qmx;https://github.com/qualcomm/kleidiai/archive/2f10c9a8d32f81ffeeb6d4885a29cc35d2b0da87.zip;5e855730a2d69057a569f43dd7532db3b2d2a05c vulkan_headers;https://codeload.github.com/KhronosGroup/Vulkan-Headers/tar.gz/refs/tags/v1.4.344;57bc528ef7c4a3f7bfbb59e64a187e3734bd29d8 +# cpp_client_telemetry (1DS SDK) for cross-platform telemetry on non-Windows platforms +cpp_client_telemetry;https://github.com/microsoft/cpp_client_telemetry/archive/refs/tags/v3.10.40.1.zip;ee2ded25e539f64052c9d8635bef4ea62c30e014 diff --git a/cmake/external/onnxruntime_external_deps.cmake b/cmake/external/onnxruntime_external_deps.cmake index 1a1e4921a41e6..9a91f4a9ab145 100644 --- a/cmake/external/onnxruntime_external_deps.cmake +++ b/cmake/external/onnxruntime_external_deps.cmake @@ -7,8 +7,9 @@ include(external/helper_functions.cmake) file(STRINGS deps.txt ONNXRUNTIME_DEPS_LIST) foreach(ONNXRUNTIME_DEP IN LISTS ONNXRUNTIME_DEPS_LIST) - # Lines start with "#" are comments - if(NOT ONNXRUNTIME_DEP MATCHES "^#") + # Lines starting with "#" are comments, so skip them. + # cpp_client_telemetry is only needed for telemetry on non-Windows platforms, so skip it when telemetry is disabled or the target platform is Windows.
+ if((NOT ONNXRUNTIME_DEP MATCHES "^#") AND ((NOT ONNXRUNTIME_DEP MATCHES "^cpp_client_telemetry") OR (onnxruntime_USE_TELEMETRY AND NOT WIN32))) # The first column is name list(POP_FRONT ONNXRUNTIME_DEP ONNXRUNTIME_DEP_NAME) # The second column is URL @@ -900,6 +901,71 @@ if(onnxruntime_USE_SNPE) list(APPEND onnxruntime_EXTERNAL_LIBRARIES ${SNPE_NN_LIBS}) endif() +# 1DS SDK (cpp_client_telemetry) for cross-platform telemetry on non-Windows platforms +if(onnxruntime_USE_TELEMETRY AND NOT WIN32) + set(BUILD_UNIT_TESTS_SAVED "${BUILD_UNIT_TESTS}") + set(BUILD_FUNC_TESTS_SAVED "${BUILD_FUNC_TESTS}") + set(BUILD_SAMPLES_SAVED "${BUILD_SAMPLES}") + set(BUILD_UNIT_TESTS OFF CACHE BOOL "Disable 1DS SDK unit tests" FORCE) + set(BUILD_FUNC_TESTS OFF CACHE BOOL "Disable 1DS SDK functional tests" FORCE) + set(BUILD_SAMPLES OFF CACHE BOOL "Disable 1DS SDK samples" FORCE) + + onnxruntime_fetchcontent_declare( + cpp_client_telemetry + URL ${DEP_URL_cpp_client_telemetry} + URL_HASH SHA1=${DEP_SHA1_cpp_client_telemetry} + EXCLUDE_FROM_ALL + ) + onnxruntime_fetchcontent_makeavailable(cpp_client_telemetry) + + # cpp_client_telemetry's CMakeLists.txt uses include_directories(${CMAKE_SOURCE_DIR}) to find + # its bundled nlohmann/, sqlite/, and zlib/ headers. When built via FetchContent, CMAKE_SOURCE_DIR + # points to ORT's root instead. Fix by adding the actual source dir as an include path. + if(TARGET mat) + target_include_directories(mat PRIVATE ${cpp_client_telemetry_SOURCE_DIR}) + # Also add subdirectories for bundled headers (sqlite3.h, zlib.h) that are included without path prefix + target_include_directories(mat PRIVATE ${cpp_client_telemetry_SOURCE_DIR}/sqlite) + target_include_directories(mat PRIVATE ${cpp_client_telemetry_SOURCE_DIR}/zlib) + # ORT enables -ffast-math globally, which conflicts with std::numeric_limits::infinity() + # in the 1DS SDK's bundled nlohmann/json.hpp. Disable the finite-math-only assumption for this target to fix it.
+ # Also suppress warnings in the 1DS SDK code that are treated as errors. + target_compile_options(mat PRIVATE + -fno-finite-math-only + -Wno-unused-const-variable + $<$:-Wno-reorder> + $<$:-Wno-reorder-ctor> + ) + endif() + + # The 1DS SDK creates GLOBAL imported targets 'z' and 'sqlite3' without setting IMPORTED_LOCATION, + # which causes link errors on cross-compile. For Android, the 1DS cmake now builds from bundled source. + # For other platforms, resolve the imported targets if possible. + if(NOT ANDROID) + if(TARGET z) + get_target_property(_z_loc z IMPORTED_LOCATION) + if(NOT _z_loc OR _z_loc STREQUAL "_z_loc-NOTFOUND") + find_package(ZLIB QUIET) + if(ZLIB_FOUND) + set_target_properties(z PROPERTIES IMPORTED_LOCATION "${ZLIB_LIBRARIES}") + endif() + endif() + endif() + if(TARGET sqlite3) + get_target_property(_sqlite3_loc sqlite3 IMPORTED_LOCATION) + if(NOT _sqlite3_loc OR _sqlite3_loc STREQUAL "_sqlite3_loc-NOTFOUND") + find_library(_sqlite3_lib sqlite3) + if(_sqlite3_lib) + set_target_properties(sqlite3 PROPERTIES IMPORTED_LOCATION "${_sqlite3_lib}") + endif() + endif() + endif() + endif() + + set(BUILD_UNIT_TESTS "${BUILD_UNIT_TESTS_SAVED}" CACHE BOOL "" FORCE) + set(BUILD_FUNC_TESTS "${BUILD_FUNC_TESTS_SAVED}" CACHE BOOL "" FORCE) + set(BUILD_SAMPLES "${BUILD_SAMPLES_SAVED}" CACHE BOOL "" FORCE) +endif() + FILE(TO_NATIVE_PATH ${CMAKE_BINARY_DIR} ORT_BINARY_DIR) FILE(TO_NATIVE_PATH ${PROJECT_SOURCE_DIR} ORT_SOURCE_DIR) diff --git a/cmake/onnxruntime_1ds_telemetry.cmake b/cmake/onnxruntime_1ds_telemetry.cmake new file mode 100644 index 0000000000000..20a73f2139b09 --- /dev/null +++ b/cmake/onnxruntime_1ds_telemetry.cmake @@ -0,0 +1,35 @@ +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. + +# This file handles telemetry integration for non-Windows platforms +# (macOS, Linux, Android, iOS) using the 1DS SDK (cpp_client_telemetry). +# The SDK is fetched via FetchContent in onnxruntime_external_deps.cmake. 
+ +if(onnxruntime_USE_TELEMETRY AND NOT WIN32) + if(NOT TARGET mat) + message(FATAL_ERROR "Telemetry enabled for non-Windows but 'mat' target not found. " + "Ensure cpp_client_telemetry is fetched in onnxruntime_external_deps.cmake.") + endif() + + message(STATUS "Enabling 1DS telemetry for non-Windows platforms") + + # Add compile definition so C++ code can detect 1DS telemetry at compile time + add_compile_definitions(USE_1DS_TELEMETRY) + + # Platform-specific status messages + if(APPLE) + if(CMAKE_SYSTEM_NAME STREQUAL "iOS") + message(STATUS " Platform: iOS") + else() + message(STATUS " Platform: macOS") + endif() + elseif(ANDROID) + message(STATUS " Platform: Android") + elseif(UNIX) + message(STATUS " Platform: Linux") + endif() +else() + if(NOT onnxruntime_USE_TELEMETRY) + message(STATUS "Telemetry is disabled (use -Donnxruntime_USE_TELEMETRY=ON to enable)") + endif() +endif() diff --git a/cmake/onnxruntime_common.cmake b/cmake/onnxruntime_common.cmake index b081e22e8b3f4..3e55a56a74284 100644 --- a/cmake/onnxruntime_common.cmake +++ b/cmake/onnxruntime_common.cmake @@ -55,6 +55,16 @@ else() "${ONNXRUNTIME_ROOT}/core/platform/posix/stacktrace.cc" ) + # Telemetry for non-Windows platforms (enabled by USE_TELEMETRY) + if (onnxruntime_USE_TELEMETRY) + list(APPEND onnxruntime_common_src_patterns + "${ONNXRUNTIME_ROOT}/core/platform/posix/device_id.h" + "${ONNXRUNTIME_ROOT}/core/platform/posix/device_id.cc" + "${ONNXRUNTIME_ROOT}/core/platform/posix/telemetry.h" + "${ONNXRUNTIME_ROOT}/core/platform/posix/telemetry.cc" + ) + endif() + # logging files if (onnxruntime_USE_SYSLOG) list(APPEND onnxruntime_common_src_patterns @@ -139,7 +149,11 @@ if(NOT WIN32 AND NOT APPLE AND NOT ANDROID AND CMAKE_SYSTEM_PROCESSOR MATCHES "x endif() if (onnxruntime_USE_TELEMETRY) - set_target_properties(onnxruntime_common PROPERTIES COMPILE_FLAGS "/FI${ONNXRUNTIME_INCLUDE_DIR}/core/platform/windows/TraceLoggingConfigPrivate.h") + if(WIN32) + set_target_properties(onnxruntime_common 
PROPERTIES COMPILE_FLAGS "/FI${ONNXRUNTIME_INCLUDE_DIR}/core/platform/windows/TraceLoggingConfigPrivate.h") + else() + target_compile_definitions(onnxruntime_common PRIVATE USE_1DS_TELEMETRY) + endif() endif() if (onnxruntime_USE_MIMALLOC) list(APPEND onnxruntime_EXTERNAL_LIBRARIES mimalloc-static) @@ -201,6 +215,42 @@ if(CPUINFO_SUPPORTED) list(APPEND onnxruntime_EXTERNAL_LIBRARIES cpuinfo::cpuinfo) endif() +# Link telemetry library (1DS SDK) for non-Windows platforms +if(onnxruntime_USE_TELEMETRY AND NOT WIN32) + if(TARGET mat) + target_link_libraries(onnxruntime_common PRIVATE mat) + # cpp_client_telemetry uses include_directories() (directory-scoped) rather than + # target_include_directories(), so include paths don't propagate via target_link_libraries. + # Add them explicitly for onnxruntime_common. + if(DEFINED cpp_client_telemetry_SOURCE_DIR) + target_include_directories(onnxruntime_common PRIVATE + ${cpp_client_telemetry_SOURCE_DIR}/lib/include/public + ${cpp_client_telemetry_SOURCE_DIR}/lib/include/mat + ${cpp_client_telemetry_SOURCE_DIR}/lib + ) + endif() + # Platform-specific system libraries required by the 1DS SDK + if(APPLE) + target_link_libraries(onnxruntime_common PRIVATE + "-framework CoreFoundation" + "-framework Security" + z + sqlite3 + ) + elseif(ANDROID) + target_link_libraries(onnxruntime_common PRIVATE z log) + elseif(UNIX) + target_link_libraries(onnxruntime_common PRIVATE + curl + z + sqlite3 + ) + endif() + else() + message(WARNING "Telemetry enabled but 'mat' library target not found") + endif() +endif() + if (NOT onnxruntime_BUILD_SHARED_LIB) install(DIRECTORY ${PROJECT_SOURCE_DIR}/../include/onnxruntime/core/common DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/onnxruntime/core) install(TARGETS onnxruntime_common EXPORT ${PROJECT_NAME}Targets diff --git a/onnxruntime/core/platform/posix/device_id.cc b/onnxruntime/core/platform/posix/device_id.cc new file mode 100644 index 0000000000000..d6fd592478109 --- /dev/null +++ 
b/onnxruntime/core/platform/posix/device_id.cc @@ -0,0 +1,190 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +#include "core/platform/posix/device_id.h" + +#include +#include +#include +#include +#include + +#include +#include + +#ifdef __APPLE__ +#include +#endif + +namespace onnxruntime { + +DeviceId& DeviceId::Instance() { + static DeviceId instance; + return instance; +} + +std::string DeviceId::GetValue() { + std::lock_guard lock(mutex_); + InitializeInternal(); + return device_id_; +} + +DeviceIdStatus DeviceId::GetStatus() { + std::lock_guard lock(mutex_); + InitializeInternal(); + return status_; +} + +std::string DeviceId::GetStatusString() { + switch (GetStatus()) { + case DeviceIdStatus::New: + return "New"; + case DeviceIdStatus::Existing: + return "Existing"; + case DeviceIdStatus::Corrupted: + return "Corrupted"; + case DeviceIdStatus::Failed: + return "Failed"; + default: + return "Unknown"; + } +} + +std::string DeviceId::GenerateUUID() { + std::random_device rd; + std::mt19937 gen(rd()); + std::uniform_int_distribution dist(0, UINT32_MAX); + + uint32_t data1 = dist(gen); + uint16_t data2 = static_cast(dist(gen) & 0xFFFF); + uint16_t data3 = static_cast((dist(gen) & 0x0FFF) | 0x4000); // Version 4 + uint16_t data4 = static_cast((dist(gen) & 0x3FFF) | 0x8000); // Variant 1 + uint16_t data5a = static_cast(dist(gen) & 0xFFFF); + uint32_t data5b = dist(gen); + + std::ostringstream oss; + oss << std::hex << std::setfill('0') + << std::setw(8) << data1 << '-' + << std::setw(4) << data2 << '-' + << std::setw(4) << data3 << '-' + << std::setw(4) << data4 << '-' + << std::setw(4) << data5a + << std::setw(8) << data5b; + return oss.str(); +} + +bool DeviceId::IsValidGUID(const std::string& str) { + if (str.length() != 36) return false; + + for (size_t i = 0; i < str.length(); ++i) { + char c = str[i]; + if (i == 8 || i == 13 || i == 18 || i == 23) { + if (c != '-') return false; + } else { + if (!((c >= 
'0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F'))) { + return false; + } + } + } + return true; +} + +std::string DeviceId::GetStorageDirectory(bool mobile) { + const char* h = std::getenv("HOME"); + if (!h || !h[0]) return ""; + std::string home(h); + + if (mobile) { + return home + "/.onnxruntime"; + } + +#if defined(__APPLE__) + return home + "/Library/Application Support/" + kDeviceIdDir; +#else + return home + "/" + kDeviceIdDir; +#endif +} + +void DeviceId::CreateDirectoryTree(const std::string& path) { + if (path.empty()) return; + + size_t pos = path.find_last_of('/'); + if (pos != std::string::npos && pos > 0) { + CreateDirectoryTree(path.substr(0, pos)); + } + + mkdir(path.c_str(), 0755); +} + +void DeviceId::InitializeInternal() { + if (initialized_) return; + initialized_ = true; + + try { + // Use compile-time platform detection to select the appropriate storage path. + // This matches the mobile/desktop selection in posix/env.cc. +#if defined(__ANDROID__) || (defined(__APPLE__) && TARGET_OS_IOS) + constexpr bool is_mobile = true; +#else + constexpr bool is_mobile = false; +#endif + std::string dir_path = GetStorageDirectory(is_mobile); + if (dir_path.empty()) { + status_ = DeviceIdStatus::Failed; + return; + } + + std::string file_path = dir_path + "/" + kFileName; + + // Try to read existing device ID + { + std::ifstream infile(file_path); + if (infile.good()) { + infile.seekg(0, std::ios::end); + auto size = infile.tellg(); + infile.seekg(0, std::ios::beg); + + if (size > static_cast(kMaxFileSize)) { + status_ = DeviceIdStatus::Corrupted; + } else { + std::string content; + std::getline(infile, content); + + // Trim whitespace + while (!content.empty() && + (content.back() == '\n' || content.back() == '\r' || content.back() == ' ')) { + content.pop_back(); + } + + if (IsValidGUID(content)) { + device_id_ = content; + status_ = DeviceIdStatus::Existing; + return; + } + status_ = DeviceIdStatus::Corrupted; + } + } + } + + // 
Generate new device ID + device_id_ = GenerateUUID(); + + // Create directory tree + CreateDirectoryTree(dir_path); + + // Write to file + std::ofstream outfile(file_path); + if (outfile.good()) { + outfile << device_id_; + outfile.close(); + // Preserve Corrupted so callers can tell the stored ID was invalid and regenerated. + if (status_ != DeviceIdStatus::Corrupted) { + status_ = DeviceIdStatus::New; + } + } else { + status_ = DeviceIdStatus::Failed; + } + } catch (...) { + status_ = DeviceIdStatus::Failed; + // Keep device_id_ if generated; it's still valid for this session (in-memory only). + } +} + +} // namespace onnxruntime diff --git a/onnxruntime/core/platform/posix/device_id.h b/onnxruntime/core/platform/posix/device_id.h new file mode 100644 index 0000000000000..89cbd0945e045 --- /dev/null +++ b/onnxruntime/core/platform/posix/device_id.h @@ -0,0 +1,71 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +#pragma once + +#include +#include +#include "core/common/common.h" + +namespace onnxruntime { + +enum class DeviceIdStatus { + New, // Device ID was newly generated + Existing, // Device ID was loaded from persistent storage + Corrupted, // Stored device ID was invalid and regenerated + Failed // Failed to persist device ID (in-memory only) +}; + +/** + * Manages a persistent device identifier for telemetry purposes. + * The device ID is stored in a platform-appropriate location: + * - macOS: ~/Library/Application Support/Microsoft/DeveloperTools/.onnxruntime/deviceid + * - Linux: ~/Microsoft/DeveloperTools/.onnxruntime/deviceid + * - iOS/Android: ~/.onnxruntime/deviceid (shorter path, avoids iCloud backup on iOS) + * + * Thread-safe singleton - use DeviceId::Instance() to access.
+ */ +class DeviceId { + public: + static DeviceId& Instance(); + + // Get the device ID value (generates/loads on first call) + std::string GetValue(); + + // Get the status of the device ID + DeviceIdStatus GetStatus(); + + // Get human-readable status string + std::string GetStatusString(); + + // Get the directory path for device ID / telemetry cache storage + // Desktop: ~/Microsoft/DeveloperTools/.onnxruntime (or platform equivalent) + // Mobile: ~/.onnxruntime + static std::string GetStorageDirectory(bool mobile = false); + + private: + DeviceId() = default; + ~DeviceId() = default; + ORT_DISALLOW_COPY_ASSIGNMENT_AND_MOVE(DeviceId); + + void InitializeInternal(); + + // Generate a random UUID v4 + static std::string GenerateUUID(); + + // Validate GUID format (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) + static bool IsValidGUID(const std::string& str); + + // Create directory tree recursively using platform APIs + static void CreateDirectoryTree(const std::string& path); + + static constexpr const char* kDeviceIdDir = "Microsoft/DeveloperTools/.onnxruntime"; + static constexpr const char* kFileName = "deviceid"; + static constexpr size_t kMaxFileSize = 256; + + std::string device_id_; + DeviceIdStatus status_ = DeviceIdStatus::New; + bool initialized_ = false; + std::mutex mutex_; +}; +} // namespace onnxruntime diff --git a/onnxruntime/core/platform/posix/env.cc b/onnxruntime/core/platform/posix/env.cc index 0270bf9d4d79c..96964b8beeb5f 100644 --- a/onnxruntime/core/platform/posix/env.cc +++ b/onnxruntime/core/platform/posix/env.cc @@ -16,6 +16,14 @@ limitations under the License. 
#include "core/platform/env.h" +#ifdef __APPLE__ +#include +#endif + +#ifdef USE_1DS_TELEMETRY +#include "core/platform/posix/telemetry.h" +#endif + #include #include #include @@ -639,7 +647,9 @@ class PosixEnv : public Env { } private: - Telemetry telemetry_provider_; +#ifdef USE_1DS_TELEMETRY + PosixTelemetry telemetry_provider_; +#endif #ifdef ORT_USE_CPUINFO PosixEnv() { cpuinfo_available_ = cpuinfo_initialize(); diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc new file mode 100644 index 0000000000000..6eb5dfb11ddef --- /dev/null +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -0,0 +1,810 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +#include "core/platform/posix/telemetry.h" +#include "core/platform/posix/device_id.h" + +// 1DS SDK +#include +#include +#include + +#include +#include + +#ifdef __APPLE__ +#include +#include +#endif + +#if defined(__linux__) || defined(__ANDROID__) +#include +#endif + +#include +#include +#include + +#include "core/common/logging/logging.h" +#include "core/common/status.h" +#include "onnxruntime_config.h" + +using namespace Microsoft::Applications::Events; + +namespace onnxruntime { + +// Static member initialization +std::atomic PosixTelemetry::global_register_count_{0}; +std::mutex PosixTelemetry::global_mutex_; + +// Tenant token for 1DS telemetry ingestion +constexpr const char* TENANT_TOKEN = "5ad963bd4b3a4118a481401cc0211875-da8e8657-47d4-4ed7-ab39-7886e136f53b-6988"; + +// Event priority mapping (1DS priorities) +enum class EventPriority { + NORMAL = EventLatency_Normal, // Most events + HIGH = EventLatency_RealTime, // RuntimeError + CRITICAL = EventLatency_RealTime // ProcessInfo, SessionCreation +}; + +// Helper class to build events with common properties +class EventBuilder { + private: + EventProperties props_; + + public: + explicit EventBuilder(std::string event_name, EventPriority priority, + 
uint64_t privacy_tags = PDT_ProductAndServicePerformance) + : props_(std::move(event_name)) { + // Set latency/priority + props_.SetLatency(static_cast(priority)); + + // Set schema version for compatibility with Windows + props_.SetProperty("schemaVersion", static_cast(0)); + + // All ORT telemetry is required system metadata (no PII) + props_.SetLevel(DIAG_LEVEL_REQUIRED); + + // Privacy data tags for GDPR compliance classification + props_.SetProperty(COMMONFIELDS_EVENT_PRIVTAGS, static_cast(privacy_tags)); + } + + EventBuilder& AddString(const char* key, const std::string& value) { + if (!value.empty()) { + props_.SetProperty(key, value); + } + return *this; + } + + EventBuilder& AddInt32(const char* key, int32_t value) { + props_.SetProperty(key, static_cast(value)); + return *this; + } + + EventBuilder& AddInt64(const char* key, int64_t value) { + props_.SetProperty(key, value); + return *this; + } + + EventBuilder& AddBool(const char* key, bool value) { + props_.SetProperty(key, value); + return *this; + } + + EventBuilder& AddUInt32(const char* key, uint32_t value) { + props_.SetProperty(key, static_cast(value)); + return *this; + } + + EventBuilder& AddDouble(const char* key, double value) { + props_.SetProperty(key, value); + return *this; + } + + // Helper for vector to comma-separated string + EventBuilder& AddStringList(const char* key, const std::vector& vec) { + if (!vec.empty()) { + std::string result; + for (size_t i = 0; i < vec.size(); ++i) { + if (i > 0) result += ','; + result += vec[i]; + } + props_.SetProperty(key, result); + } + return *this; + } + + // Helper for map to key=value,key=value format + EventBuilder& AddIntMap(const char* key, const std::unordered_map& map) { + if (!map.empty()) { + std::string result; + bool first = true; + for (const auto& [k, v] : map) { + if (!first) result += ','; + result += k + '=' + std::to_string(v); + first = false; + } + props_.SetProperty(key, result); + } + return *this; + } + + // Helper for string 
map + EventBuilder& AddStringMap(const char* key, const std::unordered_map& map) { + if (!map.empty()) { + std::string result; + bool first = true; + for (const auto& [k, v] : map) { + if (!first) result += ','; + result += k + '=' + v; + first = false; + } + props_.SetProperty(key, result); + } + return *this; + } + + // Helper for batch size duration map + EventBuilder& AddBatchSizeDurations(const std::unordered_map& durations) { + for (const auto& [batch_size, duration] : durations) { + std::string key = "batchSize_" + std::to_string(batch_size); + props_.SetProperty(key, static_cast(duration)); + } + return *this; + } + + // Add common platform/device context + EventBuilder& AddCommonContext(const PosixTelemetry* telemetry) { + props_.SetProperty("projection", static_cast(telemetry->projection_.load())); + return *this; + } + + EventProperties Build() { return std::move(props_); } +}; + +// Hash a device ID string using std::hash and format as fixed-width hex. +// Ensures raw device identifiers are never sent over the wire. 
+static std::string HashDeviceId(const std::string& id) { + size_t hash = std::hash{}(id); + std::ostringstream oss; + oss << std::hex << std::setfill('0') << std::setw(sizeof(size_t) * 2) << hash; + return oss.str(); +} + +PosixTelemetry::PosixTelemetry() { + std::lock_guard lock(global_mutex_); + + // Always increment so destructor pairing is symmetric + global_register_count_++; + + if (global_register_count_ == 1) { + try { + Initialize(); + } catch (const std::exception& ex) { + // Log error but don't fail construction + // Telemetry failures should not break application functionality + LOGS_DEFAULT(WARNING) << "Failed to initialize telemetry: " << ex.what(); + } + } +} + +PosixTelemetry::~PosixTelemetry() { + std::lock_guard lock(global_mutex_); + + global_register_count_--; + if (global_register_count_ == 0) { + try { + Shutdown(); + } catch (const std::exception& ex) { + // Log error but don't throw from destructor + LOGS_DEFAULT(WARNING) << "Error during telemetry shutdown: " << ex.what(); + } + } +} + +void PosixTelemetry::LogEventAsync(Microsoft::Applications::Events::EventProperties&& props) const { + try { + logger_->LogEvent(std::move(props)); + } catch (const std::exception& ex) { + LOGS_DEFAULT(WARNING) << "[Telemetry] Failed to log event: " << ex.what(); + } +} + +void PosixTelemetry::Initialize() { + std::lock_guard lock(mutex_); + + // NOTE: On Android, the Java layer must be initialized before calling this: + // System.loadLibrary("maesdk"); + // new HttpClient(getApplicationContext()); + // OfflineRoom.connectContext(getApplicationContext()); // if using Room DB + // See cpp_client_telemetry/docs/cpp-start-android.md for details. + + // Create SDK configuration — stored as member because LogManagerImpl holds a reference + // and the configuration must remain valid for the lifetime of the log manager. 
+ config_ = std::make_unique(); + auto& config = *config_; + + config[CFG_STR_COLLECTOR_URL] = "https://mobile.events.data.microsoft.com/OneCollector/1.0"; + config[CFG_INT_TRACE_LEVEL_MASK] = 0; // Disable SDK internal logging + config[CFG_INT_SDK_MODE] = SdkModeTypes::SdkModeTypes_CS; // Common Schema 4.0 mode + config[CFG_INT_MAX_TEARDOWN_TIME] = 10; // 10 seconds max for shutdown + + // Configure cache for offline scenarios — use same directory as device ID storage + { +#if defined(__ANDROID__) || (defined(__APPLE__) && TARGET_OS_IOS) + constexpr bool is_mobile = true; +#else + constexpr bool is_mobile = false; +#endif + std::string cache_dir = DeviceId::GetStorageDirectory(is_mobile); + if (!cache_dir.empty()) { + std::string cache_path = cache_dir + "/telemetry_cache.db"; + config[CFG_STR_CACHE_FILE_PATH] = cache_path; + } + } + + // Configure RAM queue for async batching + config[CFG_INT_RAM_QUEUE_SIZE] = 512 * 1024; // 512KB RAM queue + + // Create log manager via LogManagerProvider (recommended for production use, + // per LogManager_Creation_and_Lifecycle_Management.md). + status_t status; + log_manager_ = LogManagerProvider::CreateLogManager(*config_, status); + if (status != STATUS_SUCCESS || !log_manager_) { + LOGS_DEFAULT(WARNING) << "Failed to create telemetry LogManager, status: " << status; + config_.reset(); + return; + } + + // Get logger for our tenant + logger_ = log_manager_->GetLogger(TENANT_TOKEN); + if (!logger_) { + LOGS_DEFAULT(WARNING) << "Failed to get telemetry logger"; + LogManagerProvider::Release(*config_); + log_manager_ = nullptr; + config_.reset(); + return; + } + + // Use BEST_EFFORT transmit profile to minimize battery and network impact. + // Events are batched and uploaded at a lower cadence. + log_manager_->SetTransmitProfile(TransmitProfile_BestEffort); + + // Override device ID with hashed version for privacy. + // The "c:" prefix tells the backend it's a caller-supplied identifier. 
+ auto& ctx = log_manager_->GetSemanticContext(); + std::string raw_device_id; +#if defined(__ANDROID__) || (defined(__APPLE__) && TARGET_OS_IOS) + // Mobile: read SDK's auto-generated platform device ID (e.g., identifierForVendor + // on iOS, ANDROID_ID on Android) and hash it before sending. + auto* provider = static_cast(&ctx); + auto& fields = provider->GetCommonFields(); + auto it = fields.find(COMMONFIELDS_DEVICE_ID); + if (it != fields.end()) { + raw_device_id = it->second.to_string(); + } +#else + // Desktop: use our custom persistent UUID. + raw_device_id = DeviceId::Instance().GetValue(); +#endif + if (!raw_device_id.empty()) { + ctx.SetDeviceId("c:" + HashDeviceId(raw_device_id)); + } + + // Set application information as logger context (attached to all events) + logger_->SetContext("AppName", "ONNXRuntime"); + logger_->SetContext("AppVersion", ORT_VERSION); + logger_->SetContext("Platform", GetPlatformInfo()); + + enabled_ = true; +} + +void PosixTelemetry::Shutdown() { + std::lock_guard lock(mutex_); + + // Disable logging first to prevent new events during shutdown + enabled_ = false; + logger_ = nullptr; // Owned by log_manager_, will be destroyed with it + + if (log_manager_ && config_) { + // Per SDK use-after-free docs (use-after-free.md): + // Flush() must be called before FlushAndTeardown() to ensure all pending + // events are persisted to offline storage. FlushAndTeardown() internally + // calls PauseActivity() + WaitPause() to quiesce the SDK. 
+ log_manager_->Flush(); + log_manager_->FlushAndTeardown(); + + // Release the log manager instance via LogManagerProvider + LogManagerProvider::Release(*config_); + log_manager_ = nullptr; + config_.reset(); + } +} + +std::string PosixTelemetry::GetPlatformInfo() const { +#if defined(__APPLE__) +#if TARGET_OS_IOS + return "iOS"; +#elif TARGET_OS_MAC + return "macOS"; +#else + return "Apple"; +#endif +#elif defined(__ANDROID__) + return "Android"; +#elif defined(__linux__) + return "Linux"; +#else + return "Unknown"; +#endif +} + +// --------------------------------------------------------------------------- +// Process / system info helpers for LogProcessInfo +// --------------------------------------------------------------------------- + +// Get detailed OS version string (e.g., "macOS 15.2", "Ubuntu 22.04 LTS") +std::string PosixTelemetry::GetOsDescription() const { +#if defined(__APPLE__) + char version[64] = {}; + size_t len = sizeof(version); + if (sysctlbyname("kern.osproductversion", version, &len, nullptr, 0) == 0) { +#if TARGET_OS_IOS + return std::string("iOS ") + version; +#else + return std::string("macOS ") + version; +#endif + } + return GetPlatformInfo(); + +#elif defined(__ANDROID__) + // Read Android system properties via /system/build.prop + std::string release, sdk; + std::ifstream prop("/system/build.prop"); + if (prop.is_open()) { + std::string line; + while (std::getline(prop, line)) { + if (line.rfind("ro.build.version.release=", 0) == 0) + release = line.substr(25); + else if (line.rfind("ro.build.version.sdk=", 0) == 0) + sdk = line.substr(21); + } + } + if (!release.empty()) { + std::string result = "Android " + release; + if (!sdk.empty()) result += " (API " + sdk + ")"; + return result; + } + return "Android"; + +#elif defined(__linux__) + // Parse /etc/os-release for PRETTY_NAME (e.g., "Ubuntu 22.04.3 LTS") + std::ifstream os_release("/etc/os-release"); + if (os_release.is_open()) { + std::string line; + while 
(std::getline(os_release, line)) { + if (line.rfind("PRETTY_NAME=", 0) == 0) { + std::string value = line.substr(12); + if (value.size() >= 2 && value.front() == '"' && value.back() == '"') { + value = value.substr(1, value.size() - 2); + } + return value; + } + } + } + return "Linux"; + +#else + return "Unknown"; +#endif +} + +// Get the name of the current process +std::string PosixTelemetry::GetProcessName() const { +#if defined(__APPLE__) || defined(__FreeBSD__) + const char* name = getprogname(); + return name ? name : ""; + +#elif defined(__linux__) || defined(__ANDROID__) + // /proc/self/comm contains the process name (up to 15 chars) + std::ifstream comm("/proc/self/comm"); + if (comm.is_open()) { + std::string name; + std::getline(comm, name); + while (!name.empty() && (name.back() == '\n' || name.back() == '\r')) + name.pop_back(); + return name; + } + return ""; + +#else + return ""; +#endif +} + +// Get the CPU architecture the binary was compiled for +std::string PosixTelemetry::GetArchitecture() { +#if defined(__x86_64__) + return "x86_64"; +#elif defined(__i386__) + return "x86"; +#elif defined(__aarch64__) + return "arm64"; +#elif defined(__arm__) + return "arm"; +#elif defined(__riscv) + return "riscv"; +#elif defined(__wasm__) + return "wasm"; +#else + return "unknown"; +#endif +} + +// Get total physical memory in MB +int64_t PosixTelemetry::GetTotalMemoryMB() { +#if defined(__APPLE__) + int64_t mem = 0; + size_t len = sizeof(mem); + if (sysctlbyname("hw.memsize", &mem, &len, nullptr, 0) == 0) { + return mem / (1024 * 1024); + } + return -1; + +#elif defined(__linux__) || defined(__ANDROID__) + long pages = sysconf(_SC_PHYS_PAGES); + long page_size = sysconf(_SC_PAGE_SIZE); + if (pages > 0 && page_size > 0) { + return static_cast(pages) * page_size / (1024 * 1024); + } + return -1; + +#else + return -1; +#endif +} + +// Get system locale (e.g., "en-US", "ja-JP") +std::string PosixTelemetry::GetLocale() { + const char* lang = std::getenv("LANG"); 
+ if (lang && lang[0]) { + std::string loc(lang); + // Strip encoding suffix (e.g., "en_US.UTF-8" → "en_US") + auto dot = loc.find('.'); + if (dot != std::string::npos) loc = loc.substr(0, dot); + // Normalize separator: "en_US" → "en-US" + for (auto& c : loc) { + if (c == '_') c = '-'; + } + return loc; + } + return ""; +} + +void PosixTelemetry::EnableTelemetryEvents() const { + enabled_ = true; +} + +void PosixTelemetry::DisableTelemetryEvents() const { + enabled_ = false; +} + +void PosixTelemetry::SetLanguageProjection(uint32_t projection) const { + projection_ = projection; +} + +bool PosixTelemetry::IsEnabled() const { + return enabled_; +} + +unsigned char PosixTelemetry::Level() const { + return level_; +} + +uint64_t PosixTelemetry::Keyword() const { + return keyword_; +} + +void PosixTelemetry::LogProcessInfo() const { + // LogProcessInfo only collects system metadata and always fires if we have a valid logger. + if (!logger_) { + return; + } + + // Log process info only once + if (process_info_logged_.exchange(true)) { + return; + } + + auto builder = EventBuilder("ProcessInfo", EventPriority::CRITICAL, + PDT_DeviceConnectivityAndConfiguration | PDT_SoftwareSetupAndInventory) + .AddCommonContext(this) + .AddString("runtimeVersion", ORT_VERSION) +#if defined(__ANDROID__) || (defined(__APPLE__) && TARGET_OS_IOS) + .AddString("DeviceInfo.Status", "Mobile") +#else + .AddString("DeviceInfo.Status", DeviceId::Instance().GetStatusString()) +#endif + .AddString("osDescription", GetOsDescription()) + .AddString("processName", GetProcessName()) + .AddString("architecture", GetArchitecture()) + .AddInt32("cpuCount", static_cast(std::thread::hardware_concurrency())) + .AddInt64("totalMemoryMB", GetTotalMemoryMB()) + .AddString("locale", GetLocale()); + + LogEventAsync(builder.Build()); +} + +void PosixTelemetry::LogSessionCreationStart(uint32_t session_id) const { + if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("SessionCreationStart", 
EventPriority::CRITICAL, + PDT_SoftwareSetupAndInventory | PDT_ProductAndServicePerformance) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .Build(); + + LogEventAsync(std::move(event)); +} + +void PosixTelemetry::LogEvaluationStop(uint32_t session_id) const { + if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("EvaluationStop", EventPriority::NORMAL, + PDT_ProductAndServicePerformance) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .Build(); + + LogEventAsync(std::move(event)); + + // Capture system metrics after each inference run to observe impact + LogSystemMetrics(session_id); +} + +void PosixTelemetry::LogEvaluationStart(uint32_t session_id) const { + if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("EvaluationStart", EventPriority::NORMAL, + PDT_ProductAndServicePerformance) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .Build(); + + LogEventAsync(std::move(event)); +} + +void PosixTelemetry::LogSessionCreation( + uint32_t session_id, int64_t ir_version, + const std::string& model_producer_name, + const std::string& model_producer_version, + const std::string& model_domain, + const std::unordered_map& domain_to_version_map, + const std::string& model_file_name, + const std::string& model_graph_name, + const std::string& model_weight_type, + const std::string& model_graph_hash, + const std::string& model_weight_hash, + const std::unordered_map& model_metadata, + const std::string& loadedFrom, + const std::vector& execution_provider_ids, + bool use_fp16, bool captureState) const { + if (!enabled_ || !logger_) { + return; + } + + // captureState is currently only triggered on Windows via ETW's EVENT_CONTROL_CODE_CAPTURE_STATE callback + // (LogAllSessions). Kept here for future compatibility if a similar mechanism is added for POSIX. + std::string event_name = captureState ? 
"SessionCreation_CaptureState" : "SessionCreation"; + + auto builder = EventBuilder(std::move(event_name), EventPriority::CRITICAL, + PDT_SoftwareSetupAndInventory | PDT_ProductAndServicePerformance) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddInt64("irVersion", ir_version) + .AddString("modelProducerName", model_producer_name) + .AddString("modelProducerVersion", model_producer_version) + .AddString("modelDomain", model_domain) + .AddIntMap("domainToVersionMap", domain_to_version_map) + .AddString("modelFileName", model_file_name) + .AddString("modelGraphName", model_graph_name) + .AddString("modelWeightType", model_weight_type) + .AddString("modelGraphHash", model_graph_hash) + .AddString("modelWeightHash", model_weight_hash) + .AddStringMap("modelMetadata", model_metadata) + .AddString("loadedFrom", loadedFrom) + .AddStringList("executionProviderIds", execution_provider_ids) + .AddBool("useFp16", use_fp16); + + LogEventAsync(builder.Build()); +} + +void PosixTelemetry::LogCompileModelStart( + uint32_t session_id, + const std::string& input_source, + const std::string& output_target, + uint32_t flags, + int graph_optimization_level, + bool embed_ep_context, + bool has_external_initializers_file, + const std::vector& execution_provider_ids) const { + if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("CompileModelStart", EventPriority::NORMAL, + PDT_SoftwareSetupAndInventory | PDT_ProductAndServicePerformance) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddString("inputSource", input_source) + .AddString("outputTarget", output_target) + .AddUInt32("flags", flags) + .AddInt32("graphOptimizationLevel", graph_optimization_level) + .AddBool("embedEpContext", embed_ep_context) + .AddBool("hasExternalInitializersFile", has_external_initializers_file) + .AddStringList("executionProviderIds", execution_provider_ids) + .Build(); + + LogEventAsync(std::move(event)); +} + +void 
PosixTelemetry::LogCompileModelComplete( + uint32_t session_id, + bool success, + uint32_t error_code, + uint32_t error_category, + const std::string& error_message) const { + if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("CompileModelComplete", EventPriority::NORMAL, + PDT_SoftwareSetupAndInventory | PDT_ProductAndServicePerformance) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddBool("success", success) + .AddUInt32("errorCode", error_code) + .AddUInt32("errorCategory", error_category) + .AddString("errorMessage", error_message) + .Build(); + + LogEventAsync(std::move(event)); +} + +void PosixTelemetry::LogRuntimeError( + uint32_t session_id, const common::Status& status, + const char* file, const char* function, uint32_t line) const { + if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("RuntimeError", EventPriority::HIGH, + PDT_ProductAndServicePerformance) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddInt32("errorCode", static_cast(status.Code())) + .AddInt32("errorCategory", static_cast(status.Category())) + .AddString("errorMessage", status.ErrorMessage()) + .AddString("file", file ? file : "") + .AddString("function", function ? 
function : "") + .AddUInt32("line", line) + .Build(); + + LogEventAsync(std::move(event)); +} + +void PosixTelemetry::LogRuntimePerf( + uint32_t session_id, uint32_t total_runs_since_last, + int64_t total_run_duration_since_last, + std::unordered_map duration_per_batch_size) const { + if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("RuntimePerf", EventPriority::NORMAL, + PDT_ProductAndServicePerformance) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddUInt32("totalRunsSinceLast", total_runs_since_last) + .AddInt64("totalRunDurationSinceLast", total_run_duration_since_last) + .AddBatchSizeDurations(duration_per_batch_size) + .Build(); + + LogEventAsync(std::move(event)); +} + +void PosixTelemetry::LogExecutionProviderEvent(LUID* adapterLuid) const { + // Not applicable for non-Windows platforms (LUID is Windows-specific) + (void)adapterLuid; +} + +void PosixTelemetry::LogDriverInfoEvent( + const std::string_view device_class, + const std::wstring_view& driver_names, + const std::wstring_view& driver_versions) const { + // Not applicable for non-Windows platforms + (void)device_class; + (void)driver_names; + (void)driver_versions; +} + +void PosixTelemetry::LogAutoEpSelection( + uint32_t session_id, const std::string& selection_policy, + const std::vector& requested_execution_provider_ids, + const std::vector& available_execution_provider_ids) const { + if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("EpAutoSelection", EventPriority::NORMAL, + PDT_SoftwareSetupAndInventory) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddString("selectionPolicy", selection_policy) + .AddStringList("requestedExecutionProviderIds", requested_execution_provider_ids) + .AddStringList("availableExecutionProviderIds", available_execution_provider_ids) + .Build(); + + LogEventAsync(std::move(event)); +} + +void PosixTelemetry::LogProviderOptions( + const std::string& provider_id, + const 
std::string& provider_options_string, + bool captureState) const { + if (!enabled_ || !logger_) { + return; + } + + std::string event_name = captureState ? "ProviderOptions_CaptureState" : "ProviderOptions"; + + auto event = EventBuilder(std::move(event_name), EventPriority::NORMAL, + PDT_SoftwareSetupAndInventory) + .AddCommonContext(this) + .AddString("providerId", provider_id) + .AddString("providerOptions", provider_options_string) + .Build(); + + LogEventAsync(std::move(event)); +} + +void PosixTelemetry::LogSystemMetrics(uint32_t session_id) const { + if (!enabled_ || !logger_) { + return; + } + + struct rusage usage; + if (getrusage(RUSAGE_SELF, &usage) == 0) { + // ru_maxrss is in KB on Linux, bytes on macOS +#ifdef __APPLE__ + int64_t max_rss_kb = usage.ru_maxrss / 1024; +#else + int64_t max_rss_kb = usage.ru_maxrss; +#endif + + auto event = EventBuilder("SystemMetrics", EventPriority::NORMAL, + PDT_ProductAndServicePerformance | PDT_DeviceConnectivityAndConfiguration) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddInt64("maxRssKb", max_rss_kb) + .AddInt64("userCpuTimeSec", usage.ru_utime.tv_sec) + .AddInt64("userCpuTimeUsec", usage.ru_utime.tv_usec) + .AddInt64("systemCpuTimeSec", usage.ru_stime.tv_sec) + .AddInt64("systemCpuTimeUsec", usage.ru_stime.tv_usec) + .AddInt64("minorPageFaults", usage.ru_minflt) + .AddInt64("majorPageFaults", usage.ru_majflt) + .AddInt64("voluntaryContextSwitches", usage.ru_nvcsw) + .AddInt64("involuntaryContextSwitches", usage.ru_nivcsw) + .Build(); + + LogEventAsync(std::move(event)); + } +} + +} // namespace onnxruntime diff --git a/onnxruntime/core/platform/posix/telemetry.h b/onnxruntime/core/platform/posix/telemetry.h new file mode 100644 index 0000000000000..dde07ca0d53fc --- /dev/null +++ b/onnxruntime/core/platform/posix/telemetry.h @@ -0,0 +1,152 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. 
+ +#pragma once + +#include "core/platform/telemetry.h" +#include +#include +#include +#include +#include + +// Forward declarations of 1DS SDK types +namespace Microsoft::Applications::Events { +class ILogger; +class ILogManager; +class ILogConfiguration; +class EventProperties; +} // namespace Microsoft::Applications::Events + +namespace onnxruntime { + +/** + * @brief Cross-platform telemetry implementation using 1DS SDK (cpp_client_telemetry). + * + * This class provides telemetry logging capabilities for all platforms + * using the cpp_client_telemetry library (1DS SDK). It implements the same interface + * as the original WindowsTelemetry to provide consistent telemetry across all platforms. + * + * Configuration: + * - Telemetry is opt-in via build flags + */ +class PosixTelemetry : public Telemetry { + public: + PosixTelemetry(); + ~PosixTelemetry() override; + + void EnableTelemetryEvents() const override; + void DisableTelemetryEvents() const override; + void SetLanguageProjection(uint32_t projection) const override; + + bool IsEnabled() const override; + unsigned char Level() const override; + uint64_t Keyword() const override; + + void LogProcessInfo() const override; + void LogSessionCreationStart(uint32_t session_id) const override; + void LogEvaluationStop(uint32_t session_id) const override; + void LogEvaluationStart(uint32_t session_id) const override; + + void LogSessionCreation(uint32_t session_id, int64_t ir_version, + const std::string& model_producer_name, + const std::string& model_producer_version, + const std::string& model_domain, + const std::unordered_map& domain_to_version_map, + const std::string& model_file_name, + const std::string& model_graph_name, + const std::string& model_weight_type, + const std::string& model_graph_hash, + const std::string& model_weight_hash, + const std::unordered_map& model_metadata, + const std::string& loadedFrom, + const std::vector& execution_provider_ids, + bool use_fp16, bool captureState) const 
override; + + void LogCompileModelStart(uint32_t session_id, + const std::string& input_source, + const std::string& output_target, + uint32_t flags, + int graph_optimization_level, + bool embed_ep_context, + bool has_external_initializers_file, + const std::vector& execution_provider_ids) const override; + + void LogCompileModelComplete(uint32_t session_id, + bool success, + uint32_t error_code, + uint32_t error_category, + const std::string& error_message) const override; + + void LogRuntimeError(uint32_t session_id, const common::Status& status, + const char* file, const char* function, uint32_t line) const override; + + void LogRuntimePerf(uint32_t session_id, uint32_t total_runs_since_last, + int64_t total_run_duration_since_last, + std::unordered_map duration_per_batch_size) const override; + + void LogExecutionProviderEvent(LUID* adapterLuid) const override; + void LogDriverInfoEvent(const std::string_view device_class, + const std::wstring_view& driver_names, + const std::wstring_view& driver_versions) const override; + + void LogAutoEpSelection(uint32_t session_id, const std::string& selection_policy, + const std::vector& requested_execution_provider_ids, + const std::vector& available_execution_provider_ids) const override; + + void LogProviderOptions(const std::string& provider_id, + const std::string& provider_options_string, + bool captureState) const override; + + private: + // Initialize telemetry SDK logger + void Initialize(); + + // Shutdown telemetry SDK logger + void Shutdown(); + + // Helper to get platform name + std::string GetPlatformInfo() const; + + // Process/system info helpers for LogProcessInfo + std::string GetOsDescription() const; + std::string GetProcessName() const; + static std::string GetArchitecture(); + static int64_t GetTotalMemoryMB(); + static std::string GetLocale(); + + // Safe async event logging. 
+ void LogEventAsync(::Microsoft::Applications::Events::EventProperties&& props) const; + + // Log system resource metrics + void LogSystemMetrics(uint32_t session_id) const; + + // Mutex for thread-safe access + mutable std::mutex mutex_; + + // Telemetry SDK instances. + // log_manager_ is owned by LogManagerProvider; logger_ is owned by log_manager_. + ::Microsoft::Applications::Events::ILogManager* log_manager_ = nullptr; + ::Microsoft::Applications::Events::ILogger* logger_ = nullptr; + + // SDK configuration — must outlive log_manager_ (LogManagerImpl holds a reference). + std::unique_ptr<::Microsoft::Applications::Events::ILogConfiguration> config_; + + // State tracking + mutable std::atomic enabled_{true}; + mutable std::atomic projection_{0}; + mutable std::atomic level_{0}; + mutable std::atomic keyword_{0}; + + // Process info tracking + mutable std::atomic process_info_logged_{false}; + + // Global registration count for singleton behavior + static std::atomic global_register_count_; + static std::mutex global_mutex_; + + // Make EventBuilder a friend so it can access projection_ + friend class EventBuilder; +}; + +} // namespace onnxruntime diff --git a/onnxruntime/core/platform/windows/telemetry.cc b/onnxruntime/core/platform/windows/telemetry.cc index 342b937ffb656..e31867f18854f 100644 --- a/onnxruntime/core/platform/windows/telemetry.cc +++ b/onnxruntime/core/platform/windows/telemetry.cc @@ -2,11 +2,12 @@ // Licensed under the MIT License. #include "core/platform/windows/telemetry.h" +#include +#include +#include #include #include #include -#include -#include #include "core/common/logging/logging.h" #include "onnxruntime_config.h" @@ -75,14 +76,45 @@ std::string ConvertWideStringToUtf8(const std::wstring& wide) { return utf8; } +// Parse the command line for -s (service name) and -k (service group) arguments. +// These are svchost.exe conventions and may not be present for all services. 
+std::string GetServiceNamesFromCommandLine() { + LPCWSTR cmd_line = ::GetCommandLineW(); + if (cmd_line == nullptr) + return {}; + + int argc = 0; + LPWSTR* argv = ::CommandLineToArgvW(cmd_line, &argc); + if (argv == nullptr) + return {}; + + std::wstring aggregated; + bool first = true; + for (int i = 0; i < argc - 1; ++i) { + if ((_wcsicmp(argv[i], L"-s") == 0 || _wcsicmp(argv[i], L"-k") == 0)) { + if (!first) { + aggregated.push_back(L','); + } + aggregated.append(argv[i + 1]); + first = false; + ++i; // skip the value we just consumed + } + } + + ::LocalFree(argv); + return ConvertWideStringToUtf8(aggregated); +} + std::string GetServiceNamesForCurrentProcess() { static std::once_flag once_flag; static std::string service_names; std::call_once(once_flag, [] { SC_HANDLE service_manager = ::OpenSCManagerW(nullptr, nullptr, SC_MANAGER_ENUMERATE_SERVICE); - if (service_manager == nullptr) + if (service_manager == nullptr) { + service_names = GetServiceNamesFromCommandLine(); return; + } DWORD bytes_needed = 0; DWORD services_returned = 0; @@ -91,11 +123,13 @@ std::string GetServiceNamesForCurrentProcess() { &services_returned, &resume_handle, nullptr) && ::GetLastError() != ERROR_MORE_DATA) { ::CloseServiceHandle(service_manager); + service_names = GetServiceNamesFromCommandLine(); return; } if (bytes_needed == 0) { ::CloseServiceHandle(service_manager); + service_names = GetServiceNamesFromCommandLine(); return; } @@ -106,6 +140,7 @@ std::string GetServiceNamesForCurrentProcess() { if (!::EnumServicesStatusExW(service_manager, SC_ENUM_PROCESS_INFO, SERVICE_WIN32, SERVICE_ACTIVE, reinterpret_cast(services), bytes_needed, &bytes_needed, &services_returned, &resume_handle, nullptr)) { ::CloseServiceHandle(service_manager); + service_names = GetServiceNamesFromCommandLine(); return; } @@ -125,6 +160,9 @@ std::string GetServiceNamesForCurrentProcess() { ::CloseServiceHandle(service_manager); service_names = ConvertWideStringToUtf8(aggregated); + if 
(service_names.empty()) { + service_names = GetServiceNamesFromCommandLine(); + } }); return service_names; diff --git a/tools/ci_build/build.py b/tools/ci_build/build.py index 231888f2204a8..79747853ab201 100644 --- a/tools/ci_build/build.py +++ b/tools/ci_build/build.py @@ -354,12 +354,16 @@ def generate_build_tree( disable_float4_types = args.android or ("float4" in types_to_disable) disable_optional_type = "optional" in types_to_disable disable_sparse_tensors = "sparsetensor" in types_to_disable + # Telemetry: On Windows uses ETW, on non-Windows uses 1DS + cmake_args += [ + "-Donnxruntime_USE_TELEMETRY=" + ("ON" if args.use_telemetry else "OFF"), + ] + disable_string_type = "string" in types_to_disable if is_windows(): cmake_args += [ "-Donnxruntime_USE_DML=" + ("ON" if args.use_dml else "OFF"), "-Donnxruntime_USE_WINML=" + ("ON" if args.use_winml else "OFF"), - "-Donnxruntime_USE_TELEMETRY=" + ("ON" if args.use_telemetry else "OFF"), "-Donnxruntime_ENABLE_PIX_FOR_WEBGPU_EP=" + ("ON" if args.enable_pix_capture else "OFF"), ] diff --git a/tools/ci_build/build_args.py b/tools/ci_build/build_args.py index abc9a8d91779c..9241d253facd1 100644 --- a/tools/ci_build/build_args.py +++ b/tools/ci_build/build_args.py @@ -435,7 +435,6 @@ def add_windows_specific_args(parser: argparse.ArgumentParser) -> None: parser.add_argument("--msvc_toolset", help="MSVC toolset version (e.g., 14.11). 
Must be >=14.40") parser.add_argument("--windows_sdk_version", help="Windows SDK version (e.g., 10.0.19041.0).") parser.add_argument("--enable_msvc_static_runtime", action="store_true", help="Statically link MSVC runtimes.") - parser.add_argument("--use_telemetry", action="store_true", help="Enable telemetry (official builds only).") parser.add_argument("--caller_framework", type=str, help="Name of the framework calling ONNX Runtime.") # Cross-compilation targets hosted on Windows @@ -869,6 +868,10 @@ def add_other_feature_args(parser: argparse.ArgumentParser) -> None: action="store_true", help="Build ORT shared lib with compatible bridge for primary EPs (TRT, OV, QNN, VitisAI), excludes tests.", ) + # Telemetry arguments (cross-platform) + parser.add_argument( + "--use_telemetry", action="store_true", help="Enable telemetry (ETW on Windows, 1DS on other platforms)." + ) def is_cross_compiling(args: argparse.Namespace) -> bool: From d86ee5d886cac561d629452551983d5c43dd9960 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Wed, 10 Jun 2026 23:43:26 -0500 Subject: [PATCH 09/61] Consume 1DS telemetry SDK from the cpp-client-telemetry vcpkg port When building with vcpkg (onnxruntime_USE_VCPKG), obtain the 1DS SDK from the cpp-client-telemetry port via find_package(MSTelemetry CONFIG REQUIRED) and link MSTelemetry::mat, whose imported target already carries include directories and transitive deps (curl/sqlite3/zlib/nlohmann-json). The existing FetchContent path with its build workarounds is retained as the non-vcpkg fallback. Adds a 'telemetry' vcpkg manifest feature (gated to non-Windows) and passes --x-feature=telemetry from build.py when --use_telemetry is set. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- cmake/external/onnxruntime_external_deps.cmake | 8 ++++++++ cmake/onnxruntime_1ds_telemetry.cmake | 11 +++++++---- cmake/onnxruntime_common.cmake | 9 +++++++-- cmake/vcpkg.json | 9 +++++++++ tools/ci_build/build.py | 2 ++ 5 files changed, 33 insertions(+), 6 deletions(-) diff --git a/cmake/external/onnxruntime_external_deps.cmake b/cmake/external/onnxruntime_external_deps.cmake index 9a91f4a9ab145..45fb6d375f5ca 100644 --- a/cmake/external/onnxruntime_external_deps.cmake +++ b/cmake/external/onnxruntime_external_deps.cmake @@ -903,6 +903,13 @@ endif() # 1DS SDK (cpp_client_telemetry) for cross-platform telemetry on non-Windows platforms if(onnxruntime_USE_TELEMETRY AND NOT WIN32) + if(onnxruntime_USE_VCPKG) + # Consume the 1DS SDK from the vcpkg port "cpp-client-telemetry", which exposes the + # MSTelemetry::mat target with its include directories and transitive dependencies + # (curl/nlohmann-json/sqlite3/zlib) already wired up via vcpkg. None of the FetchContent + # workarounds below are needed on this path. 
+ find_package(MSTelemetry CONFIG REQUIRED) + else() set(BUILD_UNIT_TESTS_SAVED "${BUILD_UNIT_TESTS}") set(BUILD_FUNC_TESTS_SAVED "${BUILD_FUNC_TESTS}") set(BUILD_SAMPLES_SAVED "${BUILD_SAMPLES}") @@ -964,6 +971,7 @@ if(onnxruntime_USE_TELEMETRY AND NOT WIN32) set(BUILD_UNIT_TESTS "${BUILD_UNIT_TESTS_SAVED}" CACHE BOOL "" FORCE) set(BUILD_FUNC_TESTS "${BUILD_FUNC_TESTS_SAVED}" CACHE BOOL "" FORCE) set(BUILD_SAMPLES "${BUILD_SAMPLES_SAVED}" CACHE BOOL "" FORCE) + endif() endif() FILE(TO_NATIVE_PATH ${CMAKE_BINARY_DIR} ORT_BINARY_DIR) diff --git a/cmake/onnxruntime_1ds_telemetry.cmake b/cmake/onnxruntime_1ds_telemetry.cmake index 20a73f2139b09..2d3482aa8834d 100644 --- a/cmake/onnxruntime_1ds_telemetry.cmake +++ b/cmake/onnxruntime_1ds_telemetry.cmake @@ -3,12 +3,15 @@ # This file handles telemetry integration for non-Windows platforms # (macOS, Linux, Android, iOS) using the 1DS SDK (cpp_client_telemetry). -# The SDK is fetched via FetchContent in onnxruntime_external_deps.cmake. +# The SDK is provided either by the vcpkg port "cpp-client-telemetry" (target +# MSTelemetry::mat) or fetched via FetchContent (target mat) in +# onnxruntime_external_deps.cmake. if(onnxruntime_USE_TELEMETRY AND NOT WIN32) - if(NOT TARGET mat) - message(FATAL_ERROR "Telemetry enabled for non-Windows but 'mat' target not found. " - "Ensure cpp_client_telemetry is fetched in onnxruntime_external_deps.cmake.") + if(NOT TARGET mat AND NOT TARGET MSTelemetry::mat) + message(FATAL_ERROR "Telemetry enabled for non-Windows but no 1DS SDK target " + "('mat' or 'MSTelemetry::mat') was found. 
Ensure cpp_client_telemetry " + "is provided via the vcpkg port or fetched in onnxruntime_external_deps.cmake.") endif() message(STATUS "Enabling 1DS telemetry for non-Windows platforms") diff --git a/cmake/onnxruntime_common.cmake b/cmake/onnxruntime_common.cmake index 3e55a56a74284..eeba5d5de7aa9 100644 --- a/cmake/onnxruntime_common.cmake +++ b/cmake/onnxruntime_common.cmake @@ -217,7 +217,12 @@ endif() # Link telemetry library (1DS SDK) for non-Windows platforms if(onnxruntime_USE_TELEMETRY AND NOT WIN32) - if(TARGET mat) + if(TARGET MSTelemetry::mat) + # vcpkg port (cpp-client-telemetry): the imported target propagates its include + # directories and transitive dependencies (curl/sqlite3/zlib/nlohmann-json), so no + # manual include paths or system libraries are required here. + target_link_libraries(onnxruntime_common PRIVATE MSTelemetry::mat) + elseif(TARGET mat) target_link_libraries(onnxruntime_common PRIVATE mat) # cpp_client_telemetry uses include_directories() (directory-scoped) rather than # target_include_directories(), so include paths don't propagate via target_link_libraries. 
@@ -247,7 +252,7 @@ if(onnxruntime_USE_TELEMETRY AND NOT WIN32) ) endif() else() - message(WARNING "Telemetry enabled but 'mat' library target not found") + message(WARNING "Telemetry enabled but no 1DS SDK target ('MSTelemetry::mat' or 'mat') was found") endif() endif() diff --git a/cmake/vcpkg.json b/cmake/vcpkg.json index e9ea52e4b5248..9cf0185f35d77 100644 --- a/cmake/vcpkg.json +++ b/cmake/vcpkg.json @@ -96,6 +96,15 @@ "webgpu-ep": { "description": "Build with WebGPU EP", "dependencies": [] + }, + "telemetry": { + "description": "Build with 1DS telemetry support (cpp-client-telemetry) on non-Windows platforms", + "dependencies": [ + { + "name": "cpp-client-telemetry", + "platform": "!windows & !emscripten & !uwp" + } + ] } }, "overrides": [ diff --git a/tools/ci_build/build.py b/tools/ci_build/build.py index 79747853ab201..6746250d920e4 100644 --- a/tools/ci_build/build.py +++ b/tools/ci_build/build.py @@ -267,6 +267,8 @@ def generate_vcpkg_install_options(build_dir, args): vcpkg_install_options.append("--x-feature=webnn-ep") if args.use_xnnpack: vcpkg_install_options.append("--x-feature=xnnpack-ep") + if args.use_telemetry: + vcpkg_install_options.append("--x-feature=telemetry") overlay_triplets_dir = None From b6f10cd430b5dfa0c1d85d91c0aa453fa00ac8f3 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Wed, 10 Jun 2026 23:51:27 -0500 Subject: [PATCH 10/61] Fix POSIX telemetry to match the current Telemetry interface Rebasing onto current main surfaced interface drift and a latent build break: - LogSessionCreation: add hardware_device_types/hardware_vendor_ids params (the base interface gained them at schemaVersion 1) and emit them as hardwareDeviceTypes/hardwareVendorIds to match the Windows ETW schema; the missing params made the override fail to compile. - LogRuntimePerf: take duration_per_batch_size by const reference (the base signature changed from by-value), which otherwise broke the override. 
- env.cc: restore an '#else Telemetry telemetry_provider_' fallback so PosixEnv::GetTelemetryProvider() still has a member to return when USE_1DS_TELEMETRY is not defined (the default, telemetry-off build was otherwise uncompilable). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/env.cc | 2 ++ onnxruntime/core/platform/posix/telemetry.cc | 6 +++++- onnxruntime/core/platform/posix/telemetry.h | 4 +++- 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/onnxruntime/core/platform/posix/env.cc b/onnxruntime/core/platform/posix/env.cc index 96964b8beeb5f..557e56863c877 100644 --- a/onnxruntime/core/platform/posix/env.cc +++ b/onnxruntime/core/platform/posix/env.cc @@ -649,6 +649,8 @@ class PosixEnv : public Env { private: #ifdef USE_1DS_TELEMETRY PosixTelemetry telemetry_provider_; +#else + Telemetry telemetry_provider_; #endif #ifdef ORT_USE_CPUINFO PosixEnv() { diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index 6eb5dfb11ddef..3084bf3f7a1fc 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -594,6 +594,8 @@ void PosixTelemetry::LogSessionCreation( const std::unordered_map& model_metadata, const std::string& loadedFrom, const std::vector& execution_provider_ids, + const std::string& hardware_device_types, + const std::string& hardware_vendor_ids, bool use_fp16, bool captureState) const { if (!enabled_ || !logger_) { return; @@ -620,6 +622,8 @@ void PosixTelemetry::LogSessionCreation( .AddStringMap("modelMetadata", model_metadata) .AddString("loadedFrom", loadedFrom) .AddStringList("executionProviderIds", execution_provider_ids) + .AddString("hardwareDeviceTypes", hardware_device_types) + .AddString("hardwareVendorIds", hardware_vendor_ids) .AddBool("useFp16", use_fp16); LogEventAsync(builder.Build()); @@ -702,7 +706,7 @@ void PosixTelemetry::LogRuntimeError( void 
PosixTelemetry::LogRuntimePerf( uint32_t session_id, uint32_t total_runs_since_last, int64_t total_run_duration_since_last, - std::unordered_map duration_per_batch_size) const { + const std::unordered_map& duration_per_batch_size) const { if (!enabled_ || !logger_) { return; } diff --git a/onnxruntime/core/platform/posix/telemetry.h b/onnxruntime/core/platform/posix/telemetry.h index dde07ca0d53fc..975c6b61eeadc 100644 --- a/onnxruntime/core/platform/posix/telemetry.h +++ b/onnxruntime/core/platform/posix/telemetry.h @@ -61,6 +61,8 @@ class PosixTelemetry : public Telemetry { const std::unordered_map& model_metadata, const std::string& loadedFrom, const std::vector& execution_provider_ids, + const std::string& hardware_device_types, + const std::string& hardware_vendor_ids, bool use_fp16, bool captureState) const override; void LogCompileModelStart(uint32_t session_id, @@ -83,7 +85,7 @@ class PosixTelemetry : public Telemetry { void LogRuntimePerf(uint32_t session_id, uint32_t total_runs_since_last, int64_t total_run_duration_since_last, - std::unordered_map duration_per_batch_size) const override; + const std::unordered_map& duration_per_batch_size) const override; void LogExecutionProviderEvent(LUID* adapterLuid) const override; void LogDriverInfoEvent(const std::string_view device_class, From 631fe4ca55070e6d84610df064d42e89003a39ad Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Wed, 10 Jun 2026 23:56:29 -0500 Subject: [PATCH 11/61] Honor telemetry opt-out in POSIX LogProcessInfo LogProcessInfo guarded only on the logger handle, so a runtime DisableTelemetryEvents() call still emitted the ProcessInfo event. Check enabled_ as every other event does (and as WindowsTelemetry::LogProcessInfo does), so the opt-out is respected. 
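For reviewers, the guard change can be sketched in isolation as below. This is a minimal illustration, not the PosixTelemetry code: `FakeLogger` and `SketchTelemetry` are hypothetical stand-ins, and only the `!enabled_ || !logger_` check mirrors the patch.

```cpp
#include <atomic>
#include <string>
#include <vector>

// Hypothetical stand-in for the SDK logger: it only records event names.
struct FakeLogger {
  std::vector<std::string> events;
  void Log(const std::string& name) { events.push_back(name); }
};

// Hypothetical class; only the guard in LogProcessInfo mirrors the patch.
class SketchTelemetry {
 public:
  explicit SketchTelemetry(FakeLogger* logger) : logger_(logger) {}

  void EnableTelemetryEvents() { enabled_.store(true); }
  void DisableTelemetryEvents() { enabled_.store(false); }

  // Checking only logger_ would still emit after a runtime opt-out.
  // Checking enabled_ first makes ProcessInfo behave like every other event.
  void LogProcessInfo() const {
    if (!enabled_.load() || logger_ == nullptr) {
      return;  // opted out, or no live logger: emit nothing
    }
    logger_->Log("ProcessInfo");
  }

 private:
  std::atomic<bool> enabled_{true};
  FakeLogger* logger_;
};
```

With this shape, calling `DisableTelemetryEvents()` and then `LogProcessInfo()` leaves the recorded event list unchanged, which is the behavior the fix is after.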
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/telemetry.cc | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index 3084bf3f7a1fc..4c9af09fbefe3 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -506,8 +506,9 @@ uint64_t PosixTelemetry::Keyword() const { } void PosixTelemetry::LogProcessInfo() const { - // LogProcessInfo only collects system metadata and always fires if we have a valid logger. - if (!logger_) { + // LogProcessInfo only collects system metadata, but it must still honor the + // runtime opt-out (DisableTelemetryEvents) like every other event. + if (!enabled_ || !logger_) { return; } From 7d37b43966ec8bffec2bd0770c88f2f145883b83 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Thu, 11 Jun 2026 00:44:45 -0500 Subject: [PATCH 12/61] Sample POSIX SystemMetrics to bound per-inference overhead LogEvaluationStop runs on every inference Run() and emitted a SystemMetrics event (getrusage syscall + 11-property event) every time. Sample it: emit on the first run and then every kSystemMetricsSampleInterval (100) runs, skipping the syscall on non-sampled runs, so high-frequency small-model inference is not slowed by telemetry. 
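The sampling rule can be sketched on its own as follows. `RunSampler`, `kSampleInterval`, and `CountSampled` are hypothetical names for illustration; the interval of 100 and the relaxed `fetch_add` follow the description above, and the real code keeps the counter inside PosixTelemetry.

```cpp
#include <atomic>
#include <cstdint>

// Assumed to match kSystemMetricsSampleInterval from the patch.
constexpr std::uint32_t kSampleInterval = 100;

// Hypothetical helper class: decides whether the current run is sampled.
class RunSampler {
 public:
  // fetch_add returns the value *before* the increment, so the very first
  // call sees 0 and is sampled; after that every kSampleInterval-th call is.
  bool ShouldSample() {
    return counter_.fetch_add(1, std::memory_order_relaxed) % kSampleInterval == 0;
  }

 private:
  std::atomic<std::uint32_t> counter_{0};
};

// Hypothetical helper: number of sampled runs among `runs` consecutive calls.
inline std::uint32_t CountSampled(RunSampler& sampler, std::uint32_t runs) {
  std::uint32_t sampled = 0;
  for (std::uint32_t i = 0; i < runs; ++i) {
    if (sampler.ShouldSample()) {
      ++sampled;
    }
  }
  return sampled;
}
```

Relaxed ordering appears sufficient here because the counter only selects which runs are sampled and does not publish any other data between threads. The expensive work (the `getrusage()` call and event construction) would sit after the `ShouldSample()` check so that non-sampled runs pay only for one atomic increment.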
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/telemetry.cc | 12 ++++++++++++ onnxruntime/core/platform/posix/telemetry.h | 3 +++ 2 files changed, 15 insertions(+) diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index 4c9af09fbefe3..28424032a2c8d 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -47,6 +47,11 @@ enum class EventPriority { CRITICAL = EventLatency_RealTime // ProcessInfo, SessionCreation }; +// SystemMetrics is emitted from LogEvaluationStop, i.e. once per inference Run(), which is a hot +// path for small/high-frequency models. Sample it to bound the per-run getrusage() + event cost: +// the event is emitted on the first run and then once every kSystemMetricsSampleInterval runs. +constexpr uint32_t kSystemMetricsSampleInterval = 100; + // Helper class to build events with common properties class EventBuilder { private: @@ -784,6 +789,13 @@ void PosixTelemetry::LogSystemMetrics(uint32_t session_id) const { return; } + // Sample to bound per-inference overhead: emit on the first run and every + // kSystemMetricsSampleInterval-th run thereafter. fetch_add returns the previous value, so the + // first call (0) passes and the getrusage() syscall below is skipped on non-sampled runs. 
+ if ((system_metrics_sample_counter_.fetch_add(1, std::memory_order_relaxed) % kSystemMetricsSampleInterval) != 0) { + return; + } + struct rusage usage; if (getrusage(RUSAGE_SELF, &usage) == 0) { // ru_maxrss is in KB on Linux, bytes on macOS diff --git a/onnxruntime/core/platform/posix/telemetry.h b/onnxruntime/core/platform/posix/telemetry.h index 975c6b61eeadc..17f60a3aa0c10 100644 --- a/onnxruntime/core/platform/posix/telemetry.h +++ b/onnxruntime/core/platform/posix/telemetry.h @@ -143,6 +143,9 @@ class PosixTelemetry : public Telemetry { // Process info tracking mutable std::atomic process_info_logged_{false}; + // Sampling counter for the per-run SystemMetrics event (see LogSystemMetrics). + mutable std::atomic system_metrics_sample_counter_{0}; + // Global registration count for singleton behavior static std::atomic global_register_count_; static std::mutex global_mutex_; From c59d454bd176c5281368d5567237645bec1747a2 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Thu, 11 Jun 2026 01:25:24 -0500 Subject: [PATCH 13/61] Implement remaining cross-platform telemetry events on POSIX PosixTelemetry inherited base no-ops for 7 events that are emitted from platform-agnostic core/session code, so POSIX silently dropped data Windows collects. Implement them, mirroring the Windows ETW event names and fields via EventBuilder (omitting the Windows-only frameworkName): ModelLoadStart, ModelLoadEnd, SessionCreationEnd, EpDeviceUsage, RegisterEpLibraryStart, RegisterEpLibraryEnd, RegisterEpLibraryWithLibPath. The genuinely Windows-only events (LogExecutionProviderEvent/LUID, LogDriverInfoEvent) remain no-op stubs. Verified on WSL/Linux: telemetry.cc compiles -fsyntax-only against 1DS SDK v3.10.40.1; all overrides match the base interface. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/telemetry.cc | 132 +++++++++++++++++++ onnxruntime/core/platform/posix/telemetry.h | 22 ++++ 2 files changed, 154 insertions(+) diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index 28424032a2c8d..4142e4df4a848 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -784,6 +784,138 @@ void PosixTelemetry::LogProviderOptions( LogEventAsync(std::move(event)); } +void PosixTelemetry::LogModelLoadStart(uint32_t session_id) const { + if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("ModelLoadStart", EventPriority::NORMAL, + PDT_ProductAndServiceUsage) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .Build(); + + LogEventAsync(std::move(event)); +} + +void PosixTelemetry::LogModelLoadEnd(uint32_t session_id, const common::Status& status) const { + if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("ModelLoadEnd", EventPriority::NORMAL, + PDT_ProductAndServicePerformance) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddBool("isSuccess", status.IsOK()) + .AddInt32("errorCode", static_cast(status.Code())) + .AddInt32("errorCategory", static_cast(status.Category())) + .AddString("errorMessage", status.ErrorMessage()) + .Build(); + + LogEventAsync(std::move(event)); +} + +void PosixTelemetry::LogSessionCreationEnd(uint32_t session_id, const common::Status& status) const { + if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("SessionCreationEnd", EventPriority::CRITICAL, + PDT_ProductAndServicePerformance) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddBool("isSuccess", status.IsOK()) + .AddInt32("errorCode", static_cast(status.Code())) + .AddInt32("errorCategory", static_cast(status.Category())) + .AddString("errorMessage", 
status.ErrorMessage()) + .Build(); + + LogEventAsync(std::move(event)); +} + +void PosixTelemetry::LogEpDeviceUsage( + uint32_t session_id, + const std::string& ep_type, + const std::string& hardware_device_type, + uint32_t hardware_vendor_id, + uint32_t hardware_device_id, + const std::string& hardware_vendor, + const std::string& ep_vendor, + int assigned_node_count, + uint32_t total_runs_since_last, + int64_t total_run_duration_since_last) const { + if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("EpDeviceUsage", EventPriority::NORMAL, + PDT_ProductAndServiceUsage) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddString("executionProviderType", ep_type) + .AddString("hardwareDeviceType", hardware_device_type) + .AddUInt32("hardwareVendorId", hardware_vendor_id) + .AddUInt32("hardwareDeviceId", hardware_device_id) + .AddString("hardwareVendor", hardware_vendor) + .AddString("epVendor", ep_vendor) + .AddInt32("assignedNodeCount", assigned_node_count) + .AddUInt32("totalRunsSinceLast", total_runs_since_last) + .AddInt64("totalRunDurationSinceLast", total_run_duration_since_last) + .Build(); + + LogEventAsync(std::move(event)); +} + +void PosixTelemetry::LogRegisterEpLibraryStart(const std::string& registration_name) const { + if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("RegisterEpLibraryStart", EventPriority::NORMAL, + PDT_ProductAndServiceUsage) + .AddCommonContext(this) + .AddString("registrationName", registration_name) + .Build(); + + LogEventAsync(std::move(event)); +} + +void PosixTelemetry::LogRegisterEpLibraryEnd(const std::string& registration_name, + const common::Status& status) const { + if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("RegisterEpLibraryEnd", EventPriority::NORMAL, + PDT_ProductAndServicePerformance) + .AddCommonContext(this) + .AddString("registrationName", registration_name) + .AddBool("isSuccess", status.IsOK()) + .AddInt32("errorCode", 
static_cast(status.Code())) + .AddInt32("errorCategory", static_cast(status.Category())) + .AddString("errorMessage", status.ErrorMessage()) + .Build(); + + LogEventAsync(std::move(event)); +} + +void PosixTelemetry::LogRegisterEpLibraryWithLibPath(const std::string& registration_name, + const std::string& lib_path) const { + if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("RegisterEpLibraryWithLibPath", EventPriority::NORMAL, + PDT_ProductAndServiceUsage) + .AddCommonContext(this) + .AddString("registrationName", registration_name) + .AddString("libPath", lib_path) + .Build(); + + LogEventAsync(std::move(event)); +} + void PosixTelemetry::LogSystemMetrics(uint32_t session_id) const { if (!enabled_ || !logger_) { return; diff --git a/onnxruntime/core/platform/posix/telemetry.h b/onnxruntime/core/platform/posix/telemetry.h index 17f60a3aa0c10..8d07fe7e57499 100644 --- a/onnxruntime/core/platform/posix/telemetry.h +++ b/onnxruntime/core/platform/posix/telemetry.h @@ -100,6 +100,28 @@ class PosixTelemetry : public Telemetry { const std::string& provider_options_string, bool captureState) const override; + void LogModelLoadStart(uint32_t session_id) const override; + void LogModelLoadEnd(uint32_t session_id, const common::Status& status) const override; + + void LogSessionCreationEnd(uint32_t session_id, const common::Status& status) const override; + + void LogEpDeviceUsage(uint32_t session_id, + const std::string& ep_type, + const std::string& hardware_device_type, + uint32_t hardware_vendor_id, + uint32_t hardware_device_id, + const std::string& hardware_vendor, + const std::string& ep_vendor, + int assigned_node_count, + uint32_t total_runs_since_last, + int64_t total_run_duration_since_last) const override; + + void LogRegisterEpLibraryStart(const std::string& registration_name) const override; + void LogRegisterEpLibraryEnd(const std::string& registration_name, + const common::Status& status) const override; + void 
LogRegisterEpLibraryWithLibPath(const std::string& registration_name, + const std::string& lib_path) const override; + private: // Initialize telemetry SDK logger void Initialize(); From fcbde261c62883b0e508b4cc80ed7f62a881ee98 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Thu, 11 Jun 2026 01:40:21 -0500 Subject: [PATCH 14/61] Make PosixTelemetry shared SDK state static (singleton, Windows parity) Initialization/teardown is gated on the static global_register_count_, but the SDK handles (log_manager_, logger_, config_) and state (enabled_, projection_, level_, keyword_, process_info_logged_, sampling counter) were per-instance. With more than one instance, only the first acquired a live logger and teardown could leak the manager or drop buffered events. Make all shared state static (and the mutex), mirroring WindowsTelemetry, so a single owner manages the SDK regardless of instance count. Removed 'mutable' (incompatible with static); added out-of-line definitions. Verified on WSL/Linux: telemetry.cc compiles -fsyntax-only against 1DS SDK v3.10.40.1 and override conformance holds. 
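The ownership model described above can be sketched roughly as below. This is a simplified illustration under stated assumptions: `FakeManager` and `SharedTelemetry` are hypothetical, a plain `int` count guarded by one mutex replaces the atomic count plus separate mutexes used in the patch, and no 1DS SDK calls are involved.

```cpp
#include <mutex>

// Hypothetical stand-in for the SDK log manager handle.
struct FakeManager {};

// Hypothetical class: all shared state is static, so any number of instances
// share one manager. The first instance creates it, the last one destroys it.
class SharedTelemetry {
 public:
  SharedTelemetry() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (register_count_++ == 0) {
      manager_ = new FakeManager();  // first instance initializes
      ++init_calls_;
    }
  }

  ~SharedTelemetry() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (--register_count_ == 0) {
      delete manager_;  // last instance tears down
      manager_ = nullptr;
      ++teardown_calls_;
    }
  }

  SharedTelemetry(const SharedTelemetry&) = delete;
  SharedTelemetry& operator=(const SharedTelemetry&) = delete;

  static bool IsLive() {
    std::lock_guard<std::mutex> lock(mutex_);
    return manager_ != nullptr;
  }
  static int InitCalls() {
    std::lock_guard<std::mutex> lock(mutex_);
    return init_calls_;
  }
  static int TeardownCalls() {
    std::lock_guard<std::mutex> lock(mutex_);
    return teardown_calls_;
  }

 private:
  static std::mutex mutex_;
  static int register_count_;
  static FakeManager* manager_;
  static int init_calls_;
  static int teardown_calls_;
};

// Out-of-line definitions, as static data members require before C++17 inline.
std::mutex SharedTelemetry::mutex_;
int SharedTelemetry::register_count_ = 0;
FakeManager* SharedTelemetry::manager_ = nullptr;
int SharedTelemetry::init_calls_ = 0;
int SharedTelemetry::teardown_calls_ = 0;
```

With per-instance handles, a second object would pass the count check without ever acquiring a live manager. Making the handle static means every instance observes the same manager, and teardown happens exactly once when the count returns to zero.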
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/telemetry.cc | 10 ++++++++ onnxruntime/core/platform/posix/telemetry.h | 27 ++++++++++++-------- 2 files changed, 26 insertions(+), 11 deletions(-) diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index 4142e4df4a848..ce1d8dbdd770b 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -36,6 +36,16 @@ namespace onnxruntime { // Static member initialization std::atomic PosixTelemetry::global_register_count_{0}; std::mutex PosixTelemetry::global_mutex_; +std::mutex PosixTelemetry::mutex_; +::Microsoft::Applications::Events::ILogManager* PosixTelemetry::log_manager_ = nullptr; +::Microsoft::Applications::Events::ILogger* PosixTelemetry::logger_ = nullptr; +std::unique_ptr<::Microsoft::Applications::Events::ILogConfiguration> PosixTelemetry::config_; +std::atomic PosixTelemetry::enabled_{true}; +std::atomic PosixTelemetry::projection_{0}; +std::atomic PosixTelemetry::level_{0}; +std::atomic PosixTelemetry::keyword_{0}; +std::atomic PosixTelemetry::process_info_logged_{false}; +std::atomic PosixTelemetry::system_metrics_sample_counter_{0}; // Tenant token for 1DS telemetry ingestion constexpr const char* TENANT_TOKEN = "5ad963bd4b3a4118a481401cc0211875-da8e8657-47d4-4ed7-ab39-7886e136f53b-6988"; diff --git a/onnxruntime/core/platform/posix/telemetry.h b/onnxruntime/core/platform/posix/telemetry.h index 8d07fe7e57499..2a2aa5dc9104a 100644 --- a/onnxruntime/core/platform/posix/telemetry.h +++ b/onnxruntime/core/platform/posix/telemetry.h @@ -145,28 +145,33 @@ class PosixTelemetry : public Telemetry { // Log system resource metrics void LogSystemMetrics(uint32_t session_id) const; - // Mutex for thread-safe access - mutable std::mutex mutex_; + // All shared telemetry state below is static: PosixTelemetry is a process-wide singleton whose + // lifetime is 
gated by global_register_count_ (the first instance initializes the SDK, the last + // tears it down), matching WindowsTelemetry. Keeping the SDK handles and state static ensures a + // single owner regardless of how many PosixTelemetry objects exist. + + // Mutex for thread-safe init/shutdown of the shared SDK state. + static std::mutex mutex_; // Telemetry SDK instances. // log_manager_ is owned by LogManagerProvider; logger_ is owned by log_manager_. - ::Microsoft::Applications::Events::ILogManager* log_manager_ = nullptr; - ::Microsoft::Applications::Events::ILogger* logger_ = nullptr; + static ::Microsoft::Applications::Events::ILogManager* log_manager_; + static ::Microsoft::Applications::Events::ILogger* logger_; // SDK configuration — must outlive log_manager_ (LogManagerImpl holds a reference). - std::unique_ptr<::Microsoft::Applications::Events::ILogConfiguration> config_; + static std::unique_ptr<::Microsoft::Applications::Events::ILogConfiguration> config_; // State tracking - mutable std::atomic enabled_{true}; - mutable std::atomic projection_{0}; - mutable std::atomic level_{0}; - mutable std::atomic keyword_{0}; + static std::atomic enabled_; + static std::atomic projection_; + static std::atomic level_; + static std::atomic keyword_; // Process info tracking - mutable std::atomic process_info_logged_{false}; + static std::atomic process_info_logged_; // Sampling counter for the per-run SystemMetrics event (see LogSystemMetrics). 
- mutable std::atomic system_metrics_sample_counter_{0}; + static std::atomic system_metrics_sample_counter_; // Global registration count for singleton behavior static std::atomic global_register_count_; From b45674e4841c584513208c896fc547d0a84fb0b5 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Thu, 11 Jun 2026 01:49:12 -0500 Subject: [PATCH 15/61] Align FetchContent 1DS SDK fallback with the vcpkg port (3.10.40.1 -> 3.10.161.1) The vcpkg port pulls cpp_client_telemetry v3.10.161.1, but the non-vcpkg FetchContent fallback still pinned v3.10.40.1 (Feb 2026). Bump it to v3.10.161.1 so both paths build the same SDK and the fallback picks up ~4 months of fixes: vendored SQLite 3.34.1 -> 3.53.1 (multiple CVEs), vendored zlib 1.3.2, Apple reachability via NWPathMonitor (iOS 18 crash fix), libcurl poll() (removes 1024-FD limit), and -Wextra-semi/redefinition header fixes that matter under ORT's -Werror. Also fix the cgmanifest entry to record the actual tag commit (cc03dce...) instead of the archive SHA1. Archive SHA1 validated by reproducing the existing 3.10.40.1 hash; telemetry.cc verified to compile -fsyntax-only against v3.10.161.1 headers; bundled-dep layout (sqlite/, zlib/) and the 'mat' target our FetchContent workarounds rely on are unchanged. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- cgmanifests/cgmanifest.json | 2 +- cmake/deps.txt | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/cgmanifests/cgmanifest.json b/cgmanifests/cgmanifest.json index dfe6f0d4d1553..abed2921509c6 100644 --- a/cgmanifests/cgmanifest.json +++ b/cgmanifests/cgmanifest.json @@ -350,7 +350,7 @@ "component": { "type": "git", "git": { - "commitHash": "ee2ded25e539f64052c9d8635bef4ea62c30e014", + "commitHash": "cc03dce1a23538f6b820401e36ee339e9a5d2edd", "repositoryUrl": "https://github.com/microsoft/cpp_client_telemetry.git" }, "comments": "1DS SDK (cpp_client_telemetry) for cross-platform telemetry on non-Windows platforms (macOS, Linux, Android, iOS)." diff --git a/cmake/deps.txt b/cmake/deps.txt index 942ca9f1b2217..28700816d20d3 100644 --- a/cmake/deps.txt +++ b/cmake/deps.txt @@ -65,4 +65,4 @@ kleidiai;https://github.com/ARM-software/kleidiai/archive/refs/tags/v1.20.0.tar. kleidiai-qmx;https://github.com/qualcomm/kleidiai/archive/2f10c9a8d32f81ffeeb6d4885a29cc35d2b0da87.zip;5e855730a2d69057a569f43dd7532db3b2d2a05c vulkan_headers;https://codeload.github.com/KhronosGroup/Vulkan-Headers/tar.gz/refs/tags/v1.4.344;57bc528ef7c4a3f7bfbb59e64a187e3734bd29d8 # cpp_client_telemetry (1DS SDK) for cross-platform telemetry on non-Windows platforms -cpp_client_telemetry;https://github.com/microsoft/cpp_client_telemetry/archive/refs/tags/v3.10.40.1.zip;ee2ded25e539f64052c9d8635bef4ea62c30e014 +cpp_client_telemetry;https://github.com/microsoft/cpp_client_telemetry/archive/refs/tags/v3.10.161.1.zip;0c0e767283fde29629bbc647b21bc0ac39edeb01 From 6e4304f94184cbd85d0ac43589a54220e243d563 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 16 Jun 2026 02:35:22 -0500 Subject: [PATCH 16/61] Bump vcpkg baseline to resolve cpp-client-telemetry; drop unused utf8-range; pin mimalloc ORT pinned vcpkg baseline 120deac (2025-08), which predates the cpp-client-telemetry port, so --use_vcpkg + 
--x-feature=telemetry could not resolve the dependency. Bump the baseline to 22b8d09994 (2026-06-12, the exact commit that added the port, vcpkg#52316). Impact was verified with a full-manifest 'vcpkg install --dry-run' (x64-linux, exit 0): cpp-client-telemetry@3.10.161.1 resolves; ORT's heavy deps stay pinned by overlay/override (eigen3 3.4.0, protobuf 3.21.12, onnx, abseil, pybind11, cpuinfo, dlpack); only benign security/minor bumps float (zlib 1.3.2, sqlite3 3.53.2, openssl 3.6.3, curl 8.20.0, ms-gsl 4.2.2, boost 1.91, benchmark 1.9.5). Two notable floats are neutralized: remove utf8-range (vestigial - ORT links no utf8_range target and pins protobuf 3.21.12 which predates the split; resolution succeeds without it), and override mimalloc to 2.2.3 to avoid silently swapping the allocator major (2->3) on opt-in Windows builds (verified the override pins 2.2.3). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- cmake/vcpkg-configuration.json | 2 +- cmake/vcpkg.json | 5 ++++- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/cmake/vcpkg-configuration.json b/cmake/vcpkg-configuration.json index ad4be7d57c220..96b19e0b17c4c 100644 --- a/cmake/vcpkg-configuration.json +++ b/cmake/vcpkg-configuration.json @@ -2,7 +2,7 @@ "default-registry": { "kind": "git", "repository": "https://github.com/Microsoft/vcpkg", - "baseline": "120deac3062162151622ca4860575a33844ba10b" + "baseline": "22b8d099947ea6ee2fcb1aa1124b21f48f84232d" }, "overlay-ports": [ "./vcpkg-ports" diff --git a/cmake/vcpkg.json b/cmake/vcpkg.json index 9cf0185f35d77..e6674a83a7c92 100644 --- a/cmake/vcpkg.json +++ b/cmake/vcpkg.json @@ -58,7 +58,6 @@ "pybind11", "re2", "safeint", - "utf8-range", { "name": "vcpkg-cmake", "host": true @@ -115,6 +114,10 @@ { "name": "flatbuffers", "version": "23.5.26" + }, + { + "name": "mimalloc", + "version": "2.2.3" } ] } From b0eaeb75d4f966156a4a172ad4117ae4408f8a88 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 
12:13:29 -0500 Subject: [PATCH 17/61] Adapt POSIX telemetry to Telemetry interface changes on current main Rebasing onto current origin/main pulled in further Telemetry interface evolution: LogSessionCreation gained an ep_versions parameter, LogEpDeviceUsage gained an ep_version parameter, and a new LogRuntimeInferenceError virtual was added. Update the PosixTelemetry overrides accordingly (emitting executionProviderVersions / epVersion) and implement LogRuntimeInferenceError as a RuntimeInferenceError event mirroring the Windows ETW schema (EP versions/device types + status + runtimeVersion). Verified on WSL/Linux: override conformance against the current base telemetry.h holds and telemetry.cc compiles -fsyntax-only against 1DS SDK v3.10.161.1. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/telemetry.cc | 26 ++++++++++++++++++++ onnxruntime/core/platform/posix/telemetry.h | 6 +++++ 2 files changed, 32 insertions(+) diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index ce1d8dbdd770b..4953e0eab0e11 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -612,6 +612,7 @@ void PosixTelemetry::LogSessionCreation( const std::vector& execution_provider_ids, const std::string& hardware_device_types, const std::string& hardware_vendor_ids, + const std::string& ep_versions, bool use_fp16, bool captureState) const { if (!enabled_ || !logger_) { return; @@ -640,6 +641,7 @@ void PosixTelemetry::LogSessionCreation( .AddStringList("executionProviderIds", execution_provider_ids) .AddString("hardwareDeviceTypes", hardware_device_types) .AddString("hardwareVendorIds", hardware_vendor_ids) + .AddString("executionProviderVersions", ep_versions) .AddBool("useFp16", use_fp16); LogEventAsync(builder.Build()); @@ -719,6 +721,28 @@ void PosixTelemetry::LogRuntimeError( LogEventAsync(std::move(event)); } +void 
PosixTelemetry::LogRuntimeInferenceError(uint32_t session_id, const common::Status& status, + const std::string& ep_versions, + const std::string& ep_device_types) const { + if (!enabled_ || !logger_) { + return; + } + + auto event = EventBuilder("RuntimeInferenceError", EventPriority::HIGH, + PDT_ProductAndServicePerformance) + .AddCommonContext(this) + .AddUInt32("sessionId", session_id) + .AddInt32("errorCode", static_cast(status.Code())) + .AddInt32("errorCategory", static_cast(status.Category())) + .AddString("errorMessage", status.ErrorMessage()) + .AddString("executionProviderVersions", ep_versions) + .AddString("executionProviderDeviceTypes", ep_device_types) + .AddString("runtimeVersion", ORT_VERSION) + .Build(); + + LogEventAsync(std::move(event)); +} + void PosixTelemetry::LogRuntimePerf( uint32_t session_id, uint32_t total_runs_since_last, int64_t total_run_duration_since_last, @@ -852,6 +876,7 @@ void PosixTelemetry::LogEpDeviceUsage( uint32_t hardware_device_id, const std::string& hardware_vendor, const std::string& ep_vendor, + const std::string& ep_version, int assigned_node_count, uint32_t total_runs_since_last, int64_t total_run_duration_since_last) const { @@ -869,6 +894,7 @@ void PosixTelemetry::LogEpDeviceUsage( .AddUInt32("hardwareDeviceId", hardware_device_id) .AddString("hardwareVendor", hardware_vendor) .AddString("epVendor", ep_vendor) + .AddString("epVersion", ep_version) .AddInt32("assignedNodeCount", assigned_node_count) .AddUInt32("totalRunsSinceLast", total_runs_since_last) .AddInt64("totalRunDurationSinceLast", total_run_duration_since_last) diff --git a/onnxruntime/core/platform/posix/telemetry.h b/onnxruntime/core/platform/posix/telemetry.h index 2a2aa5dc9104a..6c466c418f7b1 100644 --- a/onnxruntime/core/platform/posix/telemetry.h +++ b/onnxruntime/core/platform/posix/telemetry.h @@ -63,6 +63,7 @@ class PosixTelemetry : public Telemetry { const std::vector& execution_provider_ids, const std::string& hardware_device_types, const 
std::string& hardware_vendor_ids, + const std::string& ep_versions, bool use_fp16, bool captureState) const override; void LogCompileModelStart(uint32_t session_id, @@ -83,6 +84,10 @@ class PosixTelemetry : public Telemetry { void LogRuntimeError(uint32_t session_id, const common::Status& status, const char* file, const char* function, uint32_t line) const override; + void LogRuntimeInferenceError(uint32_t session_id, const common::Status& status, + const std::string& ep_versions, + const std::string& ep_device_types) const override; + void LogRuntimePerf(uint32_t session_id, uint32_t total_runs_since_last, int64_t total_run_duration_since_last, const std::unordered_map& duration_per_batch_size) const override; @@ -112,6 +117,7 @@ class PosixTelemetry : public Telemetry { uint32_t hardware_device_id, const std::string& hardware_vendor, const std::string& ep_vendor, + const std::string& ep_version, int assigned_node_count, uint32_t total_runs_since_last, int64_t total_run_duration_since_last) const override; From 500a705a397bcd1c26cf5f7634121eb978e1cdcc Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 13:24:09 -0500 Subject: [PATCH 18/61] telemetry: address Copilot review - drop stray notice, add ThirdPartyNotices.txt: remove a spurious 'Copyright (c) 2026 KleidiAi' BSD-3 block that was accidentally added before the cpp_client_telemetry notice and is unrelated to this change; keep only the 1DS SDK Apache-2.0 notice. Verified the entry now reads /_____/microsoft/cpp_client_telemetry. telemetry.h: add #include - the header uses std::string_view/std::wstring_view (LogDriverInfoEvent override) but only included , relying on transitive includes. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- ThirdPartyNotices.txt | 12 ------------ onnxruntime/core/platform/posix/telemetry.h | 1 + 2 files changed, 1 insertion(+), 12 deletions(-) diff --git a/ThirdPartyNotices.txt b/ThirdPartyNotices.txt index 5a193f3f16888..7d364a9addadd 100644 --- a/ThirdPartyNotices.txt +++ b/ThirdPartyNotices.txt @@ -6120,18 +6120,6 @@ Redistribution and use in source and binary forms, with or without modification, THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -Copyright (c) 2026 KleidiAi. - -Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: - -1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. - -2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. - -3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. 
- -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - _____ microsoft/cpp_client_telemetry, https://github.com/microsoft/cpp_client_telemetry/ diff --git a/onnxruntime/core/platform/posix/telemetry.h b/onnxruntime/core/platform/posix/telemetry.h index 6c466c418f7b1..756dcfe90421e 100644 --- a/onnxruntime/core/platform/posix/telemetry.h +++ b/onnxruntime/core/platform/posix/telemetry.h @@ -8,6 +8,7 @@ #include #include #include +#include #include // Forward declarations of 1DS SDK types From e0cbddd71b191da3084d00a6a8bf2ac4c4ea731f Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 13:40:21 -0500 Subject: [PATCH 19/61] telemetry: address Copilot round-2 - stable device-id hash, includes, cmake cleanup HashDeviceId: std::hash is implementation-defined (and may be process-salted), so anonymized device ids would not correlate across runs/platforms -> replaced with a stable FNV-1a 64-bit hash. Verified compiling against 1DS SDK v3.10.161.1 on Linux. telemetry.cc: GetLocale() calls std::getenv -> add #include (and for the FNV hash) instead of relying on transitive 1DS SDK includes. 
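For reviewers who want to check the stable device-id hash in isolation, here is a minimal standalone sketch of FNV-1a 64-bit with fixed-width hex output. The function names Fnv1a64 and StableHexId are illustrative assumptions and do not appear in this patch; only the offset basis, the prime, and the 16-digit formatting mirror the change.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// FNV-1a 64-bit: XOR each input byte into the state, then multiply by the FNV prime.
// The result depends only on the input bytes, so it is stable across runs and platforms.
// (Fnv1a64 is a hypothetical name used for this sketch.)
inline std::uint64_t Fnv1a64(const std::string& input) {
  std::uint64_t hash = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
  for (unsigned char c : input) {
    hash ^= static_cast<std::uint64_t>(c);
    hash *= 1099511628211ULL;  // FNV-1a 64-bit prime; wraps modulo 2^64
  }
  return hash;
}

// Formats the hash as 16 lowercase hex digits, zero-padded.
// (StableHexId is a hypothetical name used for this sketch.)
inline std::string StableHexId(const std::string& input) {
  std::ostringstream oss;
  oss << std::hex << std::setfill('0') << std::setw(16) << Fnv1a64(input);
  return oss.str();
}
```

The width is fixed at 16 because the hash is always 64 bits wide, whereas sizeof(size_t) * 2 would vary with the platform's size_t. FNV-1a is not a cryptographic hash, so this gives stable anonymized ids rather than strong irreversibility.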
onnxruntime_1ds_telemetry.cmake: USE_1DS_TELEMETRY was added globally and also per-target -> drop the global add_compile_definitions; onnxruntime_common.cmake scopes it to onnxruntime_common (its only consumer is core/platform/posix/env.cc). onnxruntime_common.cmake: the missing-1DS-target branch warned while onnxruntime_1ds_telemetry.cmake FATAL_ERRORs on the same condition -> make it FATAL_ERROR for consistency. ThirdPartyNotices.txt: restore the canonical Apache appendix placeholder 'Copyright [yyyy] [name of copyright owner]' (was 'Copyright 2026 Microsoft Corporation'), matching the other Apache-2.0 notices. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- ThirdPartyNotices.txt | 2 +- cmake/onnxruntime_1ds_telemetry.cmake | 4 ++-- cmake/onnxruntime_common.cmake | 2 +- onnxruntime/core/platform/posix/telemetry.cc | 14 +++++++++++--- 4 files changed, 15 insertions(+), 7 deletions(-) diff --git a/ThirdPartyNotices.txt b/ThirdPartyNotices.txt index 7d364a9addadd..9f9f3515717f0 100644 --- a/ThirdPartyNotices.txt +++ b/ThirdPartyNotices.txt @@ -6316,7 +6316,7 @@ Copyright (c) Microsoft Corporation. All rights reserved. same "printed page" as the copyright notice for easier identification within third-party archives. - Copyright 2026 Microsoft Corporation + Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. 
diff --git a/cmake/onnxruntime_1ds_telemetry.cmake b/cmake/onnxruntime_1ds_telemetry.cmake index 2d3482aa8834d..cc552d2de131c 100644 --- a/cmake/onnxruntime_1ds_telemetry.cmake +++ b/cmake/onnxruntime_1ds_telemetry.cmake @@ -16,8 +16,8 @@ if(onnxruntime_USE_TELEMETRY AND NOT WIN32) message(STATUS "Enabling 1DS telemetry for non-Windows platforms") - # Add compile definition so C++ code can detect 1DS telemetry at compile time - add_compile_definitions(USE_1DS_TELEMETRY) + # USE_1DS_TELEMETRY is defined on the onnxruntime_common target in onnxruntime_common.cmake + # (its only consumer is core/platform/posix/env.cc), so it is not added globally here. # Platform-specific status messages if(APPLE) diff --git a/cmake/onnxruntime_common.cmake b/cmake/onnxruntime_common.cmake index eeba5d5de7aa9..70de92dbf05ab 100644 --- a/cmake/onnxruntime_common.cmake +++ b/cmake/onnxruntime_common.cmake @@ -252,7 +252,7 @@ if(onnxruntime_USE_TELEMETRY AND NOT WIN32) ) endif() else() - message(WARNING "Telemetry enabled but no 1DS SDK target ('MSTelemetry::mat' or 'mat') was found") + message(FATAL_ERROR "Telemetry enabled but no 1DS SDK target ('MSTelemetry::mat' or 'mat') was found") endif() endif() diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index 4953e0eab0e11..afd5616704192 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -21,6 +21,8 @@ #include #endif +#include +#include #include #include #include @@ -177,12 +179,18 @@ class EventBuilder { EventProperties Build() { return std::move(props_); } }; -// Hash a device ID string using std::hash and format as fixed-width hex. +// Hash a device ID with a stable, platform-independent algorithm (FNV-1a 64-bit) and format as +// fixed-width hex, so the same device maps to the same anonymized id across runs and platforms. +// std::hash is implementation-defined (and may be process-salted), so it is unsuitable here. 
// Ensures raw device identifiers are never sent over the wire. static std::string HashDeviceId(const std::string& id) { - size_t hash = std::hash{}(id); + uint64_t hash = 14695981039346656037ULL; // FNV-1a offset basis + for (unsigned char c : id) { + hash ^= static_cast(c); + hash *= 1099511628211ULL; // FNV-1a prime + } std::ostringstream oss; - oss << std::hex << std::setfill('0') << std::setw(sizeof(size_t) * 2) << hash; + oss << std::hex << std::setfill('0') << std::setw(16) << hash; return oss.str(); } From 08deb9f15916999700dde3037d1267868ea9de11 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 13:59:30 -0500 Subject: [PATCH 20/61] telemetry: address Copilot round-3 - atomic logger, IsEnabled readiness, includes, mimalloc align telemetry.h: add #include (used in LogSessionCreation/LogRuntimePerf signatures). logger_: make the shared static logger a std::atomic, load it once in LogEventAsync, and publish it only after Initialize() fully configures it - removes the data race on the previously non-atomic pointer (Shutdown runs only at process teardown). Verified compiling against 1DS SDK v3.10.161.1 on Linux. IsEnabled(): return enabled_ && logger_ != nullptr so it reflects real readiness when initialization failed (logger_ stays null). vcpkg.json: align the mimalloc override to 2.1.1 to match cmake/deps.txt (the FetchContent pin) so both dependency paths use the same mimalloc, while still avoiding the baseline's mimalloc 3.x float. Verified the override resolves to 2.1.1 via vcpkg dry-run. 
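The publish-after-configure pattern described above can be sketched without the 1DS SDK. FakeLogger and Publisher below are hypothetical stand-ins, not types from this patch or from the SDK; the sketch only illustrates the release store in Initialize() paired with a single acquire load in the logging path.

```cpp
#include <atomic>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the SDK logger; not a 1DS SDK type.
struct FakeLogger {
  std::string app_name;
  std::vector<std::string> events;
  void LogEvent(std::string event) { events.push_back(std::move(event)); }
};

// Hypothetical holder showing the pattern: configure first, publish last.
class Publisher {
 public:
  void Initialize(FakeLogger* logger) {
    // Configure through the local pointer while no reader can see it yet.
    logger->app_name = "ONNXRuntime";
    // Release store: a reader that observes the pointer also observes the configuration.
    logger_.store(logger, std::memory_order_release);
  }

  // Readiness means a logger has been published.
  bool IsReady() const { return logger_.load(std::memory_order_acquire) != nullptr; }

  // Loads the pointer once and uses that local copy for both the check and the call.
  bool Log(std::string event) const {
    FakeLogger* logger = logger_.load(std::memory_order_acquire);
    if (logger == nullptr) {
      return false;
    }
    logger->LogEvent(std::move(event));
    return true;
  }

 private:
  std::atomic<FakeLogger*> logger_{nullptr};
};
```

Making the pointer atomic makes the read itself well-defined; it does not keep the pointed-to object alive, so the sketch shares the patch's assumption that the logger is only torn down when no caller can still be logging.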
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- cmake/vcpkg.json | 2 +- onnxruntime/core/platform/posix/telemetry.cc | 25 +++++++++++++------- onnxruntime/core/platform/posix/telemetry.h | 3 ++- 3 files changed, 20 insertions(+), 10 deletions(-) diff --git a/cmake/vcpkg.json b/cmake/vcpkg.json index e6674a83a7c92..d56c5e157ee4b 100644 --- a/cmake/vcpkg.json +++ b/cmake/vcpkg.json @@ -117,7 +117,7 @@ }, { "name": "mimalloc", - "version": "2.2.3" + "version": "2.1.1" } ] } diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index afd5616704192..2ff5effe858cc 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -40,7 +40,7 @@ std::atomic PosixTelemetry::global_register_count_{0}; std::mutex PosixTelemetry::global_mutex_; std::mutex PosixTelemetry::mutex_; ::Microsoft::Applications::Events::ILogManager* PosixTelemetry::log_manager_ = nullptr; -::Microsoft::Applications::Events::ILogger* PosixTelemetry::logger_ = nullptr; +std::atomic<::Microsoft::Applications::Events::ILogger*> PosixTelemetry::logger_{nullptr}; std::unique_ptr<::Microsoft::Applications::Events::ILogConfiguration> PosixTelemetry::config_; std::atomic PosixTelemetry::enabled_{true}; std::atomic PosixTelemetry::projection_{0}; @@ -226,8 +226,14 @@ PosixTelemetry::~PosixTelemetry() { } void PosixTelemetry::LogEventAsync(Microsoft::Applications::Events::EventProperties&& props) const { + // Load the shared logger once; it is an atomic pointer so this read is well-defined even if + // Shutdown() concurrently clears it (Shutdown only runs at process teardown). 
+ auto* logger = logger_.load(std::memory_order_acquire); + if (logger == nullptr) { + return; + } try { - logger_->LogEvent(std::move(props)); + logger->LogEvent(std::move(props)); } catch (const std::exception& ex) { LOGS_DEFAULT(WARNING) << "[Telemetry] Failed to log event: " << ex.what(); } @@ -280,8 +286,8 @@ void PosixTelemetry::Initialize() { } // Get logger for our tenant - logger_ = log_manager_->GetLogger(TENANT_TOKEN); - if (!logger_) { + auto* logger = log_manager_->GetLogger(TENANT_TOKEN); + if (logger == nullptr) { LOGS_DEFAULT(WARNING) << "Failed to get telemetry logger"; LogManagerProvider::Release(*config_); log_manager_ = nullptr; @@ -315,10 +321,12 @@ void PosixTelemetry::Initialize() { } // Set application information as logger context (attached to all events) - logger_->SetContext("AppName", "ONNXRuntime"); - logger_->SetContext("AppVersion", ORT_VERSION); - logger_->SetContext("Platform", GetPlatformInfo()); + logger->SetContext("AppName", "ONNXRuntime"); + logger->SetContext("AppVersion", ORT_VERSION); + logger->SetContext("Platform", GetPlatformInfo()); + // Publish the fully-configured logger atomically; concurrent readers observe it only now. + logger_.store(logger, std::memory_order_release); enabled_ = true; } @@ -517,7 +525,8 @@ void PosixTelemetry::SetLanguageProjection(uint32_t projection) const { } bool PosixTelemetry::IsEnabled() const { - return enabled_; + // Reflect actual readiness: the opt-out flag AND a successfully-initialized logger. 
+ return enabled_ && logger_.load(std::memory_order_acquire) != nullptr; } unsigned char PosixTelemetry::Level() const { diff --git a/onnxruntime/core/platform/posix/telemetry.h b/onnxruntime/core/platform/posix/telemetry.h index 756dcfe90421e..866756463e2f3 100644 --- a/onnxruntime/core/platform/posix/telemetry.h +++ b/onnxruntime/core/platform/posix/telemetry.h @@ -9,6 +9,7 @@ #include #include #include +#include #include // Forward declarations of 1DS SDK types @@ -163,7 +164,7 @@ class PosixTelemetry : public Telemetry { // Telemetry SDK instances. // log_manager_ is owned by LogManagerProvider; logger_ is owned by log_manager_. static ::Microsoft::Applications::Events::ILogManager* log_manager_; - static ::Microsoft::Applications::Events::ILogger* logger_; + static std::atomic<::Microsoft::Applications::Events::ILogger*> logger_; // SDK configuration — must outlive log_manager_ (LogManagerImpl holds a reference). static std::unique_ptr<::Microsoft::Applications::Events::ILogConfiguration> config_; From 94bf13f295e384245dc61168757b0f85fb26ef9c Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 14:15:35 -0500 Subject: [PATCH 21/61] telemetry: address Copilot round-4 - shell32 link, device-id perms, opt-in source of truth onnxruntime_common.cmake: link shell32 on Windows - windows/telemetry.cc's svchost service-name fallback calls CommandLineToArgvW (shell32.dll), which ORT's regular Windows build did not link explicitly (only the GDK toolchain did). device_id.cc: chmod the persistent device-id file to 0600 (owner read/write) so the stable identifier is not world-readable regardless of the process umask. telemetry.cc: stop force-setting enabled_=true in Initialize(); leave it to the static default / runtime EnableTelemetryEvents()/DisableTelemetryEvents() so they remain the opt-in source of truth (IsEnabled() already gates on a non-null logger). 
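As a rough illustration of the device-id permission change, the following POSIX-only sketch writes a file and then restricts it to owner read/write. WriteOwnerOnly and IsOwnerOnly are hypothetical helper names, and unlike the patch this sketch checks the chmod return value.

```cpp
#include <sys/stat.h>

#include <fstream>
#include <string>

// Writes `contents` to `path`, then restricts the file to owner read/write (0600).
// (WriteOwnerOnly is a hypothetical helper name for this sketch.)
inline bool WriteOwnerOnly(const std::string& path, const std::string& contents) {
  std::ofstream out(path.c_str(), std::ios::trunc);
  if (!out.good()) {
    return false;
  }
  out << contents;
  out.close();
  if (out.fail()) {
    return false;
  }
  // chmod sets the mode explicitly, so the result does not depend on the process umask.
  return ::chmod(path.c_str(), S_IRUSR | S_IWUSR) == 0;
}

// Returns true when the permission bits of `path` are exactly 0600.
// (IsOwnerOnly is a hypothetical helper name for this sketch.)
inline bool IsOwnerOnly(const std::string& path) {
  struct stat st {};
  if (::stat(path.c_str(), &st) != 0) {
    return false;
  }
  return (st.st_mode & 0777) == (S_IRUSR | S_IWUSR);
}
```

With this write-then-chmod order the file briefly exists with umask-derived permissions; creating it with open() and mode 0600 would close that window, at the cost of not using std::ofstream.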
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- cmake/onnxruntime_common.cmake | 2 ++ onnxruntime/core/platform/posix/device_id.cc | 3 +++ onnxruntime/core/platform/posix/telemetry.cc | 3 ++- 3 files changed, 7 insertions(+), 1 deletion(-) diff --git a/cmake/onnxruntime_common.cmake b/cmake/onnxruntime_common.cmake index 70de92dbf05ab..3bb8d6aa4fad7 100644 --- a/cmake/onnxruntime_common.cmake +++ b/cmake/onnxruntime_common.cmake @@ -139,6 +139,8 @@ if(WIN32) set_property(TARGET onnxruntime_common PROPERTY CXX_STANDARD 23) target_compile_options(onnxruntime_common PRIVATE "/Zc:char8_t-") endif() + # windows/telemetry.cc's svchost service-name fallback uses CommandLineToArgvW, which lives in shell32. + target_link_libraries(onnxruntime_common PRIVATE shell32) endif() if(NOT WIN32 AND NOT APPLE AND NOT ANDROID AND CMAKE_SYSTEM_PROCESSOR MATCHES "x86_64") diff --git a/onnxruntime/core/platform/posix/device_id.cc b/onnxruntime/core/platform/posix/device_id.cc index d6fd592478109..a41d0d0b9fb8c 100644 --- a/onnxruntime/core/platform/posix/device_id.cc +++ b/onnxruntime/core/platform/posix/device_id.cc @@ -177,6 +177,9 @@ void DeviceId::InitializeInternal() { if (outfile.good()) { outfile << device_id_; outfile.close(); + // Restrict to owner read/write (0600): this is a stable identifier and should not be + // world-readable regardless of the process umask. + chmod(file_path.c_str(), S_IRUSR | S_IWUSR); status_ = DeviceIdStatus::New; } else { status_ = DeviceIdStatus::Failed; diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index 2ff5effe858cc..f3134e021865e 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -326,8 +326,9 @@ void PosixTelemetry::Initialize() { logger->SetContext("Platform", GetPlatformInfo()); // Publish the fully-configured logger atomically; concurrent readers observe it only now. 
+ // enabled_ is left to its default / the runtime EnableTelemetryEvents()/DisableTelemetryEvents() + // opt-in state rather than being force-set here. logger_.store(logger, std::memory_order_release); - enabled_ = true; } void PosixTelemetry::Shutdown() { From 30244c1de4ad4f64d3456c8374e67d69280d4420 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 14:49:05 -0500 Subject: [PATCH 22/61] telemetry: address Copilot round-5 - silence disabled-telemetry configure log onnxruntime_1ds_telemetry.cmake: drop the unconditional 'Telemetry is disabled' STATUS message that printed on every CMake configure in the common telemetry-off case, removing CI/configure log noise. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- cmake/onnxruntime_1ds_telemetry.cmake | 4 ---- 1 file changed, 4 deletions(-) diff --git a/cmake/onnxruntime_1ds_telemetry.cmake b/cmake/onnxruntime_1ds_telemetry.cmake index cc552d2de131c..bde7956d8396c 100644 --- a/cmake/onnxruntime_1ds_telemetry.cmake +++ b/cmake/onnxruntime_1ds_telemetry.cmake @@ -31,8 +31,4 @@ if(onnxruntime_USE_TELEMETRY AND NOT WIN32) elseif(UNIX) message(STATUS " Platform: Linux") endif() -else() - if(NOT onnxruntime_USE_TELEMETRY) - message(STATUS "Telemetry is disabled (use -Donnxruntime_USE_TELEMETRY=ON to enable)") - endif() endif() From 1ae58b8f1accb5cbc853506dc85c829d2b985077 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 15:06:00 -0500 Subject: [PATCH 23/61] telemetry: address Copilot round-6 - exception-safe and early-init-safe logging Use ORT_TRY/ORT_CATCH/ORT_HANDLE_EXCEPTION instead of raw try/catch so telemetry.cc builds under ORT_NO_EXCEPTIONS (-fno-exceptions). 
Route all warnings through an ORT_TELEMETRY_WARN macro that first checks logging::LoggingManager::HasDefaultLogger(): PosixTelemetry can be constructed during early Env initialization (before a default logger is registered) and destroyed late at process exit, so an unguarded LOGS_DEFAULT could touch a missing/destroyed LoggingManager. Verified telemetry.cc compiles with exceptions on and has no errors attributed to it under -DORT_NO_EXCEPTIONS -fno-exceptions (Linux, 1DS SDK v3.10.161.1). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/telemetry.cc | 47 ++++++++++++++------ 1 file changed, 33 insertions(+), 14 deletions(-) diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index f3134e021865e..14eadfcf7de74 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -27,6 +27,7 @@ #include #include +#include "core/common/common.h" #include "core/common/logging/logging.h" #include "core/common/status.h" #include "onnxruntime_config.h" @@ -35,6 +36,16 @@ using namespace Microsoft::Applications::Events; namespace onnxruntime { +// PosixTelemetry can be constructed during early Env initialization (before logging registers a +// default logger) and destroyed late at process exit, so only emit warnings when a default logger +// exists, to avoid touching a missing/destroyed LoggingManager. 
+#define ORT_TELEMETRY_WARN(stream_expr) \ + do { \ + if (::onnxruntime::logging::LoggingManager::HasDefaultLogger()) { \ + LOGS_DEFAULT(WARNING) << stream_expr; \ + } \ + } while (0) + // Static member initialization std::atomic PosixTelemetry::global_register_count_{0}; std::mutex PosixTelemetry::global_mutex_; @@ -201,12 +212,14 @@ PosixTelemetry::PosixTelemetry() { global_register_count_++; if (global_register_count_ == 1) { - try { + ORT_TRY { Initialize(); - } catch (const std::exception& ex) { - // Log error but don't fail construction - // Telemetry failures should not break application functionality - LOGS_DEFAULT(WARNING) << "Failed to initialize telemetry: " << ex.what(); + } + ORT_CATCH(const std::exception& ex) { + // Telemetry failures should not break application functionality. + ORT_HANDLE_EXCEPTION([&]() { + ORT_TELEMETRY_WARN("Failed to initialize telemetry: " << ex.what()); + }); } } } @@ -216,11 +229,14 @@ PosixTelemetry::~PosixTelemetry() { global_register_count_--; if (global_register_count_ == 0) { - try { + ORT_TRY { Shutdown(); - } catch (const std::exception& ex) { - // Log error but don't throw from destructor - LOGS_DEFAULT(WARNING) << "Error during telemetry shutdown: " << ex.what(); + } + ORT_CATCH(const std::exception& ex) { + // Don't throw from a destructor. 
+ ORT_HANDLE_EXCEPTION([&]() { + ORT_TELEMETRY_WARN("Error during telemetry shutdown: " << ex.what()); + }); } } } @@ -232,10 +248,13 @@ void PosixTelemetry::LogEventAsync(Microsoft::Applications::Events::EventPropert if (logger == nullptr) { return; } - try { + ORT_TRY { logger->LogEvent(std::move(props)); - } catch (const std::exception& ex) { - LOGS_DEFAULT(WARNING) << "[Telemetry] Failed to log event: " << ex.what(); + } + ORT_CATCH(const std::exception& ex) { + ORT_HANDLE_EXCEPTION([&]() { + ORT_TELEMETRY_WARN("[Telemetry] Failed to log event: " << ex.what()); + }); } } @@ -280,7 +299,7 @@ void PosixTelemetry::Initialize() { status_t status; log_manager_ = LogManagerProvider::CreateLogManager(*config_, status); if (status != STATUS_SUCCESS || !log_manager_) { - LOGS_DEFAULT(WARNING) << "Failed to create telemetry LogManager, status: " << status; + ORT_TELEMETRY_WARN("Failed to create telemetry LogManager, status: " << status); config_.reset(); return; } @@ -288,7 +307,7 @@ void PosixTelemetry::Initialize() { // Get logger for our tenant auto* logger = log_manager_->GetLogger(TENANT_TOKEN); if (logger == nullptr) { - LOGS_DEFAULT(WARNING) << "Failed to get telemetry logger"; + ORT_TELEMETRY_WARN("Failed to get telemetry logger"); LogManagerProvider::Release(*config_); log_manager_ = nullptr; config_.reset(); From 996ec9c5c6edcf5fa2992c0e61dfca928a32b254 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 15:20:25 -0500 Subject: [PATCH 24/61] telemetry: address Copilot round-7 - exception-safe device_id device_id.cc: use ORT_TRY/ORT_CATCH instead of raw try/catch (and include core/common/common.h) so it builds under ORT_NO_EXCEPTIONS, matching the telemetry.cc change. Verified it compiles with exceptions on and is clean under -DORT_NO_EXCEPTIONS -fno-exceptions on Linux. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/device_id.cc | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/onnxruntime/core/platform/posix/device_id.cc b/onnxruntime/core/platform/posix/device_id.cc index a41d0d0b9fb8c..2300fab1bbfa7 100644 --- a/onnxruntime/core/platform/posix/device_id.cc +++ b/onnxruntime/core/platform/posix/device_id.cc @@ -3,6 +3,8 @@ #include "core/platform/posix/device_id.h" +#include "core/common/common.h" + #include #include #include @@ -120,7 +122,7 @@ void DeviceId::InitializeInternal() { if (initialized_) return; initialized_ = true; - try { + ORT_TRY { // Use compile-time platform detection to select the appropriate storage path. // This matches the mobile/desktop selection in posix/env.cc. #if defined(__ANDROID__) || (defined(__APPLE__) && TARGET_OS_IOS) @@ -184,7 +186,8 @@ void DeviceId::InitializeInternal() { } else { status_ = DeviceIdStatus::Failed; } - } catch (...) { + } + ORT_CATCH(...) { status_ = DeviceIdStatus::Failed; // Keep device_id_ if generated — it's still valid for this session (in-memory only). } From 6ebaef3e9d6e8a3893ac2aded01769325e7a2507 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 15:34:54 -0500 Subject: [PATCH 25/61] telemetry: address Copilot round-8 - device-id HOME fallback for daemons device_id.cc: GetStorageDirectory now falls back to getpwuid(getuid())->pw_dir when $HOME is unset (common for system services/daemons under systemd/launchd), so the persistent device id can still be stored instead of failing. Verified compiles on Linux.
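The home-directory fallback can be exercised on its own with a small POSIX sketch. ResolveHomeDirectory is a hypothetical name; in this patch the equivalent logic lives inside DeviceId::GetStorageDirectory.

```cpp
#include <pwd.h>
#include <unistd.h>

#include <cstdlib>
#include <string>

// Returns the home directory, preferring the HOME environment variable and falling back to
// the password database entry for the current user. Returns "" if neither is available.
// (ResolveHomeDirectory is a hypothetical name for this sketch.)
inline std::string ResolveHomeDirectory() {
  const char* env_home = std::getenv("HOME");
  if (env_home != nullptr && env_home[0] != '\0') {
    return env_home;
  }
  // getpwuid returns a pointer to static storage, or nullptr if the uid has no entry.
  const struct passwd* pw = ::getpwuid(::getuid());
  if (pw != nullptr && pw->pw_dir != nullptr && pw->pw_dir[0] != '\0') {
    return pw->pw_dir;
  }
  return "";
}
```

getpwuid is not required to be thread-safe, so a caller that may run concurrently with other password-database lookups might prefer getpwuid_r. The empty-string result still matters because a uid can lack a passwd entry, for example in some minimal containers.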
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/device_id.cc | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/onnxruntime/core/platform/posix/device_id.cc b/onnxruntime/core/platform/posix/device_id.cc index 2300fab1bbfa7..1d76a741dbb2d 100644 --- a/onnxruntime/core/platform/posix/device_id.cc +++ b/onnxruntime/core/platform/posix/device_id.cc @@ -13,6 +13,8 @@ #include #include +#include +#include #ifdef __APPLE__ #include @@ -92,9 +94,16 @@ bool DeviceId::IsValidGUID(const std::string& str) { } std::string DeviceId::GetStorageDirectory(bool mobile) { - const char* h = std::getenv("HOME"); - if (!h || !h[0]) return ""; - std::string home(h); + // Prefer $HOME; fall back to the password database (getpwuid) for contexts where HOME is unset, + // e.g. system services/daemons under systemd/launchd. + std::string home; + if (const char* h = std::getenv("HOME"); h != nullptr && h[0] != '\0') { + home = h; + } else if (const struct passwd* pw = ::getpwuid(::getuid()); + pw != nullptr && pw->pw_dir != nullptr && pw->pw_dir[0] != '\0') { + home = pw->pw_dir; + } + if (home.empty()) return ""; if (mobile) { return home + "/.onnxruntime"; From 41e6953b1da4f1029227d228d475b9ac32365312 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 15:43:24 -0500 Subject: [PATCH 26/61] telemetry: address Copilot round-9 - explicit atomic loads via IsEnabled() IsEnabled() now uses an explicit enabled_.load(acquire), and every per-method readiness guard ('if (!enabled_ || !logger_)') is replaced with 'if (!IsEnabled())'. This centralizes the readiness check and makes the atomic loads and memory ordering explicit instead of relying on implicit std::atomic conversions. Verified telemetry.cc compiles against the 1DS SDK on Linux. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/telemetry.cc | 42 ++++++++++---------- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index 14eadfcf7de74..8c351f21d8a09 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -546,7 +546,7 @@ void PosixTelemetry::SetLanguageProjection(uint32_t projection) const { bool PosixTelemetry::IsEnabled() const { // Reflect actual readiness: the opt-out flag AND a successfully-initialized logger. - return enabled_ && logger_.load(std::memory_order_acquire) != nullptr; + return enabled_.load(std::memory_order_acquire) && logger_.load(std::memory_order_acquire) != nullptr; } unsigned char PosixTelemetry::Level() const { @@ -560,7 +560,7 @@ uint64_t PosixTelemetry::Keyword() const { void PosixTelemetry::LogProcessInfo() const { // LogProcessInfo only collects system metadata, but it must still honor the // runtime opt-out (DisableTelemetryEvents) like every other event. 
- if (!enabled_ || !logger_) { + if (!IsEnabled()) { return; } @@ -589,7 +589,7 @@ void PosixTelemetry::LogProcessInfo() const { } void PosixTelemetry::LogSessionCreationStart(uint32_t session_id) const { - if (!enabled_ || !logger_) { + if (!IsEnabled()) { return; } @@ -603,7 +603,7 @@ void PosixTelemetry::LogSessionCreationStart(uint32_t session_id) const { } void PosixTelemetry::LogEvaluationStop(uint32_t session_id) const { - if (!enabled_ || !logger_) { + if (!IsEnabled()) { return; } @@ -620,7 +620,7 @@ void PosixTelemetry::LogEvaluationStop(uint32_t session_id) const { } void PosixTelemetry::LogEvaluationStart(uint32_t session_id) const { - if (!enabled_ || !logger_) { + if (!IsEnabled()) { return; } @@ -651,7 +651,7 @@ void PosixTelemetry::LogSessionCreation( const std::string& hardware_vendor_ids, const std::string& ep_versions, bool use_fp16, bool captureState) const { - if (!enabled_ || !logger_) { + if (!IsEnabled()) { return; } @@ -693,7 +693,7 @@ void PosixTelemetry::LogCompileModelStart( bool embed_ep_context, bool has_external_initializers_file, const std::vector& execution_provider_ids) const { - if (!enabled_ || !logger_) { + if (!IsEnabled()) { return; } @@ -719,7 +719,7 @@ void PosixTelemetry::LogCompileModelComplete( uint32_t error_code, uint32_t error_category, const std::string& error_message) const { - if (!enabled_ || !logger_) { + if (!IsEnabled()) { return; } @@ -739,7 +739,7 @@ void PosixTelemetry::LogCompileModelComplete( void PosixTelemetry::LogRuntimeError( uint32_t session_id, const common::Status& status, const char* file, const char* function, uint32_t line) const { - if (!enabled_ || !logger_) { + if (!IsEnabled()) { return; } @@ -761,7 +761,7 @@ void PosixTelemetry::LogRuntimeError( void PosixTelemetry::LogRuntimeInferenceError(uint32_t session_id, const common::Status& status, const std::string& ep_versions, const std::string& ep_device_types) const { - if (!enabled_ || !logger_) { + if (!IsEnabled()) { return; } @@ -784,7 
+784,7 @@ void PosixTelemetry::LogRuntimePerf( uint32_t session_id, uint32_t total_runs_since_last, int64_t total_run_duration_since_last, const std::unordered_map& duration_per_batch_size) const { - if (!enabled_ || !logger_) { + if (!IsEnabled()) { return; } @@ -819,7 +819,7 @@ void PosixTelemetry::LogAutoEpSelection( uint32_t session_id, const std::string& selection_policy, const std::vector& requested_execution_provider_ids, const std::vector& available_execution_provider_ids) const { - if (!enabled_ || !logger_) { + if (!IsEnabled()) { return; } @@ -839,7 +839,7 @@ void PosixTelemetry::LogProviderOptions( const std::string& provider_id, const std::string& provider_options_string, bool captureState) const { - if (!enabled_ || !logger_) { + if (!IsEnabled()) { return; } @@ -856,7 +856,7 @@ void PosixTelemetry::LogProviderOptions( } void PosixTelemetry::LogModelLoadStart(uint32_t session_id) const { - if (!enabled_ || !logger_) { + if (!IsEnabled()) { return; } @@ -870,7 +870,7 @@ void PosixTelemetry::LogModelLoadStart(uint32_t session_id) const { } void PosixTelemetry::LogModelLoadEnd(uint32_t session_id, const common::Status& status) const { - if (!enabled_ || !logger_) { + if (!IsEnabled()) { return; } @@ -888,7 +888,7 @@ void PosixTelemetry::LogModelLoadEnd(uint32_t session_id, const common::Status& } void PosixTelemetry::LogSessionCreationEnd(uint32_t session_id, const common::Status& status) const { - if (!enabled_ || !logger_) { + if (!IsEnabled()) { return; } @@ -917,7 +917,7 @@ void PosixTelemetry::LogEpDeviceUsage( int assigned_node_count, uint32_t total_runs_since_last, int64_t total_run_duration_since_last) const { - if (!enabled_ || !logger_) { + if (!IsEnabled()) { return; } @@ -941,7 +941,7 @@ void PosixTelemetry::LogEpDeviceUsage( } void PosixTelemetry::LogRegisterEpLibraryStart(const std::string& registration_name) const { - if (!enabled_ || !logger_) { + if (!IsEnabled()) { return; } @@ -956,7 +956,7 @@ void 
PosixTelemetry::LogRegisterEpLibraryStart(const std::string& registration_n void PosixTelemetry::LogRegisterEpLibraryEnd(const std::string& registration_name, const common::Status& status) const { - if (!enabled_ || !logger_) { + if (!IsEnabled()) { return; } @@ -975,7 +975,7 @@ void PosixTelemetry::LogRegisterEpLibraryEnd(const std::string& registration_nam void PosixTelemetry::LogRegisterEpLibraryWithLibPath(const std::string& registration_name, const std::string& lib_path) const { - if (!enabled_ || !logger_) { + if (!IsEnabled()) { return; } @@ -990,7 +990,7 @@ void PosixTelemetry::LogRegisterEpLibraryWithLibPath(const std::string& registra } void PosixTelemetry::LogSystemMetrics(uint32_t session_id) const { - if (!enabled_ || !logger_) { + if (!IsEnabled()) { return; } From 2933b7e2a9e78314f8034a04457539d735ced733 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 16:13:01 -0500 Subject: [PATCH 27/61] telemetry: make 1DS tenant token overridable at compile time The POSIX 1DS tenant token was hard-coded in core/platform/posix/telemetry.cc. Keep the in-source value as an intentional throwaway default (so DIY builds get working telemetry and it can simply be revoked if abused), but let official pipelines inject the real token from a secret at compile time: - telemetry.cc: TENANT_TOKEN now derives from an ORT_1DS_TENANT_TOKEN macro, defaulting to the throwaway token via ifndef. - cmake/CMakeLists.txt: new onnxruntime_1DS_TENANT_TOKEN cache STRING (empty by default). - cmake/onnxruntime_common.cmake: when that variable is set, adds an ORT_1DS_TENANT_TOKEN quoted-string PRIVATE compile definition on onnxruntime_common. Verified with a real CMake configure: setting onnxruntime_1DS_TENANT_TOKEN emits the correctly-quoted -D flag and the built binary resolves the injected token; the default build keeps the throwaway token. telemetry.cc compiles in both modes. 
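The compile-time override described above reduces to an #ifndef default. In the sketch below the macro name ORT_1DS_TENANT_TOKEN comes from this patch, while the placeholder string, kTenantToken, and UsesDefaultToken are assumptions made for illustration (the real default token is deliberately not repeated here).

```cpp
#include <string>

// If the build system did not inject a token with -DORT_1DS_TENANT_TOKEN="...", fall back
// to an in-source default. The placeholder value is illustrative only.
#ifndef ORT_1DS_TENANT_TOKEN
#define ORT_1DS_TENANT_TOKEN "placeholder-default-token"
#endif

// The token is fixed at compile time; kTenantToken is a hypothetical name.
constexpr const char* kTenantToken = ORT_1DS_TENANT_TOKEN;

// Reports whether this translation unit was built with the in-source default.
// (UsesDefaultToken is a hypothetical helper for this sketch.)
inline bool UsesDefaultToken() {
  return std::string(kTenantToken) == "placeholder-default-token";
}
```

Building the same file with -DORT_1DS_TENANT_TOKEN='"injected"' would make kTenantToken resolve to the injected value. The CMake side passes the definition as a quoted string for that reason, since an unquoted value would not expand to a string literal.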
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- cmake/CMakeLists.txt | 4 ++++ cmake/onnxruntime_common.cmake | 5 +++++ onnxruntime/core/platform/posix/telemetry.cc | 13 +++++++++++-- 3 files changed, 20 insertions(+), 2 deletions(-) diff --git a/cmake/CMakeLists.txt b/cmake/CMakeLists.txt index 8523b45141ec8..ea92be98c5468 100644 --- a/cmake/CMakeLists.txt +++ b/cmake/CMakeLists.txt @@ -143,6 +143,10 @@ option(onnxruntime_USE_WINML "Build with WinML support" OFF) option(onnxruntime_USE_ACL "Build with ACL support" OFF) option(onnxruntime_ENABLE_INSTRUMENT "Enable Instrument with Event Tracing for Windows (ETW)" OFF) option(onnxruntime_USE_TELEMETRY "Build with Telemetry" OFF) +# Optional 1DS ingestion token for non-Windows telemetry. Leave empty to compile in the default +# throwaway token from core/platform/posix/telemetry.cc; official pipelines override it at compile +# time by setting this to the real token (injected from a pipeline secret). +set(onnxruntime_1DS_TENANT_TOKEN "" CACHE STRING "Override the compiled-in 1DS telemetry ingestion token (non-Windows)") cmake_dependent_option(onnxruntime_USE_MIMALLOC "Override new/delete and arena allocator with mimalloc" OFF "WIN32;NOT onnxruntime_USE_CUDA;NOT onnxruntime_USE_OPENVINO" OFF) option(onnxruntime_USE_CANN "Build with CANN support" OFF) option(onnxruntime_USE_XNNPACK "Build with XNNPACK support. Provides an alternative math library on ARM, WebAssembly and x86." 
OFF) diff --git a/cmake/onnxruntime_common.cmake b/cmake/onnxruntime_common.cmake index 3bb8d6aa4fad7..cb6be7e98fc5a 100644 --- a/cmake/onnxruntime_common.cmake +++ b/cmake/onnxruntime_common.cmake @@ -155,6 +155,11 @@ if (onnxruntime_USE_TELEMETRY) set_target_properties(onnxruntime_common PROPERTIES COMPILE_FLAGS "/FI${ONNXRUNTIME_INCLUDE_DIR}/core/platform/windows/TraceLoggingConfigPrivate.h") else() target_compile_definitions(onnxruntime_common PRIVATE USE_1DS_TELEMETRY) + if(onnxruntime_1DS_TENANT_TOKEN) + # Official builds inject the real ingestion token (from a pipeline secret) at compile time, + # overriding the default throwaway token defined in core/platform/posix/telemetry.cc. + target_compile_definitions(onnxruntime_common PRIVATE ORT_1DS_TENANT_TOKEN="${onnxruntime_1DS_TENANT_TOKEN}") + endif() endif() endif() if (onnxruntime_USE_MIMALLOC) diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index 8c351f21d8a09..8367ce9208409 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -60,8 +60,17 @@ std::atomic PosixTelemetry::keyword_{0}; std::atomic PosixTelemetry::process_info_logged_{false}; std::atomic PosixTelemetry::system_metrics_sample_counter_{0}; -// Tenant token for 1DS telemetry ingestion -constexpr const char* TENANT_TOKEN = "5ad963bd4b3a4118a481401cc0211875-da8e8657-47d4-4ed7-ab39-7886e136f53b-6988"; +// Tenant token for 1DS telemetry ingestion. +// +// The default below is a throwaway ingestion key so that anyone building ONNX Runtime themselves +// gets working telemetry by default; it carries no secret and can simply be revoked if abused. +// Official builds override it at compile time by defining ORT_1DS_TENANT_TOKEN (injected from a +// pipeline secret via the onnxruntime_1DS_TENANT_TOKEN CMake variable), so the production token is +// never committed to source. 
+#ifndef ORT_1DS_TENANT_TOKEN +#define ORT_1DS_TENANT_TOKEN "5ad963bd4b3a4118a481401cc0211875-da8e8657-47d4-4ed7-ab39-7886e136f53b-6988" +#endif +constexpr const char* TENANT_TOKEN = ORT_1DS_TENANT_TOKEN; // Event priority mapping (1DS priorities) enum class EventPriority { From 7bcc5764a09da0e9a725ec4529034819a23d8b81 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 16:28:49 -0500 Subject: [PATCH 28/61] telemetry: fix getpwuid thread-safety and LogEventAsync/Shutdown UAF (round 11) device_id.cc: GetStorageDirectory used getpwuid(), which returns a pointer to shared static storage and is not thread-safe. Switch to the reentrant getpwuid_r() with a sysconf(_SC_GETPW_R_SIZE_MAX)-sized buffer. telemetry.cc/.h: LogEventAsync loaded the atomic logger_ and called logger->LogEvent() without guarding against teardown, so a concurrent Shutdown() could FlushAndTeardown + Release the owning log_manager_ between the load and the call (use-after-free). Make mutex_ a std::shared_mutex: LogEventAsync now holds a shared (reader) lock for the LogEvent call while Initialize()/Shutdown() take it exclusively, so teardown waits for in-flight logs while concurrent logging still proceeds in parallel. Verified: both files compile (exceptions + ORT_NO_EXCEPTIONS) and are clang-format clean. 
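
A rough sketch of the reader/writer locking scheme described above, assuming a simplified
stand-in for the SDK: Logger, g_logger, g_mutex and the free functions below are illustrative
names, not the PosixTelemetry or 1DS API.

```cpp
// Sketch only: Logger, g_logger, LogEvent() and Shutdown() are illustrative stand-ins, not the
// 1DS SDK or the PosixTelemetry interface.
#include <atomic>
#include <mutex>
#include <shared_mutex>
#include <string>

struct Logger {
  std::atomic<int> events{0};
  void LogEvent(const std::string& /*name*/) { events.fetch_add(1, std::memory_order_relaxed); }
};

std::shared_mutex g_mutex;
std::atomic<Logger*> g_logger{nullptr};

// Writer: takes the lock exclusively while the shared state is created.
void Initialize() {
  std::unique_lock<std::shared_mutex> lock(g_mutex);
  if (g_logger.load(std::memory_order_acquire) == nullptr) {
    g_logger.store(new Logger(), std::memory_order_release);
  }
}

// Reader: many threads may hold the shared lock at once, so logging stays parallel.
bool LogEvent(const std::string& name) {
  std::shared_lock<std::shared_mutex> lock(g_mutex);
  Logger* logger = g_logger.load(std::memory_order_acquire);
  if (logger == nullptr) return false;  // shut down, or never initialized
  logger->LogEvent(name);               // safe: Shutdown() cannot proceed until we unlock
  return true;
}

// Writer: blocks until every in-flight LogEvent() has released its shared lock.
void Shutdown() {
  std::unique_lock<std::shared_mutex> lock(g_mutex);
  delete g_logger.exchange(nullptr, std::memory_order_acq_rel);
}
```

The point of the design is that teardown waits for readers rather than readers checking a flag
after the fact, so the logger cannot be freed between the load and the call.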
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/device_id.cc | 15 ++++++++++++--- onnxruntime/core/platform/posix/telemetry.cc | 12 +++++++----- onnxruntime/core/platform/posix/telemetry.h | 3 ++- 3 files changed, 21 insertions(+), 9 deletions(-) diff --git a/onnxruntime/core/platform/posix/device_id.cc b/onnxruntime/core/platform/posix/device_id.cc index 1d76a741dbb2d..ca991f5b43fff 100644 --- a/onnxruntime/core/platform/posix/device_id.cc +++ b/onnxruntime/core/platform/posix/device_id.cc @@ -10,6 +10,7 @@ #include #include #include +#include #include #include @@ -99,9 +100,17 @@ std::string DeviceId::GetStorageDirectory(bool mobile) { std::string home; if (const char* h = std::getenv("HOME"); h != nullptr && h[0] != '\0') { home = h; - } else if (const struct passwd* pw = ::getpwuid(::getuid()); - pw != nullptr && pw->pw_dir != nullptr && pw->pw_dir[0] != '\0') { - home = pw->pw_dir; + } else { + // getpwuid() returns a pointer to shared static storage and is not thread-safe; use the + // reentrant getpwuid_r() with a caller-provided buffer so concurrent callers don't race. + struct passwd pwd; + struct passwd* result = nullptr; + const long sc = ::sysconf(_SC_GETPW_R_SIZE_MAX); + std::vector buf(sc > 0 ? 
static_cast(sc) : 16384); + if (::getpwuid_r(::getuid(), &pwd, buf.data(), buf.size(), &result) == 0 && + result != nullptr && result->pw_dir != nullptr && result->pw_dir[0] != '\0') { + home = result->pw_dir; + } } if (home.empty()) return ""; diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index 8367ce9208409..088a4b75c0386 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -49,7 +49,7 @@ namespace onnxruntime { // Static member initialization std::atomic PosixTelemetry::global_register_count_{0}; std::mutex PosixTelemetry::global_mutex_; -std::mutex PosixTelemetry::mutex_; +std::shared_mutex PosixTelemetry::mutex_; ::Microsoft::Applications::Events::ILogManager* PosixTelemetry::log_manager_ = nullptr; std::atomic<::Microsoft::Applications::Events::ILogger*> PosixTelemetry::logger_{nullptr}; std::unique_ptr<::Microsoft::Applications::Events::ILogConfiguration> PosixTelemetry::config_; @@ -251,8 +251,10 @@ PosixTelemetry::~PosixTelemetry() { } void PosixTelemetry::LogEventAsync(Microsoft::Applications::Events::EventProperties&& props) const { - // Load the shared logger once; it is an atomic pointer so this read is well-defined even if - // Shutdown() concurrently clears it (Shutdown only runs at process teardown). + // Hold a shared (reader) lock for the duration of the LogEvent call so the logger and its owning + // log manager cannot be torn down underneath us: Initialize()/Shutdown() take this lock + // exclusively. The shared lock still allows multiple threads to log concurrently. 
+ std::shared_lock lock(mutex_); auto* logger = logger_.load(std::memory_order_acquire); if (logger == nullptr) { return; @@ -268,7 +270,7 @@ void PosixTelemetry::LogEventAsync(Microsoft::Applications::Events::EventPropert } void PosixTelemetry::Initialize() { - std::lock_guard lock(mutex_); + std::unique_lock lock(mutex_); // NOTE: On Android, the Java layer must be initialized before calling this: // System.loadLibrary("maesdk"); @@ -360,7 +362,7 @@ void PosixTelemetry::Initialize() { } void PosixTelemetry::Shutdown() { - std::lock_guard lock(mutex_); + std::unique_lock lock(mutex_); // Disable logging first to prevent new events during shutdown enabled_ = false; diff --git a/onnxruntime/core/platform/posix/telemetry.h b/onnxruntime/core/platform/posix/telemetry.h index 866756463e2f3..c68eebf83fa26 100644 --- a/onnxruntime/core/platform/posix/telemetry.h +++ b/onnxruntime/core/platform/posix/telemetry.h @@ -7,6 +7,7 @@ #include #include #include +#include #include #include #include @@ -159,7 +160,7 @@ class PosixTelemetry : public Telemetry { // single owner regardless of how many PosixTelemetry objects exist. // Mutex for thread-safe init/shutdown of the shared SDK state. - static std::mutex mutex_; + static std::shared_mutex mutex_; // Telemetry SDK instances. // log_manager_ is owned by LogManagerProvider; logger_ is owned by log_manager_. From f1dede608acc1e7ff0de1057f2753f59bdc78d33 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 16:39:05 -0500 Subject: [PATCH 29/61] telemetry: log only the basename of __FILE__ in LogRuntimeError (round 12) LogRuntimeError forwarded the caller's __FILE__ (e.g. global_api.cc passes __FILE__) into the remote-uploaded "file" telemetry field. __FILE__ is frequently an absolute build path that embeds the developer/build-machine username, so uploading it leaks local paths. Emit only the basename (strip up to the last '/') via std::string_view; the source file name is still captured for debugging. 
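
The basename stripping can be sketched as a standalone helper. FileBasename() is a hypothetical
name used only for this illustration; the patch inlines the same logic in LogRuntimeError, and
like the patch this only treats '/' as a separator.

```cpp
// Sketch only: FileBasename() is a hypothetical helper, not a function added by this patch.
#include <cstddef>
#include <string>
#include <string_view>

// Returns the part of `file` after the last '/', or "" for a null pointer.
std::string FileBasename(const char* file) {
  std::string_view view = file ? std::string_view{file} : std::string_view{};
  if (const std::size_t slash = view.find_last_of('/'); slash != std::string_view::npos) {
    view.remove_prefix(slash + 1);  // drop everything up to and including the last '/'
  }
  return std::string(view);
}
```

A path without any '/' is returned unchanged, so relative __FILE__ values are unaffected.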
Verified telemetry.cc compiles and is clang-format clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/telemetry.cc | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index 088a4b75c0386..3500422e030b2 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -754,6 +754,13 @@ void PosixTelemetry::LogRuntimeError( return; } + // __FILE__ may be an absolute build path that embeds developer/build directory names; emit only + // the basename so remote telemetry doesn't leak usernames or local paths. + std::string_view file_view = file ? std::string_view{file} : std::string_view{}; + if (const size_t slash = file_view.find_last_of('/'); slash != std::string_view::npos) { + file_view.remove_prefix(slash + 1); + } + auto event = EventBuilder("RuntimeError", EventPriority::HIGH, PDT_ProductAndServicePerformance) .AddCommonContext(this) @@ -761,7 +768,7 @@ void PosixTelemetry::LogRuntimeError( .AddInt32("errorCode", static_cast(status.Code())) .AddInt32("errorCategory", static_cast(status.Category())) .AddString("errorMessage", status.ErrorMessage()) - .AddString("file", file ? file : "") + .AddString("file", std::string(file_view)) .AddString("function", function ? function : "") .AddUInt32("line", line) .Build(); From 50b12e0f02f51482166e7753bbee282f1e8f957e Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 16:49:17 -0500 Subject: [PATCH 30/61] telemetry: harden device-id persistence - 0600-on-create + preserve Corrupted (round 13) device_id.cc: - The device-id file was created by std::ofstream (umask-derived perms) and only chmod'd to 0600 afterwards, leaving a brief window where the written id could be world-readable. 
Create the file via open(O_CREAT, 0600) and fchmod before writing, so permissions are owner-only before any content is written (fchmod also tightens a pre-existing file). - A regenerated-from-corruption id set status_=Corrupted during the read, then overwrote it with New on a successful write, so callers/telemetry never observed corruption. DeviceIdStatus::Corrupted is documented as "invalid and regenerated", so preserve it. Verified device_id.cc compiles (exceptions + ORT_NO_EXCEPTIONS) and is clang-format clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/device_id.cc | 27 +++++++++++++------- 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/onnxruntime/core/platform/posix/device_id.cc b/onnxruntime/core/platform/posix/device_id.cc index ca991f5b43fff..98edd6e79a859 100644 --- a/onnxruntime/core/platform/posix/device_id.cc +++ b/onnxruntime/core/platform/posix/device_id.cc @@ -16,6 +16,7 @@ #include #include #include +#include #ifdef __APPLE__ #include @@ -192,15 +193,23 @@ void DeviceId::InitializeInternal() { // Create directory tree CreateDirectoryTree(dir_path); - // Write to file - std::ofstream outfile(file_path); - if (outfile.good()) { - outfile << device_id_; - outfile.close(); - // Restrict to owner read/write (0600): this is a stable identifier and should not be - // world-readable regardless of the process umask. - chmod(file_path.c_str(), S_IRUSR | S_IWUSR); - status_ = DeviceIdStatus::New; + // Persist with owner-only (0600) permissions from creation. Using open() with mode 0600 (and + // fchmod to also tighten a pre-existing file) avoids the window where std::ofstream would create + // the file using the process umask and only chmod it afterwards — during which the device id + // could briefly be world-readable. fchmod runs before any write, so content is never exposed. 
+ const bool regenerated_from_corruption = (status_ == DeviceIdStatus::Corrupted); + const int fd = ::open(file_path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR); + if (fd >= 0) { + ::fchmod(fd, S_IRUSR | S_IWUSR); + const ssize_t written = ::write(fd, device_id_.data(), device_id_.size()); + ::close(fd); + if (written == static_cast(device_id_.size())) { + // Preserve Corrupted (defined as "invalid and regenerated") instead of overwriting it with + // New, so callers/telemetry can still observe that the persisted id had to be regenerated. + status_ = regenerated_from_corruption ? DeviceIdStatus::Corrupted : DeviceIdStatus::New; + } else { + status_ = DeviceIdStatus::Failed; + } } else { status_ = DeviceIdStatus::Failed; } From 9d100ae691ae3642c9db5a98da5e668a5cad6d99 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 17:41:59 -0500 Subject: [PATCH 31/61] telemetry: don't latch enabled_ on Shutdown; inject token via generated header (round 14) telemetry.cc: Shutdown() set the static enabled_=false, permanently disabling telemetry for any later Initialize() of a new PosixTelemetry instance (tests / dynamic load-unload). Drop it - the shared lock + logger_=nullptr already prevent events during/after teardown, and enabled_ now reflects only the user's Enable/DisableTelemetryEvents() opt-in state. Tenant-token injection: passing the override via target_compile_definitions put the token on the compiler command line (compile_commands.json / build logs). CMake now writes the optional onnxruntime_1DS_TENANT_TOKEN into a generated header (onnxruntime_telemetry_tenant_token.h) in the build tree, which telemetry.cc includes; DIY builds generate an empty header and fall back to the throwaway default. Verified: with a token set, compile_commands.json no longer contains it and the binary resolves the injected value; empty -> throwaway default. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- cmake/onnxruntime_common.cmake | 16 +++++++++++++--- cmake/onnxruntime_telemetry_tenant_token.h.in | 12 ++++++++++++ onnxruntime/core/platform/posix/telemetry.cc | 16 +++++++++++----- 3 files changed, 36 insertions(+), 8 deletions(-) create mode 100644 cmake/onnxruntime_telemetry_tenant_token.h.in diff --git a/cmake/onnxruntime_common.cmake b/cmake/onnxruntime_common.cmake index cb6be7e98fc5a..9bab3a2b71773 100644 --- a/cmake/onnxruntime_common.cmake +++ b/cmake/onnxruntime_common.cmake @@ -155,11 +155,21 @@ if (onnxruntime_USE_TELEMETRY) set_target_properties(onnxruntime_common PROPERTIES COMPILE_FLAGS "/FI${ONNXRUNTIME_INCLUDE_DIR}/core/platform/windows/TraceLoggingConfigPrivate.h") else() target_compile_definitions(onnxruntime_common PRIVATE USE_1DS_TELEMETRY) + # The optional tenant-token override is emitted into a generated header in the build tree rather + # than onto the compiler command line, so an injected token (sourced from a CI secret) does not + # leak into compile_commands.json or build logs. DIY builds leave onnxruntime_1DS_TENANT_TOKEN + # empty, so the header defines nothing and telemetry.cc uses its throwaway default. if(onnxruntime_1DS_TENANT_TOKEN) - # Official builds inject the real ingestion token (from a pipeline secret) at compile time, - # overriding the default throwaway token defined in core/platform/posix/telemetry.cc. 
- target_compile_definitions(onnxruntime_common PRIVATE ORT_1DS_TENANT_TOKEN="${onnxruntime_1DS_TENANT_TOKEN}") + set(ONNXRUNTIME_1DS_TENANT_TOKEN_DEFINE "#define ORT_1DS_TENANT_TOKEN \"${onnxruntime_1DS_TENANT_TOKEN}\"") + else() + set(ONNXRUNTIME_1DS_TENANT_TOKEN_DEFINE "") endif() + set(_ort_telemetry_gen_dir "${CMAKE_CURRENT_BINARY_DIR}/onnxruntime_telemetry") + configure_file( + "${REPO_ROOT}/cmake/onnxruntime_telemetry_tenant_token.h.in" + "${_ort_telemetry_gen_dir}/onnxruntime_telemetry_tenant_token.h" + @ONLY) + target_include_directories(onnxruntime_common PRIVATE "${_ort_telemetry_gen_dir}") endif() endif() if (onnxruntime_USE_MIMALLOC) diff --git a/cmake/onnxruntime_telemetry_tenant_token.h.in b/cmake/onnxruntime_telemetry_tenant_token.h.in new file mode 100644 index 0000000000000..3facccece89b1 --- /dev/null +++ b/cmake/onnxruntime_telemetry_tenant_token.h.in @@ -0,0 +1,12 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. +// +// Generated by CMake from cmake/onnxruntime_telemetry_tenant_token.h.in. Do not edit. +// +// When the official build sets onnxruntime_1DS_TENANT_TOKEN (sourced from a CI secret), CMake emits +// the corresponding ORT_1DS_TENANT_TOKEN #define here, in the build tree, instead of passing the +// token on the compiler command line. This keeps the token out of compile_commands.json and build +// logs. DIY/OSS builds leave the variable empty, so this header defines nothing and +// core/platform/posix/telemetry.cc falls back to its in-source throwaway default. 
+#pragma once +@ONNXRUNTIME_1DS_TENANT_TOKEN_DEFINE@ diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index 3500422e030b2..95c0fdd2a1736 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -31,6 +31,9 @@ #include "core/common/logging/logging.h" #include "core/common/status.h" #include "onnxruntime_config.h" +// Optional compile-time tenant-token override, generated by CMake into the build tree (keeps an +// injected token off the compiler command line). See TENANT_TOKEN below. +#include "onnxruntime_telemetry_tenant_token.h" using namespace Microsoft::Applications::Events; @@ -64,9 +67,10 @@ std::atomic PosixTelemetry::system_metrics_sample_counter_{0}; // // The default below is a throwaway ingestion key so that anyone building ONNX Runtime themselves // gets working telemetry by default; it carries no secret and can simply be revoked if abused. -// Official builds override it at compile time by defining ORT_1DS_TENANT_TOKEN (injected from a -// pipeline secret via the onnxruntime_1DS_TENANT_TOKEN CMake variable), so the production token is -// never committed to source. +// Official builds override it via the onnxruntime_1DS_TENANT_TOKEN CMake variable (sourced from a CI +// secret), which CMake writes into the generated onnxruntime_telemetry_tenant_token.h included above +// — in the build tree, not on the compiler command line — so the production token is never committed +// to source and stays out of compile_commands.json / build logs. 
#ifndef ORT_1DS_TENANT_TOKEN #define ORT_1DS_TENANT_TOKEN "5ad963bd4b3a4118a481401cc0211875-da8e8657-47d4-4ed7-ab39-7886e136f53b-6988" #endif @@ -364,8 +368,10 @@ void PosixTelemetry::Initialize() { void PosixTelemetry::Shutdown() { std::unique_lock lock(mutex_); - // Disable logging first to prevent new events during shutdown - enabled_ = false; + // Clear the logger so concurrent LogEventAsync() readers (which take the shared lock) observe + // nullptr and skip. enabled_ is intentionally left untouched: it reflects the user's + // EnableTelemetryEvents()/DisableTelemetryEvents() opt-in state, so a later Initialize() (e.g. in + // tests or dynamic load/unload of the last instance) can resume telemetry without a forced re-enable. logger_ = nullptr; // Owned by log_manager_, will be destroyed with it if (log_manager_ && config_) { From 55ecae7e9dbdb2b51c29402153aab18a81b78c2b Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 17:51:25 -0500 Subject: [PATCH 32/61] telemetry: don't depend on UINT32_MAX transitive include in device_id (round 15) device_id.cc used UINT32_MAX (and uint32_t/uint16_t) without including , relying on transitive includes that aren't guaranteed across toolchains. Add and default-construct the uniform_int_distribution (same [0, max] range) so the UUID generator no longer depends on the UINT32_MAX macro being visible. Verified device_id.cc compiles and is clang-format clean. 
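
A small sketch of why the default-constructed distribution is equivalent, assuming the
distribution is instantiated for uint32_t (as the surrounding comment suggests). RandomU32()
and DefaultRangeIsFull() are hypothetical helpers for illustration only.

```cpp
// Sketch only: both helpers are hypothetical and exist just to illustrate the default range.
#include <cstdint>
#include <limits>
#include <random>

// Draws one 32-bit value from a default-constructed distribution.
std::uint32_t RandomU32() {
  std::random_device rd;
  std::mt19937 gen(rd());
  std::uniform_int_distribution<std::uint32_t> dist;  // defaults to [0, numeric_limits::max()]
  return dist(gen);
}

// True if the default-constructed distribution spans the full uint32_t range.
bool DefaultRangeIsFull() {
  std::uniform_int_distribution<std::uint32_t> dist;
  return dist.min() == 0 && dist.max() == std::numeric_limits<std::uint32_t>::max();
}
```

std::numeric_limits is used here only to check the range; the generator itself needs no
max-value macro.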
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/device_id.cc | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/onnxruntime/core/platform/posix/device_id.cc b/onnxruntime/core/platform/posix/device_id.cc index 98edd6e79a859..406d43f7d0899 100644 --- a/onnxruntime/core/platform/posix/device_id.cc +++ b/onnxruntime/core/platform/posix/device_id.cc @@ -9,6 +9,7 @@ #include #include #include +#include #include #include @@ -59,7 +60,9 @@ std::string DeviceId::GetStatusString() { std::string DeviceId::GenerateUUID() { std::random_device rd; std::mt19937 gen(rd()); - std::uniform_int_distribution dist(0, UINT32_MAX); + // Default-constructed distribution covers the full [0, uint32_t max] range without relying on a + // max-value macro being transitively included. + std::uniform_int_distribution dist; uint32_t data1 = dist(gen); uint16_t data2 = static_cast(dist(gen) & 0xFFFF); From 86991d465e54bc82b28a91c7199e0d8c6f87b7b7 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 18:07:07 -0500 Subject: [PATCH 33/61] telemetry: device-id dir 0700; clarify --use_telemetry help text (round 16) device_id.cc: CreateDirectoryTree created the device-id/telemetry-cache directory tree with 0755 (listable/traversable by other users). Use 0700 so the tree holding the persistent device id and the offline cache is owner-private, consistent with the 0600 device-id file. mkdir only affects directories it actually creates. build_args.py: --use_telemetry help said "1DS on other platforms", implying all non-Windows targets. Clarify to "Linux, macOS, Android, and iOS" since Emscripten/UWP are excluded. Verified device_id.cc compiles + clang-format clean; build_args.py is ruff format + lint clean. 
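
The owner-only directory creation can be sketched as below. CreateOwnerOnlyTree() and
PermissionBits() are hypothetical stand-ins written for this note, not the DeviceId methods
themselves, and error handling is deliberately minimal.

```cpp
// Sketch only: hypothetical stand-ins for the recursive mkdir in DeviceId::CreateDirectoryTree.
#include <cstddef>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>

// Creates every missing component of `path` with mode 0700 (subject to the process umask).
void CreateOwnerOnlyTree(const std::string& path) {
  if (path.empty()) return;
  const std::size_t pos = path.find_last_of('/');
  if (pos != std::string::npos && pos > 0) {
    CreateOwnerOnlyTree(path.substr(0, pos));  // parents first
  }
  // An existing directory makes mkdir fail with EEXIST and keeps its current mode.
  ::mkdir(path.c_str(), S_IRWXU);
}

// Returns the permission bits of `path`, or -1 if it cannot be stat'ed.
int PermissionBits(const std::string& path) {
  struct stat st;
  if (::stat(path.c_str(), &st) != 0) return -1;
  return static_cast<int>(st.st_mode & 0777);
}
```

Because mkdir never changes the mode of a directory that already exists, a tree created by an
earlier build with 0755 would stay 0755 unless it is tightened separately.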
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/device_id.cc | 5 ++++- tools/ci_build/build_args.py | 4 +++- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/onnxruntime/core/platform/posix/device_id.cc b/onnxruntime/core/platform/posix/device_id.cc index 406d43f7d0899..5ce78d1f61cb9 100644 --- a/onnxruntime/core/platform/posix/device_id.cc +++ b/onnxruntime/core/platform/posix/device_id.cc @@ -137,7 +137,10 @@ void DeviceId::CreateDirectoryTree(const std::string& path) { CreateDirectoryTree(path.substr(0, pos)); } - mkdir(path.c_str(), 0755); + // Owner-only (0700): this tree holds the persistent device id and the telemetry offline cache, so + // it should not be listable/traversable by other users. mkdir only sets the mode for directories + // it actually creates; pre-existing directories are left untouched. + mkdir(path.c_str(), 0700); } void DeviceId::InitializeInternal() { diff --git a/tools/ci_build/build_args.py b/tools/ci_build/build_args.py index 9241d253facd1..a42273138662d 100644 --- a/tools/ci_build/build_args.py +++ b/tools/ci_build/build_args.py @@ -870,7 +870,9 @@ def add_other_feature_args(parser: argparse.ArgumentParser) -> None: ) # Telemetry arguments (cross-platform) parser.add_argument( - "--use_telemetry", action="store_true", help="Enable telemetry (ETW on Windows, 1DS on other platforms)." + "--use_telemetry", + action="store_true", + help="Enable telemetry (ETW on Windows; 1DS on Linux, macOS, Android, and iOS).", ) From 40449d47bcd13d78f00684f6eccc791da9408f05 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 18:20:43 -0500 Subject: [PATCH 34/61] telemetry: remove unused include from env.cc (round 17) posix/env.cc guarded an #include behind __APPLE__, but no TARGET_OS_* macro is used in this translation unit. Remove the dead Apple-specific include. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/env.cc | 4 ---- 1 file changed, 4 deletions(-) diff --git a/onnxruntime/core/platform/posix/env.cc b/onnxruntime/core/platform/posix/env.cc index 557e56863c877..4c75abab50d2d 100644 --- a/onnxruntime/core/platform/posix/env.cc +++ b/onnxruntime/core/platform/posix/env.cc @@ -16,10 +16,6 @@ limitations under the License. #include "core/platform/env.h" -#ifdef __APPLE__ -#include -#endif - #ifdef USE_1DS_TELEMETRY #include "core/platform/posix/telemetry.h" #endif From 075e8cd78fcdc37e04f882b32d62c7cff62ec839 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 18:31:26 -0500 Subject: [PATCH 35/61] telemetry: create offline-cache directory before CreateLogManager (round 18) Initialize() set CFG_STR_CACHE_FILE_PATH to a file under DeviceId::GetStorageDirectory() and then called CreateLogManager(), but the directory was only created later when the device id was first read. On a fresh machine the 1DS SDK could fail to open the offline cache DB. Add a public DeviceId::EnsureStorageDirectory() that creates the 0700 tree, and use it for the cache path so the directory exists before the SDK opens the DB. Verified device_id.cc and telemetry.cc compile and are clang-format clean. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/device_id.cc | 8 ++++++++ onnxruntime/core/platform/posix/device_id.h | 5 +++++ onnxruntime/core/platform/posix/telemetry.cc | 2 +- 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/onnxruntime/core/platform/posix/device_id.cc b/onnxruntime/core/platform/posix/device_id.cc index 5ce78d1f61cb9..3de994899949e 100644 --- a/onnxruntime/core/platform/posix/device_id.cc +++ b/onnxruntime/core/platform/posix/device_id.cc @@ -129,6 +129,14 @@ std::string DeviceId::GetStorageDirectory(bool mobile) { #endif } +std::string DeviceId::EnsureStorageDirectory(bool mobile) { + std::string dir = GetStorageDirectory(mobile); + if (!dir.empty()) { + CreateDirectoryTree(dir); + } + return dir; +} + void DeviceId::CreateDirectoryTree(const std::string& path) { if (path.empty()) return; diff --git a/onnxruntime/core/platform/posix/device_id.h b/onnxruntime/core/platform/posix/device_id.h index 89cbd0945e045..687b6c45fcaab 100644 --- a/onnxruntime/core/platform/posix/device_id.h +++ b/onnxruntime/core/platform/posix/device_id.h @@ -43,6 +43,11 @@ class DeviceId { // Mobile: ~/.onnxruntime static std::string GetStorageDirectory(bool mobile = false); + // Same as GetStorageDirectory(), but also creates the directory tree (0700) if it does not exist. + // Returns "" if no suitable location is available. Use before writing into the directory (e.g. the + // telemetry offline cache, which the 1DS SDK opens during initialization). 
+ static std::string EnsureStorageDirectory(bool mobile = false); + private: DeviceId() = default; ~DeviceId() = default; diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index 95c0fdd2a1736..617c04e41c55e 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -299,7 +299,7 @@ void PosixTelemetry::Initialize() { #else constexpr bool is_mobile = false; #endif - std::string cache_dir = DeviceId::GetStorageDirectory(is_mobile); + std::string cache_dir = DeviceId::EnsureStorageDirectory(is_mobile); if (!cache_dir.empty()) { std::string cache_path = cache_dir + "/telemetry_cache.db"; config[CFG_STR_CACHE_FILE_PATH] = cache_path; From 224b93eae4c760b0a6185f6942c86cfa4628f7d7 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 18:44:32 -0500 Subject: [PATCH 36/61] telemetry: guard CommandLineToArgvW behind WINAPI_PARTITION_DESKTOP (round 19) windows/telemetry.cc's svchost service-name fallback uses CommandLineToArgvW (shell32), which is only available on the Windows desktop partition. Guard the include and the body of GetServiceNamesFromCommandLine() with WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP); on non-desktop partitions (UWP/GDK app) the function returns empty, since the svchost -s/-k convention doesn't apply there. Verified both partitions compile with the Windows SDK (cl.exe). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/windows/telemetry.cc | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/onnxruntime/core/platform/windows/telemetry.cc b/onnxruntime/core/platform/windows/telemetry.cc index e31867f18854f..efd25ab23f1e1 100644 --- a/onnxruntime/core/platform/windows/telemetry.cc +++ b/onnxruntime/core/platform/windows/telemetry.cc @@ -2,8 +2,11 @@ // Licensed under the MIT License. 
#include "core/platform/windows/telemetry.h" +#include #include +#if WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP) #include +#endif #include #include #include @@ -79,6 +82,7 @@ std::string ConvertWideStringToUtf8(const std::wstring& wide) { // Parse the command line for -s (service name) and -k (service group) arguments. // These are svchost.exe conventions and may not be present for all services. std::string GetServiceNamesFromCommandLine() { +#if WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP) LPCWSTR cmd_line = ::GetCommandLineW(); if (cmd_line == nullptr) return {}; @@ -103,6 +107,11 @@ std::string GetServiceNamesFromCommandLine() { ::LocalFree(argv); return ConvertWideStringToUtf8(aggregated); +#else + // CommandLineToArgvW lives in shell32 and is only available on the desktop partition; the + // svchost -s/-k service-name convention does not apply on non-desktop Windows (UWP/GDK). + return {}; +#endif } std::string GetServiceNamesForCurrentProcess() { From 62ac02aa388759ca5b60b8bea4dfda7687e0ccda Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 18:55:44 -0500 Subject: [PATCH 37/61] telemetry: restrict shell32 link to desktop Windows (round 20) The explicit shell32 link (added for windows/telemetry.cc's CommandLineToArgvW svchost fallback) was applied to all WIN32 builds. shell32 is only used on the desktop partition (the call is now guarded with WINAPI_PARTITION_DESKTOP), and GDK lists shell32.lib in nodefault_libs (excluded via /NODEFAULTLIB) while UWP/WindowsStore doesn't ship it. Guard the link with NOT GDK_PLATFORM AND NOT CMAKE_SYSTEM_NAME STREQUAL WindowsStore so it only applies to desktop Windows. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- cmake/onnxruntime_common.cmake | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/cmake/onnxruntime_common.cmake b/cmake/onnxruntime_common.cmake index 9bab3a2b71773..093f105abbb4d 100644 --- a/cmake/onnxruntime_common.cmake +++ b/cmake/onnxruntime_common.cmake @@ -139,8 +139,13 @@ if(WIN32) set_property(TARGET onnxruntime_common PROPERTY CXX_STANDARD 23) target_compile_options(onnxruntime_common PRIVATE "/Zc:char8_t-") endif() - # windows/telemetry.cc's svchost service-name fallback uses CommandLineToArgvW, which lives in shell32. - target_link_libraries(onnxruntime_common PRIVATE shell32) + # windows/telemetry.cc's svchost service-name fallback uses CommandLineToArgvW (shell32), which is + # only compiled on the desktop partition (guarded with WINAPI_PARTITION_DESKTOP there). Restrict the + # explicit shell32 link to desktop Windows: GDK lists shell32.lib in nodefault_libs (excluded via + # /NODEFAULTLIB), and non-desktop partitions (UWP/WindowsStore) neither use nor ship it. + if(NOT GDK_PLATFORM AND NOT CMAKE_SYSTEM_NAME STREQUAL "WindowsStore") + target_link_libraries(onnxruntime_common PRIVATE shell32) + endif() endif() if(NOT WIN32 AND NOT APPLE AND NOT ANDROID AND CMAKE_SYSTEM_PROCESSOR MATCHES "x86_64") From 4a25375f97f56bacc48c3548f325e46fd80f8144 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 19:10:09 -0500 Subject: [PATCH 38/61] telemetry: per-event schemaVersion + document vcpkg baseline bump (round 21) telemetry.cc: EventBuilder hard-coded schemaVersion=0 for every event, but the Windows provider uses per-event versions. Add EventBuilder::SetSchemaVersion() and set the versions that match WindowsTelemetry: SessionCreationStart=2 (includes executionProviderVersions), and CompileModelStart/ RuntimeError/RuntimePerf/ModelLoadStart/EpDeviceUsage/RegisterEpLibraryStart=1. 
Other events stay at 0 (their Windows counterpart is 0 or they are POSIX-only), so downstream processing sees consistent schema versions across platforms. vcpkg-configuration.json: add a $comment documenting why the baseline was bumped (to pick up the cpp-client-telemetry port; heavy deps are overlay/override-pinned). Verified vcpkg tolerates the field and resolves the port via 'vcpkg install --dry-run'. Verified telemetry.cc compiles + clang-format clean; vcpkg-configuration.json is valid JSON. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- cmake/vcpkg-configuration.json | 1 + onnxruntime/core/platform/posix/telemetry.cc | 16 +++++++++++++++- 2 files changed, 16 insertions(+), 1 deletion(-) diff --git a/cmake/vcpkg-configuration.json b/cmake/vcpkg-configuration.json index 96b19e0b17c4c..9c483fe9c2c5a 100644 --- a/cmake/vcpkg-configuration.json +++ b/cmake/vcpkg-configuration.json @@ -1,4 +1,5 @@ { + "$comment": "Baseline bumped to pick up the cpp-client-telemetry port (POSIX 1DS telemetry), which did not exist in the prior baseline. ONNX Runtime's heavy dependencies are pinned via overlay-ports/overrides (see cmake/vcpkg.json), so this bump only floats benign minor versions; impact was verified with 'vcpkg install --dry-run'.", "default-registry": { "kind": "git", "repository": "https://github.com/Microsoft/vcpkg", diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index 617c04e41c55e..cc4f695c7d6af 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -100,7 +100,8 @@ class EventBuilder { // Set latency/priority props_.SetLatency(static_cast(priority)); - // Set schema version for compatibility with Windows + // Default schemaVersion is 0; events that have evolved call SetSchemaVersion() to match the + // Windows provider's per-event versions. 
props_.SetProperty("schemaVersion", static_cast(0)); // All ORT telemetry is required system metadata (no PII) @@ -110,6 +111,12 @@ class EventBuilder { props_.SetProperty(COMMONFIELDS_EVENT_PRIVTAGS, static_cast(privacy_tags)); } + // Override the default schemaVersion (0) to match the Windows provider's per-event versions. + EventBuilder& SetSchemaVersion(uint8_t schema_version) { + props_.SetProperty("schemaVersion", static_cast(schema_version)); + return *this; + } + EventBuilder& AddString(const char* key, const std::string& value) { if (!value.empty()) { props_.SetProperty(key, value); @@ -612,6 +619,7 @@ void PosixTelemetry::LogSessionCreationStart(uint32_t session_id) const { auto event = EventBuilder("SessionCreationStart", EventPriority::CRITICAL, PDT_SoftwareSetupAndInventory | PDT_ProductAndServicePerformance) + .SetSchemaVersion(2) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .Build(); @@ -716,6 +724,7 @@ void PosixTelemetry::LogCompileModelStart( auto event = EventBuilder("CompileModelStart", EventPriority::NORMAL, PDT_SoftwareSetupAndInventory | PDT_ProductAndServicePerformance) + .SetSchemaVersion(1) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .AddString("inputSource", input_source) @@ -769,6 +778,7 @@ void PosixTelemetry::LogRuntimeError( auto event = EventBuilder("RuntimeError", EventPriority::HIGH, PDT_ProductAndServicePerformance) + .SetSchemaVersion(1) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .AddInt32("errorCode", static_cast(status.Code())) @@ -814,6 +824,7 @@ void PosixTelemetry::LogRuntimePerf( auto event = EventBuilder("RuntimePerf", EventPriority::NORMAL, PDT_ProductAndServicePerformance) + .SetSchemaVersion(1) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .AddUInt32("totalRunsSinceLast", total_runs_since_last) @@ -886,6 +897,7 @@ void PosixTelemetry::LogModelLoadStart(uint32_t session_id) const { auto event = EventBuilder("ModelLoadStart", EventPriority::NORMAL, 
PDT_ProductAndServiceUsage) + .SetSchemaVersion(1) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .Build(); @@ -947,6 +959,7 @@ void PosixTelemetry::LogEpDeviceUsage( auto event = EventBuilder("EpDeviceUsage", EventPriority::NORMAL, PDT_ProductAndServiceUsage) + .SetSchemaVersion(1) .AddCommonContext(this) .AddUInt32("sessionId", session_id) .AddString("executionProviderType", ep_type) @@ -971,6 +984,7 @@ void PosixTelemetry::LogRegisterEpLibraryStart(const std::string& registration_n auto event = EventBuilder("RegisterEpLibraryStart", EventPriority::NORMAL, PDT_ProductAndServiceUsage) + .SetSchemaVersion(1) .AddCommonContext(this) .AddString("registrationName", registration_name) .Build(); From c9b892947163f737223b4d9648e48cbea4888bac Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 19:20:07 -0500 Subject: [PATCH 39/61] telemetry: fail fast on WebAssembly + telemetry (round 22) Telemetry isn't supported on Emscripten (the 1DS vcpkg feature excludes it), but nothing prevented enabling it for a WASM build, causing a confusing late find_package/missing-target failure. Fail fast: - build.py: raise BuildError when --use_telemetry is combined with --build_wasm. - onnxruntime_external_deps.cmake: FATAL_ERROR when onnxruntime_USE_TELEMETRY is set for an Emscripten (CMAKE_SYSTEM_NAME=Emscripten) configuration, before find_package(MSTelemetry). Verified build.py is ruff format + lint clean. 
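For illustration only, the build.py half of this guard can be sketched in isolation as below. This is a minimal sketch, not the code in the patch: `validate_telemetry_args` and `parse` are hypothetical names, and `BuildError` here is a local stand-in for the build script's own error type.

```python
import argparse


class BuildError(Exception):
    """Local stand-in for the build script's error type (assumption)."""


def validate_telemetry_args(args: argparse.Namespace) -> None:
    """Reject the telemetry + WebAssembly combination before CMake is invoked."""
    if args.use_telemetry and args.build_wasm:
        raise BuildError(
            "Telemetry is not supported for WebAssembly (Emscripten) builds; "
            "omit --use_telemetry when using --build_wasm."
        )


def parse(argv):
    """Hypothetical two-flag parser, just enough to exercise the check."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--use_telemetry", action="store_true")
    parser.add_argument("--build_wasm", action="store_true")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Either flag alone passes; only the combination is rejected.
    validate_telemetry_args(parse(["--use_telemetry"]))
    validate_telemetry_args(parse(["--build_wasm"]))
    try:
        validate_telemetry_args(parse(["--use_telemetry", "--build_wasm"]))
    except BuildError as err:
        print(f"rejected: {err}")
```

Checking in the Python driver as well as in CMake means the error surfaces before any configure step runs, while the CMake FATAL_ERROR still covers direct CMake invocations.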
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- cmake/external/onnxruntime_external_deps.cmake | 4 ++++ tools/ci_build/build.py | 7 ++++++- 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/cmake/external/onnxruntime_external_deps.cmake b/cmake/external/onnxruntime_external_deps.cmake index 45fb6d375f5ca..a6578058fee04 100644 --- a/cmake/external/onnxruntime_external_deps.cmake +++ b/cmake/external/onnxruntime_external_deps.cmake @@ -903,6 +903,10 @@ endif() # 1DS SDK (cpp_client_telemetry) for cross-platform telemetry on non-Windows platforms if(onnxruntime_USE_TELEMETRY AND NOT WIN32) + if(CMAKE_SYSTEM_NAME STREQUAL "Emscripten") + message(FATAL_ERROR "onnxruntime_USE_TELEMETRY is not supported for WebAssembly/Emscripten builds: " + "the 1DS telemetry SDK is excluded on Emscripten. Disable telemetry for WASM builds.") + endif() if(onnxruntime_USE_VCPKG) # Consume the 1DS SDK from the vcpkg port "cpp-client-telemetry", which exposes the # MSTelemetry::mat target with its include directories and transitive dependencies diff --git a/tools/ci_build/build.py b/tools/ci_build/build.py index 6746250d920e4..8e32c876b1539 100644 --- a/tools/ci_build/build.py +++ b/tools/ci_build/build.py @@ -356,7 +356,12 @@ def generate_build_tree( disable_float4_types = args.android or ("float4" in types_to_disable) disable_optional_type = "optional" in types_to_disable disable_sparse_tensors = "sparsetensor" in types_to_disable - # Telemetry: On Windows uses ETW, on non-Windows uses 1DS + # Telemetry: On Windows uses ETW, on non-Windows uses 1DS. Telemetry is unsupported on + # WebAssembly/Emscripten (the 1DS vcpkg feature excludes it), so fail fast on that combination. + if args.use_telemetry and args.build_wasm: + raise BuildError( + "Telemetry is not supported for WebAssembly (Emscripten) builds; omit --use_telemetry when using --build_wasm." 
+ ) cmake_args += [ "-Donnxruntime_USE_TELEMETRY=" + ("ON" if args.use_telemetry else "OFF"), ] From 59fe3beee3c914b02d3cabe226e5c993c61ac5c6 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 19:47:58 -0500 Subject: [PATCH 40/61] telemetry: drop unused SQLite json1 to shrink the statically-linked 1DS SDK When ONNX Runtime statically links the 1DS telemetry SDK (cpp-client-telemetry), the SDK uses SQLite only for plain offline event storage and never needs the json1 extension. Request sqlite3 with default-features:false in the telemetry feature so json1 (SQLITE_OMIT_JSON, ~50 KB) is omitted from the build. vcpkg ignores a transitive default-features:false (the port's own edge), so the consumer must also opt out in its own manifest. This takes full effect once ONNX Runtime's vcpkg baseline includes the cpp-client-telemetry port revision that also opts out (microsoft/cpp_client_telemetry#1475); with the current baseline it resolves cleanly and is a no-op (json1 stays). Verified with 'vcpkg install --dry-run' that json1 is dropped when both edges opt out, and that sqlite3 is only pulled when the telemetry feature is enabled. The other static-link footprint levers are already in place in ONNX Runtime: -ffunction-sections/-fdata-sections + -fvisibility=hidden at compile, and --gc-sections / -dead_strip / /OPT:REF,ICF at link, so the SDK's PR-1475 function-sectioning and hidden visibility let the existing dead-strip remove unreferenced SDK code. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- cmake/vcpkg.json | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/cmake/vcpkg.json b/cmake/vcpkg.json index d56c5e157ee4b..5529e2f70bf3c 100644 --- a/cmake/vcpkg.json +++ b/cmake/vcpkg.json @@ -102,6 +102,11 @@ { "name": "cpp-client-telemetry", "platform": "!windows & !emscripten & !uwp" + }, + { + "name": "sqlite3", + "default-features": false, + "platform": "!windows & !emscripten & !uwp" } ] } From 8419ce60dc37dd24dc1855b6f5f107c2de82fe86 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Tue, 23 Jun 2026 22:22:24 -0500 Subject: [PATCH 41/61] telemetry: consume cpp-client-telemetry 3.10.173.1 (binary-footprint build) 3.10.173.1 carries the SDK's binary-footprint / compilation improvements (microsoft/cpp_client_telemetry#1475): function-level linking + hidden visibility, which let ONNX Runtime's existing dead-strip (--gc-sections / -dead_strip / /OPT:REF,ICF) discard unreferenced SDK code, plus the sqlite3 json1 opt-out. - cmake/deps.txt: bump the FetchContent fallback to v3.10.173.1 (+ archive SHA1). - cgmanifests/cgmanifest.json: bump the tracked commit to the v3.10.173.1 tag. - cmake/vcpkg-ports/cpp-client-telemetry: add an overlay port pinned to v3.10.173.1 (mirrors the SDK's port, including sqlite3 default-features:false) so the vcpkg path resolves the new version now, before the registry merge. Removable once the vcpkg baseline includes 3.10.173.1. Combined with the sqlite3 json1 opt-out in cmake/vcpkg.json, verified via 'vcpkg install --dry-run' that the vcpkg path resolves cpp-client-telemetry@3.10.173.1 and sqlite3 with json1 dropped. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- cgmanifests/cgmanifest.json | 2 +- cmake/deps.txt | 2 +- .../cpp-client-telemetry/portfile.cmake | 50 +++++++++++++++++++ .../cpp-client-telemetry/vcpkg.json | 32 ++++++++++++ 4 files changed, 84 insertions(+), 2 deletions(-) create mode 100644 cmake/vcpkg-ports/cpp-client-telemetry/portfile.cmake create mode 100644 cmake/vcpkg-ports/cpp-client-telemetry/vcpkg.json diff --git a/cgmanifests/cgmanifest.json b/cgmanifests/cgmanifest.json index abed2921509c6..c030d5cb66447 100644 --- a/cgmanifests/cgmanifest.json +++ b/cgmanifests/cgmanifest.json @@ -350,7 +350,7 @@ "component": { "type": "git", "git": { - "commitHash": "cc03dce1a23538f6b820401e36ee339e9a5d2edd", + "commitHash": "edf33f80035575b82f1fafd5f9bd0dc0d2064e94", "repositoryUrl": "https://github.com/microsoft/cpp_client_telemetry.git" }, "comments": "1DS SDK (cpp_client_telemetry) for cross-platform telemetry on non-Windows platforms (macOS, Linux, Android, iOS)." diff --git a/cmake/deps.txt b/cmake/deps.txt index 28700816d20d3..0388c09d3fa7f 100644 --- a/cmake/deps.txt +++ b/cmake/deps.txt @@ -65,4 +65,4 @@ kleidiai;https://github.com/ARM-software/kleidiai/archive/refs/tags/v1.20.0.tar. 
kleidiai-qmx;https://github.com/qualcomm/kleidiai/archive/2f10c9a8d32f81ffeeb6d4885a29cc35d2b0da87.zip;5e855730a2d69057a569f43dd7532db3b2d2a05c vulkan_headers;https://codeload.github.com/KhronosGroup/Vulkan-Headers/tar.gz/refs/tags/v1.4.344;57bc528ef7c4a3f7bfbb59e64a187e3734bd29d8 # cpp_client_telemetry (1DS SDK) for cross-platform telemetry on non-Windows platforms -cpp_client_telemetry;https://github.com/microsoft/cpp_client_telemetry/archive/refs/tags/v3.10.161.1.zip;0c0e767283fde29629bbc647b21bc0ac39edeb01 +cpp_client_telemetry;https://github.com/microsoft/cpp_client_telemetry/archive/refs/tags/v3.10.173.1.zip;d35a0f0595114304ed8c308e720671032f9586d1 diff --git a/cmake/vcpkg-ports/cpp-client-telemetry/portfile.cmake b/cmake/vcpkg-ports/cpp-client-telemetry/portfile.cmake new file mode 100644 index 0000000000000..eb63b9572c600 --- /dev/null +++ b/cmake/vcpkg-ports/cpp-client-telemetry/portfile.cmake @@ -0,0 +1,50 @@ +vcpkg_from_github( + OUT_SOURCE_PATH SOURCE_PATH + REPO microsoft/cpp_client_telemetry + REF v3.10.173.1 + SHA512 e55bc35274236f57757660073c4dccccab3462342c8566212f1df4bf8824295a2bb3d3d79a11f3950e7c9252641827e9dd3d7c28c421dea3bdaee277e4f2ce32 + HEAD_REF main +) + +# Determine if Apple HTTP should be used (no curl needed). +# Note: BUILD_APPLE_HTTP must remain ON for macOS/iOS because the vcpkg.json +# curl dependency is excluded on these platforms. 
+set(MATSDK_BUILD_APPLE_HTTP OFF) +if(VCPKG_TARGET_IS_OSX OR VCPKG_TARGET_IS_IOS) + set(MATSDK_BUILD_APPLE_HTTP ON) +endif() + +# iOS build options +set(MATSDK_BUILD_IOS OFF) +if(VCPKG_TARGET_IS_IOS) + set(MATSDK_BUILD_IOS ON) +endif() + +vcpkg_cmake_configure( + SOURCE_PATH "${SOURCE_PATH}" + OPTIONS + -DMATSDK_USE_VCPKG_DEPS=ON + -DBUILD_HEADERS=ON + -DBUILD_LIBRARY=ON + -DBUILD_TEST_TOOL=OFF + -DBUILD_UNIT_TESTS=OFF + -DBUILD_FUNC_TESTS=OFF + -DBUILD_JNI_WRAPPER=OFF + -DBUILD_OBJC_WRAPPER=OFF + -DBUILD_SWIFT_WRAPPER=OFF + -DBUILD_PACKAGE=OFF + -DBUILD_VERSION=${VERSION} + -DBUILD_APPLE_HTTP=${MATSDK_BUILD_APPLE_HTTP} + -DBUILD_IOS=${MATSDK_BUILD_IOS} +) + +vcpkg_cmake_install() + +vcpkg_cmake_config_fixup(PACKAGE_NAME MSTelemetry CONFIG_PATH lib/cmake/MSTelemetry) + +# Remove duplicate headers and empty dirs +file(REMOVE_RECURSE "${CURRENT_PACKAGES_DIR}/debug/include") +file(REMOVE_RECURSE "${CURRENT_PACKAGES_DIR}/debug/share") + +# Install license +vcpkg_install_copyright(FILE_LIST "${SOURCE_PATH}/LICENSE") diff --git a/cmake/vcpkg-ports/cpp-client-telemetry/vcpkg.json b/cmake/vcpkg-ports/cpp-client-telemetry/vcpkg.json new file mode 100644 index 0000000000000..07b02fe61a7f4 --- /dev/null +++ b/cmake/vcpkg-ports/cpp-client-telemetry/vcpkg.json @@ -0,0 +1,32 @@ +{ + "name": "cpp-client-telemetry", + "version": "3.10.173.1", + "description": "Microsoft 1DS C/C++ Client Telemetry Library", + "homepage": "https://github.com/microsoft/cpp_client_telemetry", + "license": "Apache-2.0", + "supports": "((windows & !mingw) | linux | osx | ios | android) & !uwp", + "dependencies": [ + { + "name": "curl", + "default-features": false, + "features": [ + "openssl" + ], + "platform": "linux | android" + }, + "nlohmann-json", + { + "name": "sqlite3", + "default-features": false + }, + { + "name": "vcpkg-cmake", + "host": true + }, + { + "name": "vcpkg-cmake-config", + "host": true + }, + "zlib" + ] +} From 27f87fba42612cf6dfb39ec8aaf1b0906f1585e6 Mon Sep 17 00:00:00 2001 From: 
Bhagirath Mehta Date: Wed, 24 Jun 2026 11:09:59 -0500 Subject: [PATCH 42/61] telemetry: switch cpp-client-telemetry to the registry 3.10.173.1 port The cpp-client-telemetry 3.10.173.1 port is now in microsoft/vcpkg (PR #52568), so the overlay-port bridge added earlier is no longer needed. Bump the vcpkg baseline to 18a4723aeb (the commit that ships the 3.10.173.1 port) and drop cmake/vcpkg-ports/cpp-client-telemetry. Verified with 'vcpkg install --dry-run' that the manifest now resolves cpp-client-telemetry@3.10.173.1 from the registry and sqlite3 with json1 dropped (the registry port carries default-features:false, combined with ORT's own opt-out). ORT's heavy deps (abseil/onnx/protobuf/flatbuffers) remain pinned via overlay-ports/overrides; only benign minor versions float with the baseline. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../cpp-client-telemetry/portfile.cmake | 50 ------------------- .../cpp-client-telemetry/vcpkg.json | 32 ------------ 2 files changed, 82 deletions(-) delete mode 100644 cmake/vcpkg-ports/cpp-client-telemetry/portfile.cmake delete mode 100644 cmake/vcpkg-ports/cpp-client-telemetry/vcpkg.json diff --git a/cmake/vcpkg-ports/cpp-client-telemetry/portfile.cmake b/cmake/vcpkg-ports/cpp-client-telemetry/portfile.cmake deleted file mode 100644 index eb63b9572c600..0000000000000 --- a/cmake/vcpkg-ports/cpp-client-telemetry/portfile.cmake +++ /dev/null @@ -1,50 +0,0 @@ -vcpkg_from_github( - OUT_SOURCE_PATH SOURCE_PATH - REPO microsoft/cpp_client_telemetry - REF v3.10.173.1 - SHA512 e55bc35274236f57757660073c4dccccab3462342c8566212f1df4bf8824295a2bb3d3d79a11f3950e7c9252641827e9dd3d7c28c421dea3bdaee277e4f2ce32 - HEAD_REF main -) - -# Determine if Apple HTTP should be used (no curl needed). -# Note: BUILD_APPLE_HTTP must remain ON for macOS/iOS because the vcpkg.json -# curl dependency is excluded on these platforms. 
-set(MATSDK_BUILD_APPLE_HTTP OFF) -if(VCPKG_TARGET_IS_OSX OR VCPKG_TARGET_IS_IOS) - set(MATSDK_BUILD_APPLE_HTTP ON) -endif() - -# iOS build options -set(MATSDK_BUILD_IOS OFF) -if(VCPKG_TARGET_IS_IOS) - set(MATSDK_BUILD_IOS ON) -endif() - -vcpkg_cmake_configure( - SOURCE_PATH "${SOURCE_PATH}" - OPTIONS - -DMATSDK_USE_VCPKG_DEPS=ON - -DBUILD_HEADERS=ON - -DBUILD_LIBRARY=ON - -DBUILD_TEST_TOOL=OFF - -DBUILD_UNIT_TESTS=OFF - -DBUILD_FUNC_TESTS=OFF - -DBUILD_JNI_WRAPPER=OFF - -DBUILD_OBJC_WRAPPER=OFF - -DBUILD_SWIFT_WRAPPER=OFF - -DBUILD_PACKAGE=OFF - -DBUILD_VERSION=${VERSION} - -DBUILD_APPLE_HTTP=${MATSDK_BUILD_APPLE_HTTP} - -DBUILD_IOS=${MATSDK_BUILD_IOS} -) - -vcpkg_cmake_install() - -vcpkg_cmake_config_fixup(PACKAGE_NAME MSTelemetry CONFIG_PATH lib/cmake/MSTelemetry) - -# Remove duplicate headers and empty dirs -file(REMOVE_RECURSE "${CURRENT_PACKAGES_DIR}/debug/include") -file(REMOVE_RECURSE "${CURRENT_PACKAGES_DIR}/debug/share") - -# Install license -vcpkg_install_copyright(FILE_LIST "${SOURCE_PATH}/LICENSE") diff --git a/cmake/vcpkg-ports/cpp-client-telemetry/vcpkg.json b/cmake/vcpkg-ports/cpp-client-telemetry/vcpkg.json deleted file mode 100644 index 07b02fe61a7f4..0000000000000 --- a/cmake/vcpkg-ports/cpp-client-telemetry/vcpkg.json +++ /dev/null @@ -1,32 +0,0 @@ -{ - "name": "cpp-client-telemetry", - "version": "3.10.173.1", - "description": "Microsoft 1DS C/C++ Client Telemetry Library", - "homepage": "https://github.com/microsoft/cpp_client_telemetry", - "license": "Apache-2.0", - "supports": "((windows & !mingw) | linux | osx | ios | android) & !uwp", - "dependencies": [ - { - "name": "curl", - "default-features": false, - "features": [ - "openssl" - ], - "platform": "linux | android" - }, - "nlohmann-json", - { - "name": "sqlite3", - "default-features": false - }, - { - "name": "vcpkg-cmake", - "host": true - }, - { - "name": "vcpkg-cmake-config", - "host": true - }, - "zlib" - ] -} From 1cb7562dec2d7f39aab6ca441392b1f8c11311bb Mon Sep 17 00:00:00 2001 
From: Bhagirath Mehta Date: Wed, 24 Jun 2026 11:10:59 -0500 Subject: [PATCH 43/61] telemetry: bump vcpkg baseline to the 3.10.173.1 port commit Complete the registry transition: bump the default-registry baseline to 18a4723aeb (microsoft/vcpkg PR #52568, which ships cpp-client-telemetry 3.10.173.1) so the manifest resolves the new port from the registry now that the overlay bridge is removed. Verified via 'vcpkg install --dry-run': resolves cpp-client-telemetry@3.10.173.1 from the registry, sqlite3 with json1 dropped, and ORT's pinned heavy deps unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- cmake/vcpkg-configuration.json | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/cmake/vcpkg-configuration.json b/cmake/vcpkg-configuration.json index 9c483fe9c2c5a..8fa750286f705 100644 --- a/cmake/vcpkg-configuration.json +++ b/cmake/vcpkg-configuration.json @@ -1,9 +1,9 @@ { - "$comment": "Baseline bumped to pick up the cpp-client-telemetry port (POSIX 1DS telemetry), which did not exist in the prior baseline. ONNX Runtime's heavy dependencies are pinned via overlay-ports/overrides (see cmake/vcpkg.json), so this bump only floats benign minor versions; impact was verified with 'vcpkg install --dry-run'.", + "$comment": "Baseline pinned to the microsoft/vcpkg commit that ships the cpp-client-telemetry 3.10.173.1 port (PR #52568) for POSIX 1DS telemetry. 
ONNX Runtime's heavy dependencies are pinned via overlay-ports/overrides (see cmake/vcpkg.json), so this only floats benign minor versions; verified with 'vcpkg install --dry-run'.", "default-registry": { "kind": "git", "repository": "https://github.com/Microsoft/vcpkg", - "baseline": "22b8d099947ea6ee2fcb1aa1124b21f48f84232d" + "baseline": "18a4723aeb7adbbae84bcff0edf510883800f32f" }, "overlay-ports": [ "./vcpkg-ports" From b817a52ddb7f4a32bb4d5d84ac09e9bac3d18636 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Wed, 24 Jun 2026 15:54:20 -0500 Subject: [PATCH 44/61] telemetry: don't force --use_telemetry for WebAssembly builds in build.sh build.sh unconditionally appended --use_telemetry, so './build.sh --build_wasm' became 'build.py --use_telemetry --build_wasm', which the WASM fail-fast guard (added in this PR) correctly rejects - breaking the convenience-script WASM build. Only add --use_telemetry for native builds (skip it when --build_wasm is present). Telemetry stays the default for native builds, and direct 'build.py --build_wasm' was already unaffected (telemetry is off by default). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- build.sh | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/build.sh b/build.sh index 2778cb5c53ef3..37873d0b76b7e 100755 --- a/build.sh +++ b/build.sh @@ -18,4 +18,12 @@ elif [[ "$*" == *"--android"* ]]; then DIR_OS="Android" fi -python3 $DIR/tools/ci_build/build.py --build_dir $DIR/build/$DIR_OS --use_telemetry "$@" +# Telemetry uses the 1DS SDK, which is not supported for WebAssembly/Emscripten builds. +# Only request it for native builds so that `./build.sh --build_wasm` keeps working without +# the user having to override the wrapper's default. 
+TELEMETRY_ARG="--use_telemetry" +if [[ "$*" == *"--build_wasm"* ]]; then + TELEMETRY_ARG="" +fi + +python3 $DIR/tools/ci_build/build.py --build_dir $DIR/build/$DIR_OS $TELEMETRY_ARG "$@" From d97f3092959ef60ace84171402a9a24da39db40b Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Wed, 24 Jun 2026 16:57:44 -0500 Subject: [PATCH 45/61] telemetry: source device-id UUID bytes from random_device DeviceId::GenerateUUID() seeded a std::mt19937 from a single std::random_device value, which caps the generated UUID's entropy at 32 bits and risks device-id collisions across a large fleet (birthday-bound ~77k devices), under-counting distinct devices in MAD/DAD telemetry. Draw each UUID field straight from std::random_device (CSPRNG-backed via getrandom // /dev/urandom on the POSIX platforms this file targets), mirroring the SDK's own PAL UUID hardening. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/device_id.cc | 25 +++++++++++--------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/onnxruntime/core/platform/posix/device_id.cc b/onnxruntime/core/platform/posix/device_id.cc index 3de994899949e..af4bb9513e140 100644 --- a/onnxruntime/core/platform/posix/device_id.cc +++ b/onnxruntime/core/platform/posix/device_id.cc @@ -58,18 +58,21 @@ std::string DeviceId::GetStatusString() { } std::string DeviceId::GenerateUUID() { + // Draw the UUID fields directly from std::random_device -- a non-deterministic, + // CSPRNG-backed source on the POSIX platforms this file targets (glibc/bionic/ + // libc++ draw from getrandom or /dev/urandom). Seeding a std::mt19937 from a + // single random_device value would cap the entropy at 32 bits and make device + // ids collide across a large fleet (birthday-bound ~77k devices), so each field + // is sourced straight from the device. random_device::operator() spans the full + // unsigned int (>= 32-bit) range. 
std::random_device rd; - std::mt19937 gen(rd()); - // Default-constructed distribution covers the full [0, uint32_t max] range without relying on a - // max-value macro being transitively included. - std::uniform_int_distribution<uint32_t> dist; - - uint32_t data1 = dist(gen); - uint16_t data2 = static_cast<uint16_t>(dist(gen) & 0xFFFF); - uint16_t data3 = static_cast<uint16_t>((dist(gen) & 0x0FFF) | 0x4000); // Version 4 - uint16_t data4 = static_cast<uint16_t>((dist(gen) & 0x3FFF) | 0x8000); // Variant 1 - uint16_t data5a = static_cast<uint16_t>(dist(gen) & 0xFFFF); - uint32_t data5b = dist(gen); + + uint32_t data1 = rd(); + uint16_t data2 = static_cast<uint16_t>(rd() & 0xFFFF); + uint16_t data3 = static_cast<uint16_t>((rd() & 0x0FFF) | 0x4000); // Version 4 + uint16_t data4 = static_cast<uint16_t>((rd() & 0x3FFF) | 0x8000); // Variant 1 + uint16_t data5a = static_cast<uint16_t>(rd() & 0xFFFF); + uint32_t data5b = rd(); std::ostringstream oss; oss << std::hex << std::setfill('0') From 7e0957023a198d5418ff636ef71a0a8049066cc3 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Wed, 24 Jun 2026 16:58:12 -0500 Subject: [PATCH 46/61] telemetry: guard internal ContextFieldsProvider.hpp include to mobile (vcpkg build fix) telemetry.cc unconditionally included <api/ContextFieldsProvider.hpp>, an internal 1DS SDK header that the vcpkg-installed MSTelemetry::mat target does not expose (only public headers are installed). This broke the vcpkg telemetry build on desktop with "fatal error: api/ContextFieldsProvider.hpp: No such file or directory". ContextFieldsProvider is only used on mobile (Android/iOS) to read the SDK's auto-generated device id, so guard the include with the same #if as its use, and move <TargetConditionals.h> above it so TARGET_OS_IOS is defined for the guard. Verified by a full Linux --use_vcpkg --use_telemetry build: onnxruntime_common (telemetry.cc) now compiles and libonnxruntime.so links (33 MB stripped).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/telemetry.cc | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index cc4f695c7d6af..bf34b7c6213f3 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -4,17 +4,24 @@ #include "core/platform/posix/telemetry.h" #include "core/platform/posix/device_id.h" +#ifdef __APPLE__ +#include <TargetConditionals.h> +#endif + // 1DS SDK #include #include +// ContextFieldsProvider is an internal SDK header (not part of the vcpkg-installed public headers); +// it is only used on mobile to read the SDK's auto-generated device id. +#if defined(__ANDROID__) || (defined(__APPLE__) && TARGET_OS_IOS) #include <api/ContextFieldsProvider.hpp> +#endif #include #include #ifdef __APPLE__ #include -#include <TargetConditionals.h> #endif #if defined(__linux__) || defined(__ANDROID__) From 3d0d405ec3fb5e624149442f8e0da62fb546dd33 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Wed, 24 Jun 2026 19:04:08 -0500 Subject: [PATCH 47/61] Add opt-in shared 1DS telemetry SDK (libmat.so) for multi-binary sharing By default the non-Windows 1DS SDK (cpp-client-telemetry) is statically linked into libonnxruntime, adding ~9 MiB that is mostly its OpenSSL/curl/sqlite3 TLS+HTTP stack. When onnxruntime ships alongside another binary that links the same SDK (e.g. onnxruntime-genai), each binary embeds its own copy, so the footprint is paid per-binary and two independent 1DS LogManager singletons run in one process. Add --telemetry_shared_sdk (default off) to build the SDK as a single self-contained shared library (libmat.so) that several binaries link dynamically and share. The SDK's own dependencies stay static inside libmat.so, so it needs no host libraries.
Default static behavior is unchanged (a standalone onnxruntime is smaller fully static), and the one-time vcpkg rebuild the changed triplet causes only affects the opt-in build. Verified on Linux (--use_vcpkg --use_telemetry --telemetry_shared_sdk): libmat.so = 12.1 MiB stripped, NEEDED only system libs; libonnxruntime.so drops from 33.31 to 24.26 MiB stripped (NEEDED libmat.so, 0 MAT symbols embedded), resolved via the existing $ORIGIN RPATH. - cmake/CMakeLists.txt: add onnxruntime_TELEMETRY_SHARED_SDK dependent option - tools/python/util/vcpkg_helpers.py: emit per-port dynamic VCPKG_LIBRARY_LINKAGE for cpp-client-telemetry (deps stay static) in the POSIX triplets - tools/ci_build/build_args.py, build.py: --telemetry_shared_sdk flag + validation, -Donnxruntime_TELEMETRY_SHARED_SDK, thread into Linux/macOS triplet generation - cmake/onnxruntime_common.cmake: install libmat.so next to libonnxruntime via IMPORTED_RUNTIME_ARTIFACTS; FATAL_ERROR if requested with the FetchContent fallback Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- cmake/CMakeLists.txt | 8 ++++++++ cmake/onnxruntime_common.cmake | 11 +++++++++++ tools/ci_build/build.py | 16 ++++++++++++++-- tools/ci_build/build_args.py | 13 +++++++++++++ tools/python/util/vcpkg_helpers.py | 24 ++++++++++++++++++++++-- 5 files changed, 68 insertions(+), 4 deletions(-) diff --git a/cmake/CMakeLists.txt b/cmake/CMakeLists.txt index ea92be98c5468..0ba9b0fa3728d 100644 --- a/cmake/CMakeLists.txt +++ b/cmake/CMakeLists.txt @@ -147,6 +147,14 @@ option(onnxruntime_USE_TELEMETRY "Build with Telemetry" OFF) # throwaway token from core/platform/posix/telemetry.cc; official pipelines override it at compile # time by setting this to the real token (injected from a pipeline secret). 
set(onnxruntime_1DS_TENANT_TOKEN "" CACHE STRING "Override the compiled-in 1DS telemetry ingestion token (non-Windows)") +# When building non-Windows telemetry, optionally build the 1DS SDK (cpp-client-telemetry) as a shared +# library (libmat.so) instead of statically linking it into libonnxruntime. This lets several binaries +# that link the same SDK (for example onnxruntime and onnxruntime-genai shipped together) share a single +# copy of the SDK and its transitive TLS/HTTP stack, paying its footprint once instead of per-binary, and +# avoids running two independent 1DS LogManager singletons in one process. The SDK's own dependencies +# (OpenSSL/curl/sqlite3/zlib) stay static inside libmat.so so it remains self-contained. Requires the +# vcpkg cpp-client-telemetry port. Off by default: a standalone onnxruntime is smaller/simpler fully static. +cmake_dependent_option(onnxruntime_TELEMETRY_SHARED_SDK "Build the non-Windows 1DS telemetry SDK as a shared library so multiple binaries can share one copy" OFF "onnxruntime_USE_TELEMETRY;NOT WIN32" OFF) cmake_dependent_option(onnxruntime_USE_MIMALLOC "Override new/delete and arena allocator with mimalloc" OFF "WIN32;NOT onnxruntime_USE_CUDA;NOT onnxruntime_USE_OPENVINO" OFF) option(onnxruntime_USE_CANN "Build with CANN support" OFF) option(onnxruntime_USE_XNNPACK "Build with XNNPACK support. Provides an alternative math library on ARM, WebAssembly and x86." OFF) diff --git a/cmake/onnxruntime_common.cmake b/cmake/onnxruntime_common.cmake index 093f105abbb4d..883210f2bc450 100644 --- a/cmake/onnxruntime_common.cmake +++ b/cmake/onnxruntime_common.cmake @@ -244,7 +244,18 @@ if(onnxruntime_USE_TELEMETRY AND NOT WIN32) # directories and transitive dependencies (curl/sqlite3/zlib/nlohmann-json), so no # manual include paths or system libraries are required here. 
target_link_libraries(onnxruntime_common PRIVATE MSTelemetry::mat) + if(onnxruntime_TELEMETRY_SHARED_SDK AND onnxruntime_BUILD_SHARED_LIB) + # The vcpkg triplet built the SDK as a shared library (libmat.so) so it can be shared by + # several binaries (e.g. onnxruntime and onnxruntime-genai) instead of being statically + # embedded in each. Ship it next to libonnxruntime; the $ORIGIN/@loader_path RPATH set on the + # onnxruntime target (see onnxruntime.cmake) resolves it at load time. IMPORTED_RUNTIME_ARTIFACTS + # installs the resolved soname and its symlinks. + install(IMPORTED_RUNTIME_ARTIFACTS MSTelemetry::mat LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}) + endif() elseif(TARGET mat) + if(onnxruntime_TELEMETRY_SHARED_SDK) + message(FATAL_ERROR "onnxruntime_TELEMETRY_SHARED_SDK requires the vcpkg cpp-client-telemetry port (build with --use_vcpkg); it is not supported with the FetchContent fallback.") + endif() target_link_libraries(onnxruntime_common PRIVATE mat) # cpp_client_telemetry uses include_directories() (directory-scoped) rather than # target_include_directories(), so include paths don't propagate via target_link_libraries. diff --git a/tools/ci_build/build.py b/tools/ci_build/build.py index 8e32c876b1539..f9480b395a73d 100644 --- a/tools/ci_build/build.py +++ b/tools/ci_build/build.py @@ -362,8 +362,20 @@ def generate_build_tree( raise BuildError( "Telemetry is not supported for WebAssembly (Emscripten) builds; omit --use_telemetry when using --build_wasm." ) + # The shared 1DS SDK (libmat.so) is provided by the vcpkg cpp-client-telemetry port and is only + # meaningful for the non-Windows 1DS path. Fail fast on unsupported combinations. + if args.telemetry_shared_sdk: + if not args.use_telemetry: + raise BuildError("--telemetry_shared_sdk requires --use_telemetry.") + if not args.use_vcpkg: + raise BuildError( + "--telemetry_shared_sdk requires --use_vcpkg; the shared 1DS SDK is built by the vcpkg cpp-client-telemetry port." 
+ ) + if args.android: + raise BuildError("--telemetry_shared_sdk is not supported for Android builds.") cmake_args += [ "-Donnxruntime_USE_TELEMETRY=" + ("ON" if args.use_telemetry else "OFF"), + "-Donnxruntime_TELEMETRY_SHARED_SDK=" + ("ON" if args.telemetry_shared_sdk else "OFF"), ] disable_string_type = "string" in types_to_disable @@ -629,10 +641,10 @@ def generate_build_tree( osx_target = os.environ.get("MACOSX_DEPLOYMENT_TARGET") if osx_target is not None: log.info(f"Setting VCPKG_OSX_DEPLOYMENT_TARGET to {osx_target}") - generate_macos_triplets(build_dir, configs, osx_target, args.use_full_protobuf) + generate_macos_triplets(build_dir, configs, osx_target, args.use_full_protobuf, args.telemetry_shared_sdk) else: # Linux, *BSD, AIX or other platforms - generate_linux_triplets(build_dir, configs, args.use_full_protobuf) + generate_linux_triplets(build_dir, configs, args.use_full_protobuf, args.telemetry_shared_sdk) add_default_definition(cmake_extra_defines, "CMAKE_TOOLCHAIN_FILE", str(vcpkg_toolchain_path)) # Choose the cmake triplet diff --git a/tools/ci_build/build_args.py b/tools/ci_build/build_args.py index a42273138662d..0230174656700 100644 --- a/tools/ci_build/build_args.py +++ b/tools/ci_build/build_args.py @@ -874,6 +874,19 @@ def add_other_feature_args(parser: argparse.ArgumentParser) -> None: action="store_true", help="Enable telemetry (ETW on Windows; 1DS on Linux, macOS, Android, and iOS).", ) + parser.add_argument( + "--telemetry_shared_sdk", + action="store_true", + help=( + "Build the non-Windows 1DS telemetry SDK (cpp-client-telemetry) as a shared library " + "(libmat.so) instead of statically linking it into libonnxruntime. This lets several " + "binaries that link the same SDK (for example onnxruntime and onnxruntime-genai shipped " + "together) share a single copy of the SDK and its TLS/HTTP stack instead of embedding it " + "in each. 
The SDK's own dependencies (OpenSSL/curl/sqlite3/zlib) stay static inside " + "libmat.so so it remains self-contained. Requires --use_telemetry and --use_vcpkg; " + "Linux/macOS only." + ), + ) def is_cross_compiling(args: argparse.Namespace) -> bool: diff --git a/tools/python/util/vcpkg_helpers.py b/tools/python/util/vcpkg_helpers.py index 8d1c665f631d9..4b9b4a7bb8be7 100644 --- a/tools/python/util/vcpkg_helpers.py +++ b/tools/python/util/vcpkg_helpers.py @@ -349,6 +349,7 @@ def generate_triplet_for_posix_platform( target_abi: str, osx_deployment_target: str, use_full_protobuf: bool, + telemetry_shared_sdk: bool = False, ) -> None: """ Generate triplet file for POSIX platforms (Linux, macOS). @@ -365,6 +366,7 @@ def generate_triplet_for_posix_platform( target_abi (str): The target ABI, which maps to the VCPKG_TARGET_ARCHITECTURE variable. Valid options include x86, x64, arm, arm64, arm64ec, s390x, ppc64le, riscv32, riscv64, loongarch32, loongarch64, mips64. osx_deployment_target (str, optional): The macOS deployment target version. The parameter sets the minimum macOS version for compiled binaries. It also changes what versions of the macOS platform SDK CMake will search for. See the CMake documentation for CMAKE_OSX_DEPLOYMENT_TARGET for more information. use_full_protobuf (bool): Flag indicating if full Protobuf is used. + telemetry_shared_sdk (bool): Flag indicating if the 1DS telemetry SDK (cpp-client-telemetry) should be built as a shared library (libmat.so) while its dependencies remain static. """ folder_name_parts = [] if enable_asan: @@ -409,6 +411,14 @@ def generate_triplet_for_posix_platform( # Valid options are dynamic and static. Libraries can ignore this setting if they do not support the preferred linkage type. 
f.write("set(VCPKG_LIBRARY_LINKAGE static)\n") + if telemetry_shared_sdk: + # Build only the 1DS telemetry SDK (cpp-client-telemetry) as a shared library while + # keeping every other port (including the SDK's own OpenSSL/curl/sqlite3/zlib + # dependencies) static. This produces a self-contained libmat.so that several binaries + # (for example onnxruntime and onnxruntime-genai shipped together) can share instead of + # statically embedding the SDK and its TLS/HTTP stack into each one. + f.write('if(PORT STREQUAL "cpp-client-telemetry")\n set(VCPKG_LIBRARY_LINKAGE dynamic)\nendif()\n') + ldflags = [] if enable_binskim and os_name == "linux": @@ -764,7 +774,9 @@ def generate_windows_triplets(build_dir: str, configs: set[str], toolset_version ) # Pass enable_minimal_build -def generate_linux_triplets(build_dir: str, configs: set[str], use_full_protobuf: bool) -> None: +def generate_linux_triplets( + build_dir: str, configs: set[str], use_full_protobuf: bool, telemetry_shared_sdk: bool = False +) -> None: """ Generate triplet files for Linux platforms. @@ -772,6 +784,7 @@ def generate_linux_triplets(build_dir: str, configs: set[str], use_full_protobuf build_dir (str): The directory to save the generated triplet files. configs (set[str]): The set of build configurations. use_full_protobuf (bool): Flag indicating if full Protobuf is used. + telemetry_shared_sdk (bool): Flag indicating if the 1DS telemetry SDK should be built as a shared library. 
""" target_abis = ["x86", "x64", "arm", "arm64", "s390x", "ppc64le", "riscv64", "loongarch64", "mips64"] for enable_rtti in [True, False]: @@ -797,11 +810,16 @@ def generate_linux_triplets(build_dir: str, configs: set[str], use_full_protobuf target_abi, None, use_full_protobuf=use_full_protobuf, + telemetry_shared_sdk=telemetry_shared_sdk, ) def generate_macos_triplets( - build_dir: str, configs: set[str], osx_deployment_target: str, use_full_protobuf: bool + build_dir: str, + configs: set[str], + osx_deployment_target: str, + use_full_protobuf: bool, + telemetry_shared_sdk: bool = False, ) -> None: """ Generate triplet files for macOS platforms. @@ -810,6 +828,7 @@ def generate_macos_triplets( build_dir (str): The directory to save the generated triplet files. osx_deployment_target (str, optional): The macOS deployment target version. use_full_protobuf (bool): Flag indicating if full Protobuf is used. + telemetry_shared_sdk (bool): Flag indicating if the 1DS telemetry SDK should be built as a shared library. 
""" target_abis = ["x64", "arm64", "universal2"] for enable_rtti in [True, False]: @@ -836,4 +855,5 @@ def generate_macos_triplets( target_abi, osx_deployment_target, use_full_protobuf=use_full_protobuf, + telemetry_shared_sdk=telemetry_shared_sdk, ) From 422e263dc8a9e3c6b825a3621954e08e8909f242 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Thu, 25 Jun 2026 00:08:57 -0500 Subject: [PATCH 48/61] Fix Apple telemetry dependency merge resolution Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../external/onnxruntime_external_deps.cmake | 73 +++++++++++++------ 1 file changed, 50 insertions(+), 23 deletions(-) diff --git a/cmake/external/onnxruntime_external_deps.cmake b/cmake/external/onnxruntime_external_deps.cmake index 7654d896d341e..e60070e7da560 100644 --- a/cmake/external/onnxruntime_external_deps.cmake +++ b/cmake/external/onnxruntime_external_deps.cmake @@ -740,26 +740,12 @@ if (onnxruntime_USE_WEBGPU) # ${Patch_EXECUTABLE} --binary --ignore-whitespace -p1 < ${PROJECT_SOURCE_DIR}/patches/dawn/dawn_destroy_buffer_on_destructor.patch && - # The dawn_force_enable_f16_nvidia_vulkan.patch contains the following changes: - # - # - (private) Force enable f16 support for NVIDIA Vulkan - # Dawn disabled f16 support for NVIDIA Vulkan by default because of crashes in f16 CTS tests (crbug.com/tint/2164). - # Since the crashes are limited to specific GPU models, we patched Dawn to remove the restriction. - # - ${Patch_EXECUTABLE} --binary --ignore-whitespace -p1 < ${PROJECT_SOURCE_DIR}/patches/dawn/dawn_force_enable_f16_nvidia_vulkan.patch && - # The dawn_binskim.patch contains the following changes: # # - (private) Fulfill the BinSkim requirements # Some build warnings are not allowed to be disabled in project level. 
${Patch_EXECUTABLE} --binary --ignore-whitespace -p1 < ${PROJECT_SOURCE_DIR}/patches/dawn/dawn_binskim.patch && - # The uniform_and_storage_buffer_16_bit_access.patch contains the following changes: - # - # - (private) Android devices don't seem to allow fp16 in uniforms so the WebGPU EP has to manually handle passing an fp32 - # in the uniform and converting to fp16 before using. - ${Patch_EXECUTABLE} --binary --ignore-whitespace -p1 < ${PROJECT_SOURCE_DIR}/patches/dawn/uniform_and_storage_buffer_16_bit_access.patch && - # The safari_polyfill.patch contains the following changes: # # - (private) Fix compatibility issues with Safari. Contains the following changes: @@ -767,6 +753,38 @@ if (onnxruntime_USE_WEBGPU) # ${Patch_EXECUTABLE} --binary --ignore-whitespace -p1 < ${PROJECT_SOURCE_DIR}/patches/dawn/safari_polyfill.patch && + # The dawn_device_lost_keepalive.patch contains the following changes: + # + # - (private) Fix premature ABORT when device.lost fires in callUserCallback + # The device.lost handler was wrapped in callUserCallback without runtimeKeepalivePush/Pop, + # causing maybeExit() to trigger _exit(0) and set ABORT=true when runtimeKeepaliveCounter + # was 0. This silently dropped all subsequent WebGPU callbacks (e.g. requestAdapter), + # breaking session re-creation after device destruction. + # + ${Patch_EXECUTABLE} --binary --ignore-whitespace -p1 < ${PROJECT_SOURCE_DIR}/patches/dawn/dawn_device_lost_keepalive.patch && + + # The dawn_dxc_output_dir.patch contains the following changes: + # + # - (private) Fix DXC output directory for RelWithDebInfo and MinSizeRel configs + # Dawn only overrides the DXC output directory for Debug and Release configs. This causes + # build failures when using multi-config generators (like Visual Studio) with RelWithDebInfo + # because dxcompiler.dll ends up in the default output path instead of CMAKE_BINARY_DIR/$, + # and the copy_dxil_dll target copies dxil.dll to a different location. 
+ # + ${Patch_EXECUTABLE} --binary --ignore-whitespace -p1 < ${PROJECT_SOURCE_DIR}/patches/dawn/dawn_dxc_output_dir.patch && + + # The dawn_buffer_fix_injection.patch contains the following changes: + # + # - (private) Fix importJsBuffer calling wrong WGPUBufferImpl constructor + # Without this patch, importJsBuffer calls emwgpuCreateBuffer which invokes the + # (source, mappedAtCreation=false) constructor instead of the injection constructor + # tagged with kImportedFromJS. This patch adjusts the injection constructor signature + # to disambiguate it from the (source, mappedAtCreation) overload so emwgpuCreateBuffer + # reliably selects the injection constructor and imported buffers are properly tagged + # as kImportedFromJS. + # + ${Patch_EXECUTABLE} --binary --ignore-whitespace -p1 < ${PROJECT_SOURCE_DIR}/patches/dawn/dawn_buffer_fix_injection.patch && + # Remove the test folder to speed up potential file scan operations (70k+ files not needed for build). # Using token ensures the correct absolute path regardless of working directory. ${CMAKE_COMMAND} -E rm -rf /test) @@ -862,12 +880,15 @@ set(onnxruntime_LINK_DIRS) if (onnxruntime_USE_CUDA) find_package(CUDAToolkit REQUIRED) - if(onnxruntime_CUDNN_HOME) - file(TO_CMAKE_PATH ${onnxruntime_CUDNN_HOME} onnxruntime_CUDNN_HOME) - set(CUDNN_PATH ${onnxruntime_CUDNN_HOME}) - endif() + # cuDNN is not needed for minimal CUDA builds (e.g., TensorRT-only builds) + if(NOT onnxruntime_CUDA_MINIMAL) + if(onnxruntime_CUDNN_HOME) + file(TO_CMAKE_PATH ${onnxruntime_CUDNN_HOME} onnxruntime_CUDNN_HOME) + set(CUDNN_PATH ${onnxruntime_CUDNN_HOME}) + endif() - include(cuDNN) + include(cuDNN) + endif() endif() if(onnxruntime_USE_SNPE) @@ -943,10 +964,16 @@ if(onnxruntime_USE_TELEMETRY AND NOT WIN32) # points to ORT's root instead. Fix by adding the actual source dir as an include path. 
if(TARGET mat) target_include_directories(mat PRIVATE ${cpp_client_telemetry_SOURCE_DIR}) - # Also add subdirectories for bundled headers (sqlite3.h, zlib.h) that are included without - # a path prefix in the 1DS SDK sources. - target_include_directories(mat PRIVATE ${cpp_client_telemetry_SOURCE_DIR}/sqlite) - target_include_directories(mat PRIVATE ${cpp_client_telemetry_SOURCE_DIR}/zlib) + # On iOS we ship the SDK's bundled sqlite3/zlib headers and pair them with a bundled + # zlib target below, so the vendored symbol-renaming `act_z_*` ABI is consistent. + # On macOS the system / (resolved via /usr/local/include from + # lib/CMakeLists.txt) is the right header to pair with the system `z` / `sqlite3` + # targets that the SDK imports; adding the vendored headers there would produce + # an `act_z_*` compile/link mismatch against system libz. + if(CMAKE_SYSTEM_NAME STREQUAL "iOS") + target_include_directories(mat PRIVATE ${cpp_client_telemetry_SOURCE_DIR}/sqlite) + target_include_directories(mat PRIVATE ${cpp_client_telemetry_SOURCE_DIR}/zlib) + endif() # ORT enables -ffast-math globally, which conflicts with # std::numeric_limits::infinity() in the 1DS SDK's bundled nlohmann/json.hpp. # Also suppress warnings in the 1DS SDK code that ORT treats as errors. 
From 260aaa94b58772a728e612db2a5a4bcd455c266e Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Thu, 25 Jun 2026 00:29:01 -0500 Subject: [PATCH 49/61] Fix ONNX pointer and iOS telemetry fallback Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- cmake/external/onnx | 2 +- cmake/external/onnxruntime_external_deps.cmake | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/cmake/external/onnx b/cmake/external/onnx index d3f6b795aedb4..2bb50465112fe 160000 --- a/cmake/external/onnx +++ b/cmake/external/onnx @@ -1 +1 @@ -Subproject commit d3f6b795aedb48eaecc881bf5e8f5dd6efbe25b3 +Subproject commit 2bb50465112feca9003e1ed654d77f01ff1415ca diff --git a/cmake/external/onnxruntime_external_deps.cmake b/cmake/external/onnxruntime_external_deps.cmake index 15988236c0a2f..01c60dcd3dfaf 100644 --- a/cmake/external/onnxruntime_external_deps.cmake +++ b/cmake/external/onnxruntime_external_deps.cmake @@ -951,7 +951,8 @@ if(onnxruntime_USE_TELEMETRY AND NOT WIN32) endif() endif() if(NOT DEFINED IOS_PLAT) - if(CMAKE_OSX_SYSROOT MATCHES "iPhoneSimulator") + string(TOLOWER "${CMAKE_OSX_SYSROOT}" IOS_SYSROOT_LOWER) + if(IOS_SYSROOT_LOWER MATCHES "iphonesimulator") set(IOS_PLAT "iphonesimulator" CACHE STRING "iOS platform for 1DS SDK" FORCE) else() set(IOS_PLAT "iphoneos" CACHE STRING "iOS platform for 1DS SDK" FORCE) @@ -1019,7 +1020,6 @@ if(onnxruntime_USE_TELEMETRY AND NOT WIN32) "${cpp_client_telemetry_SOURCE_DIR}/zlib/trees.c" "${cpp_client_telemetry_SOURCE_DIR}/zlib/uncompr.c" "${cpp_client_telemetry_SOURCE_DIR}/zlib/zutil.c" - "${cpp_client_telemetry_SOURCE_DIR}/zlib/simd_stub.c" ) target_include_directories(onnxruntime_mat_zlib_bundled PUBLIC "${cpp_client_telemetry_SOURCE_DIR}/zlib") target_compile_options(onnxruntime_mat_zlib_bundled PRIVATE From 563a7026cc6abbb29ce02205398cd158e3be8f98 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Thu, 25 Jun 2026 01:49:40 -0500 Subject: [PATCH 50/61] Build a leaner sqlite3 for 1DS telemetry's 
offline store sqlite3 is pulled in only as the non-Windows 1DS telemetry SDK's offline event cache, and the SDK uses it very narrowly (one event table: parameter-bound INSERT/SELECT/DELETE, a few PRAGMAs and VACUUM, UTF-8 only; no triggers, views, FTS, JSON, ALTER, ATTACH, foreign keys, UTF-16, or extension loading). Compile sqlite3 with the matching feature reductions so the offline store is smaller and hardened (no runtime extension loading), while keeping persistence intact. Wired as a per-PORT compile-flag override in the generated POSIX triplets (same mechanism as the linkage override), gated on --use_telemetry so non-telemetry builds keep their vcpkg cache. Applies to any telemetry build; only ORT's own sqlite is affected (ORT itself does not use sqlite). Verified on Linux against the vcpkg sqlite3 port: the set builds cleanly in both Debug and Release, and the SDK references none of the omitted APIs (it uses only bind_*/errmsg/temp_directory, all retained), so the trimmed sqlite links cleanly. Release .text shrinks ~33 KB. Three otherwise-useful omits are intentionally excluded because the vcpkg sqlite3 port enables conflicting options: SQLITE_OMIT_DECLTYPE (vs SQLITE_ENABLE_COLUMN_METADATA, all configs) and SQLITE_OMIT_TRACE / SQLITE_UNTESTABLE (vs SQLITE_DEBUG/SELECTTRACE in Debug). The deeper grammar omits (triggers/views/window functions) require building sqlite from canonical sources and are deliberately left for a custom sqlite port. 
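
For illustration only (a simplified sketch, not the code in this patch): the per-PORT override described above could be emitted roughly as follows. The names `write_port_overrides` and `render` and the shortened define list are hypothetical; the real list is _SQLITE_TELEMETRY_MINIMAL_DEFINES in tools/python/util/vcpkg_helpers.py, and the real generator writes many more settings.

```python
import io

# Hypothetical, shortened stand-in for _SQLITE_TELEMETRY_MINIMAL_DEFINES.
SQLITE_MINIMAL_DEFINES = [
    "-DSQLITE_OMIT_LOAD_EXTENSION",
    "-DSQLITE_OMIT_UTF16",
    "-DSQLITE_DQS=0",
]


def write_port_overrides(f, use_telemetry: bool, telemetry_shared_sdk: bool) -> None:
    """Append per-PORT overrides to an open triplet file (illustrative helper)."""
    if telemetry_shared_sdk:
        # Only the 1DS SDK port becomes shared; every other port stays static.
        f.write('if(PORT STREQUAL "cpp-client-telemetry")\n')
        f.write("  set(VCPKG_LIBRARY_LINKAGE dynamic)\n")
        f.write("endif()\n")
    if use_telemetry:
        # Scoped to sqlite3 so other ports keep their flags (and their binary cache).
        defines = " ".join(SQLITE_MINIMAL_DEFINES)
        f.write('if(PORT STREQUAL "sqlite3")\n')
        f.write(f'  string(APPEND VCPKG_C_FLAGS " {defines}")\n')
        f.write("endif()\n")


def render(use_telemetry: bool, telemetry_shared_sdk: bool = False) -> str:
    """Return the triplet fragment as text so it can be inspected without touching disk."""
    buf = io.StringIO()
    buf.write("set(VCPKG_LIBRARY_LINKAGE static)\n")
    write_port_overrides(buf, use_telemetry, telemetry_shared_sdk)
    return buf.getvalue()
```

With both flags off, the fragment contains only the default static linkage line, which is why non-telemetry builds see no change to their triplet contents.
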
- tools/python/util/vcpkg_helpers.py: _SQLITE_TELEMETRY_MINIMAL_DEFINES + per-port emission gated on use_telemetry; thread use_telemetry through the Linux/macOS triplet generators - tools/ci_build/build.py: pass args.use_telemetry to the triplet generators Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- tools/ci_build/build.py | 8 +++-- tools/python/util/vcpkg_helpers.py | 49 +++++++++++++++++++++++++++++- 2 files changed, 54 insertions(+), 3 deletions(-) diff --git a/tools/ci_build/build.py b/tools/ci_build/build.py index f9480b395a73d..8a8f9a649c27d 100644 --- a/tools/ci_build/build.py +++ b/tools/ci_build/build.py @@ -641,10 +641,14 @@ def generate_build_tree( osx_target = os.environ.get("MACOSX_DEPLOYMENT_TARGET") if osx_target is not None: log.info(f"Setting VCPKG_OSX_DEPLOYMENT_TARGET to {osx_target}") - generate_macos_triplets(build_dir, configs, osx_target, args.use_full_protobuf, args.telemetry_shared_sdk) + generate_macos_triplets( + build_dir, configs, osx_target, args.use_full_protobuf, args.telemetry_shared_sdk, args.use_telemetry + ) else: # Linux, *BSD, AIX or other platforms - generate_linux_triplets(build_dir, configs, args.use_full_protobuf, args.telemetry_shared_sdk) + generate_linux_triplets( + build_dir, configs, args.use_full_protobuf, args.telemetry_shared_sdk, args.use_telemetry + ) add_default_definition(cmake_extra_defines, "CMAKE_TOOLCHAIN_FILE", str(vcpkg_toolchain_path)) # Choose the cmake triplet diff --git a/tools/python/util/vcpkg_helpers.py b/tools/python/util/vcpkg_helpers.py index 4b9b4a7bb8be7..248fc1e6d78fd 100644 --- a/tools/python/util/vcpkg_helpers.py +++ b/tools/python/util/vcpkg_helpers.py @@ -1,6 +1,32 @@ import os from pathlib import Path +# Compile-time options that shrink the sqlite3 build that the 1DS telemetry SDK +# (cpp-client-telemetry) pulls in. 
sqlite3 is only ever built here as the SDK's offline-event +# store, and the SDK uses it very narrowly: a single event table with parameter-bound +# INSERT/SELECT/DELETE, a handful of PRAGMAs and VACUUM, UTF-8 only (no triggers, views, FTS, +# JSON, ALTER, ATTACH, foreign keys, UTF-16, or extension loading). These feature reductions are +# therefore safe for that usage, also harden it (no runtime extension loading), and all take +# effect with the prebuilt sqlite amalgamation. Deeper grammar omits (triggers/views/window +# functions) only shrink a build from canonical sources, so they are intentionally not applied here. +# Some omits are deliberately excluded because they conflict with options the vcpkg sqlite3 port +# enables: SQLITE_OMIT_DECLTYPE clashes with SQLITE_ENABLE_COLUMN_METADATA (all configs), and +# SQLITE_OMIT_TRACE / SQLITE_UNTESTABLE clash with SQLITE_DEBUG / SELECTTRACE in Debug-config builds. +# Verified: the SDK references none of the omitted APIs, so the trimmed sqlite still links cleanly. +_SQLITE_TELEMETRY_MINIMAL_DEFINES = [ + "-DSQLITE_OMIT_LOAD_EXTENSION", + "-DSQLITE_OMIT_DEPRECATED", + "-DSQLITE_OMIT_UTF16", + "-DSQLITE_OMIT_PROGRESS_CALLBACK", + "-DSQLITE_OMIT_SHARED_CACHE", + "-DSQLITE_OMIT_GET_TABLE", + "-DSQLITE_OMIT_COMPLETE", + "-DSQLITE_OMIT_TCL_VARIABLE", + "-DSQLITE_DQS=0", + "-DSQLITE_DEFAULT_MEMSTATUS=0", + "-DSQLITE_DEFAULT_FOREIGN_KEYS=0", +] + # The official vcpkg repository has about 80 different triplets. But ONNX Runtime has many more build variants. For example, in general, for each platform, we need to support builds with C++ exceptions, builds without C++ exceptions, builds with C++ RTTI, builds without C++ RTTI, linking to static C++ runtime, linking to dynamic (shared) C++ runtime, builds with address sanitizer, builds without address sanitizer, etc. Therefore, this script file was created to dynamically generate the triplet files on-the-fly. 
# Originally, we tried to check in all the generated files into our repository so that people could build onnxruntime without using build.py or any other Python scripts in the "/tools" directory. However, we encountered an issue when adding support for WASM builds. VCPKG has a limitation that when doing cross-compiling, the triplet file must specify the full path of the chain-loaded toolchain file. The file needs to be located either via environment variables (like ANDROID_NDK_HOME) or via an absolute path. Since environment variables are hard to track, we chose the latter approach. So the generated triplet files may contain absolute file paths that are only valid on the current build machine. @@ -350,6 +376,7 @@ def generate_triplet_for_posix_platform( osx_deployment_target: str, use_full_protobuf: bool, telemetry_shared_sdk: bool = False, + use_telemetry: bool = False, ) -> None: """ Generate triplet file for POSIX platforms (Linux, macOS). @@ -367,6 +394,7 @@ def generate_triplet_for_posix_platform( osx_deployment_target (str, optional): The macOS deployment target version. The parameter sets the minimum macOS version for compiled binaries. It also changes what versions of the macOS platform SDK CMake will search for. See the CMake documentation for CMAKE_OSX_DEPLOYMENT_TARGET for more information. use_full_protobuf (bool): Flag indicating if full Protobuf is used. telemetry_shared_sdk (bool): Flag indicating if the 1DS telemetry SDK (cpp-client-telemetry) should be built as a shared library (libmat.so) while its dependencies remain static. + use_telemetry (bool): Flag indicating if telemetry is enabled; when set, sqlite3 (the telemetry SDK's offline store) is compiled with size-reducing feature omits. 
""" folder_name_parts = [] if enable_asan: @@ -475,6 +503,16 @@ def generate_triplet_for_posix_platform( f.write(f'set(VCPKG_C_FLAGS_RELWITHDEBINFO "{" ".join(cflags_release)}")\n') f.write(f'set(VCPKG_CXX_FLAGS_RELWITHDEBINFO "{" ".join(cflags_release)}")\n') + if use_telemetry: + # sqlite3 is pulled in only as the 1DS telemetry SDK's offline store; compile it with + # the matching feature reductions (see _SQLITE_TELEMETRY_MINIMAL_DEFINES) to shrink the + # offline-storage footprint and drop runtime extension loading. Scoped per-PORT so it + # only affects sqlite3, and gated on telemetry so non-telemetry builds keep their cache. + sqlite_min_defines = " ".join(_SQLITE_TELEMETRY_MINIMAL_DEFINES) + f.write('if(PORT STREQUAL "sqlite3")\n') + f.write(f' string(APPEND VCPKG_C_FLAGS " {sqlite_min_defines}")\n') + f.write("endif()\n") + # Set target platform # VCPKG_CMAKE_SYSTEM_NAME specifies the target platform. if os_name == "linux": @@ -775,7 +813,11 @@ def generate_windows_triplets(build_dir: str, configs: set[str], toolset_version def generate_linux_triplets( - build_dir: str, configs: set[str], use_full_protobuf: bool, telemetry_shared_sdk: bool = False + build_dir: str, + configs: set[str], + use_full_protobuf: bool, + telemetry_shared_sdk: bool = False, + use_telemetry: bool = False, ) -> None: """ Generate triplet files for Linux platforms. @@ -785,6 +827,7 @@ def generate_linux_triplets( configs (set[str]): The set of build configurations. use_full_protobuf (bool): Flag indicating if full Protobuf is used. telemetry_shared_sdk (bool): Flag indicating if the 1DS telemetry SDK should be built as a shared library. + use_telemetry (bool): Flag indicating if telemetry is enabled (triggers a size-reduced sqlite3 build). 
""" target_abis = ["x86", "x64", "arm", "arm64", "s390x", "ppc64le", "riscv64", "loongarch64", "mips64"] for enable_rtti in [True, False]: @@ -811,6 +854,7 @@ def generate_linux_triplets( None, use_full_protobuf=use_full_protobuf, telemetry_shared_sdk=telemetry_shared_sdk, + use_telemetry=use_telemetry, ) @@ -820,6 +864,7 @@ def generate_macos_triplets( osx_deployment_target: str, use_full_protobuf: bool, telemetry_shared_sdk: bool = False, + use_telemetry: bool = False, ) -> None: """ Generate triplet files for macOS platforms. @@ -829,6 +874,7 @@ def generate_macos_triplets( osx_deployment_target (str, optional): The macOS deployment target version. use_full_protobuf (bool): Flag indicating if full Protobuf is used. telemetry_shared_sdk (bool): Flag indicating if the 1DS telemetry SDK should be built as a shared library. + use_telemetry (bool): Flag indicating if telemetry is enabled (triggers a size-reduced sqlite3 build). """ target_abis = ["x64", "arm64", "universal2"] for enable_rtti in [True, False]: @@ -856,4 +902,5 @@ def generate_macos_triplets( osx_deployment_target, use_full_protobuf=use_full_protobuf, telemetry_shared_sdk=telemetry_shared_sdk, + use_telemetry=use_telemetry, ) From 1c578a27132343e6caf62f7cf7857aacb1614a2a Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Thu, 25 Jun 2026 18:00:18 -0500 Subject: [PATCH 51/61] Fix FetchContent 1DS telemetry build on POSIX static and under -Werror The non-vcpkg (FetchContent) telemetry path failed the default static ./build.sh on Linux in two ways: - install(EXPORT onnxruntimeTargets) rejected onnxruntime_common because it links the FetchContent 'mat' target, which is not installed/exported. Scope the link to \$ so mat is still absorbed into the shared library and in-tree executables but is excluded from the static export (where mat cannot be shipped). The vcpkg MSTelemetry::mat path is imported and unaffected. 
- telemetry.cc failed to compile because the SDK public headers (NullObjects.hpp, LogManagerProvider.hpp) trip onnxruntime_common's -Werror=unused-parameter. Mark the SDK include directories SYSTEM so the third-party headers are exempt, matching the vcpkg path. Validated on Linux (Ubuntu, gcc-13) FetchContent static build: configure succeeds, the 1DS SDK (libmat.a) builds, and onnxruntime_common (telemetry.cc, device_id.cc, env.cc) compiles and archives. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- cmake/onnxruntime_common.cmake | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/cmake/onnxruntime_common.cmake b/cmake/onnxruntime_common.cmake index 883210f2bc450..000757761d03f 100644 --- a/cmake/onnxruntime_common.cmake +++ b/cmake/onnxruntime_common.cmake @@ -256,12 +256,21 @@ if(onnxruntime_USE_TELEMETRY AND NOT WIN32) if(onnxruntime_TELEMETRY_SHARED_SDK) message(FATAL_ERROR "onnxruntime_TELEMETRY_SHARED_SDK requires the vcpkg cpp-client-telemetry port (build with --use_vcpkg); it is not supported with the FetchContent fallback.") endif() - target_link_libraries(onnxruntime_common PRIVATE mat) + # mat is a FetchContent target that is not installed/exported. Scope the link to the build + # interface so it is still absorbed into libonnxruntime (shared build) and in-tree executables, + # but is excluded from the installed onnxruntimeTargets export of the static onnxruntime_common + # (where mat cannot be shipped). Without this, install(EXPORT) fails for static telemetry builds + # with "target onnxruntime_common requires target mat that is not in any export set". The vcpkg + # path links the imported MSTelemetry::mat target, which is exportable and unaffected. + target_link_libraries(onnxruntime_common PRIVATE $<BUILD_INTERFACE:mat>) # cpp_client_telemetry uses include_directories() (directory-scoped) rather than # target_include_directories(), so include paths don't propagate via target_link_libraries.
- # Add them explicitly for onnxruntime_common. + # Add them explicitly for onnxruntime_common. Mark them SYSTEM so the SDK's public headers are + # exempt from onnxruntime_common's -Wall -Wextra -Werror (they trip -Werror=unused-parameter in + # NullObjects.hpp / LogManagerProvider.hpp). The vcpkg MSTelemetry::mat path already propagates + # its includes as SYSTEM via the imported target. if(DEFINED cpp_client_telemetry_SOURCE_DIR) - target_include_directories(onnxruntime_common PRIVATE + target_include_directories(onnxruntime_common SYSTEM PRIVATE ${cpp_client_telemetry_SOURCE_DIR}/lib/include/public ${cpp_client_telemetry_SOURCE_DIR}/lib/include/mat ${cpp_client_telemetry_SOURCE_DIR}/lib From 02f9a93583926a978244cd7cb98c78109cf3961e Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Thu, 25 Jun 2026 18:33:36 -0500 Subject: [PATCH 52/61] Scrub absolute paths from telemetry errorMessage on POSIX and Windows LogRuntimeError already reduces its file field to a basename so telemetry does not leak usernames or local paths, but the adjacent errorMessage was uploaded verbatim. ORT error messages frequently embed absolute model paths (e.g. 'Load model from /home//models/foo.onnx failed'), so the same path/username leakage reached telemetry through errorMessage. Add a shared RedactAbsolutePathsForTelemetry helper (core/platform/telemetry_redaction.h) that reduces POSIX, Windows-drive, and UNC absolute paths embedded in a free-form string to their basename, leaving relative paths, URLs, and other text unchanged. Apply it to every errorMessage field on both the POSIX (1DS) and Windows (ETW) telemetry paths. Add unit tests in test/platform/telemetry_redaction_test.cc. Validated on Linux (gcc-13): onnxruntime_common compiles with the wrapped calls; the helper passes a standalone test covering POSIX/Windows/UNC paths, URL/relative-path preservation, and that usernames never survive. 
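
For illustration only: a Python sketch of the redaction rule described above. It is not the C++ helper added by this patch. The function name `redact_absolute_paths` is hypothetical, and keeping a bare root ("/") verbatim is an assumption based on the header's own comment.

```python
# Characters that end a path token inside a free-form message.
_TERMINATORS = set(" \t\n\r\f\v\"'<>|*?()[]{},;")
# Characters that may immediately precede the start of an absolute path.
_START_DELIMS = set(" \t\n\r\f\v\"'([{=")


def redact_absolute_paths(message: str) -> str:
    """Reduce POSIX, drive-letter, and UNC absolute paths to their basename."""
    out = []
    n = len(message)
    i = 0
    while i < n:
        c = message[i]
        prev = out[-1][-1] if out else ""  # "" means start of string
        at_start = prev == "" or prev in _START_DELIMS
        is_path_start = False
        if c == "/":
            # Skip "//" so "scheme://" and "//host" stay intact.
            next_is_slash = i + 1 < n and message[i + 1] == "/"
            is_path_start = at_start and not next_is_slash
        elif (c.isascii() and c.isalpha() and i + 2 < n
              and message[i + 1] == ":" and message[i + 2] in "/\\"):
            is_path_start = at_start  # "X:\" or "X:/"
        elif c == "\\" and i + 1 < n and message[i + 1] == "\\":
            is_path_start = at_start  # UNC "\\server\share"
        if not is_path_start:
            out.append(c)
            i += 1
            continue
        # Consume the whole token, then keep only its last component.
        j = i
        while j < n and message[j] not in _TERMINATORS:
            j += 1
        token = message[i:j]
        path = token.rstrip("/\\")
        basename = path[max(path.rfind("/"), path.rfind("\\")) + 1:]
        out.append(basename if basename else token)  # bare root kept verbatim (assumption)
        i = j
    return "".join(out)
```

Checking the character before the token is what keeps relative paths and URLs untouched: a "/" that follows a letter or ":" is never treated as the start of an absolute path.
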
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/telemetry.cc | 13 +- .../core/platform/telemetry_redaction.h | 137 ++++++++++++++++++ .../core/platform/windows/telemetry.cc | 15 +- .../test/platform/telemetry_redaction_test.cc | 76 ++++++++++ 4 files changed, 228 insertions(+), 13 deletions(-) create mode 100644 onnxruntime/core/platform/telemetry_redaction.h create mode 100644 onnxruntime/test/platform/telemetry_redaction_test.cc diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index bf34b7c6213f3..4c015790c75b5 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -3,6 +3,7 @@ #include "core/platform/posix/telemetry.h" #include "core/platform/posix/device_id.h" +#include "core/platform/telemetry_redaction.h" #ifdef __APPLE__ #include @@ -763,7 +764,7 @@ void PosixTelemetry::LogCompileModelComplete( .AddBool("success", success) .AddUInt32("errorCode", error_code) .AddUInt32("errorCategory", error_category) - .AddString("errorMessage", error_message) + .AddString("errorMessage", RedactAbsolutePathsForTelemetry(error_message)) .Build(); LogEventAsync(std::move(event)); @@ -790,7 +791,7 @@ void PosixTelemetry::LogRuntimeError( .AddUInt32("sessionId", session_id) .AddInt32("errorCode", static_cast(status.Code())) .AddInt32("errorCategory", static_cast(status.Category())) - .AddString("errorMessage", status.ErrorMessage()) + .AddString("errorMessage", RedactAbsolutePathsForTelemetry(status.ErrorMessage())) .AddString("file", std::string(file_view)) .AddString("function", function ? 
function : "") .AddUInt32("line", line) @@ -812,7 +813,7 @@ void PosixTelemetry::LogRuntimeInferenceError(uint32_t session_id, const common: .AddUInt32("sessionId", session_id) .AddInt32("errorCode", static_cast(status.Code())) .AddInt32("errorCategory", static_cast(status.Category())) - .AddString("errorMessage", status.ErrorMessage()) + .AddString("errorMessage", RedactAbsolutePathsForTelemetry(status.ErrorMessage())) .AddString("executionProviderVersions", ep_versions) .AddString("executionProviderDeviceTypes", ep_device_types) .AddString("runtimeVersion", ORT_VERSION) @@ -924,7 +925,7 @@ void PosixTelemetry::LogModelLoadEnd(uint32_t session_id, const common::Status& .AddBool("isSuccess", status.IsOK()) .AddInt32("errorCode", static_cast(status.Code())) .AddInt32("errorCategory", static_cast(status.Category())) - .AddString("errorMessage", status.ErrorMessage()) + .AddString("errorMessage", RedactAbsolutePathsForTelemetry(status.ErrorMessage())) .Build(); LogEventAsync(std::move(event)); @@ -942,7 +943,7 @@ void PosixTelemetry::LogSessionCreationEnd(uint32_t session_id, const common::St .AddBool("isSuccess", status.IsOK()) .AddInt32("errorCode", static_cast(status.Code())) .AddInt32("errorCategory", static_cast(status.Category())) - .AddString("errorMessage", status.ErrorMessage()) + .AddString("errorMessage", RedactAbsolutePathsForTelemetry(status.ErrorMessage())) .Build(); LogEventAsync(std::move(event)); @@ -1012,7 +1013,7 @@ void PosixTelemetry::LogRegisterEpLibraryEnd(const std::string& registration_nam .AddBool("isSuccess", status.IsOK()) .AddInt32("errorCode", static_cast(status.Code())) .AddInt32("errorCategory", static_cast(status.Category())) - .AddString("errorMessage", status.ErrorMessage()) + .AddString("errorMessage", RedactAbsolutePathsForTelemetry(status.ErrorMessage())) .Build(); LogEventAsync(std::move(event)); diff --git a/onnxruntime/core/platform/telemetry_redaction.h b/onnxruntime/core/platform/telemetry_redaction.h new file mode 100644 
index 0000000000000..0c0ee6ad45d4e --- /dev/null +++ b/onnxruntime/core/platform/telemetry_redaction.h @@ -0,0 +1,137 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +#pragma once + +#include +#include + +namespace onnxruntime { +namespace telemetry_detail { + +inline bool IsAsciiLetter(char c) { + return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); +} + +inline bool IsPathSeparator(char c) { + return c == '/' || c == '\\'; +} + +// Characters that cannot appear inside a path token in a free-form message, so they terminate it. +inline bool IsPathTokenTerminator(char c) { + switch (c) { + case ' ': + case '\t': + case '\n': + case '\r': + case '\f': + case '\v': + case '"': + case '\'': + case '<': + case '>': + case '|': + case '*': + case '?': + case '(': + case ')': + case '[': + case ']': + case '{': + case '}': + case ',': + case ';': + return true; + default: + return false; + } +} + +// The character immediately before an absolute-path token must be one of these for the run to be +// treated as a path start. This prevents rewriting embedded slashes such as "a/b", a URL's "://", +// or a drive-like "X:" in the middle of another token. +inline bool IsTokenStartDelimiter(char c) { + switch (c) { + case '\0': + case ' ': + case '\t': + case '\n': + case '\r': + case '\f': + case '\v': + case '"': + case '\'': + case '(': + case '[': + case '{': + case '=': + return true; + default: + return false; + } +} + +} // namespace telemetry_detail + +// Reduce absolute filesystem paths embedded in a free-form string to their final component +// (basename), so telemetry does not transmit usernames or local directory layout. This mirrors the +// basename-only handling that LogRuntimeError already applies to its `file` field. Recognized +// absolute paths: POSIX ("/a/b/c"), Windows drive ("C:\\a\\b" or "C:/a/b") and UNC +// ("\\\\server\\share\\a"). 
Relative paths, URLs ("scheme://..."), and all other text are preserved. +inline std::string RedactAbsolutePathsForTelemetry(std::string_view message) { + using namespace telemetry_detail; + + std::string out; + out.reserve(message.size()); + + const size_t n = message.size(); + size_t i = 0; + while (i < n) { + const char c = message[i]; + const char prev = out.empty() ? '\0' : out.back(); + bool is_path_start = false; + + if (c == '/') { + // POSIX absolute path. Require a delimiter before '/' and exclude a following '/' so that + // scheme-relative ("//host") and "scheme://" URLs are left intact. + const bool next_is_slash = (i + 1 < n) && message[i + 1] == '/'; + is_path_start = IsTokenStartDelimiter(prev) && !next_is_slash; + } else if (IsAsciiLetter(c) && i + 2 < n && message[i + 1] == ':' && IsPathSeparator(message[i + 2])) { + // Windows drive-letter path "X:\" or "X:/". + is_path_start = IsTokenStartDelimiter(prev); + } else if (c == '\\' && i + 1 < n && message[i + 1] == '\\') { + // Windows UNC path "\\server\share\...". + is_path_start = IsTokenStartDelimiter(prev); + } + + if (!is_path_start) { + out.push_back(c); + ++i; + continue; + } + + // Consume the whole path token, then keep only its basename. + size_t j = i; + while (j < n && !IsPathTokenTerminator(message[j])) { + ++j; + } + std::string_view path = message.substr(i, j - i); + while (!path.empty() && IsPathSeparator(path.back())) { + path.remove_suffix(1); + } + const size_t last_sep = path.find_last_of("/\\"); + const std::string_view basename = + (last_sep == std::string_view::npos) ? path : path.substr(last_sep + 1); + if (basename.empty()) { + // The token was a bare root ("/", "\\", "C:\\"); keep it verbatim rather than emit nothing. 
+ out.append(message.data() + i, j - i); + } else { + out.append(basename.data(), basename.size()); + } + i = j; + } + + return out; +} + +} // namespace onnxruntime diff --git a/onnxruntime/core/platform/windows/telemetry.cc b/onnxruntime/core/platform/windows/telemetry.cc index efd25ab23f1e1..e4debaa0c39a8 100644 --- a/onnxruntime/core/platform/windows/telemetry.cc +++ b/onnxruntime/core/platform/windows/telemetry.cc @@ -13,6 +13,7 @@ #include #include "core/common/logging/logging.h" #include "onnxruntime_config.h" +#include "core/platform/telemetry_redaction.h" // ETW includes // need space after Windows.h to prevent clang-format re-ordering breaking the build. @@ -543,7 +544,7 @@ void WindowsTelemetry::LogCompileModelComplete(uint32_t session_id, TraceLoggingBool(success, "success"), TraceLoggingUInt32(error_code, "errorCode"), TraceLoggingUInt32(error_category, "errorCategory"), - TraceLoggingString(error_message.c_str(), "errorMessage"), + TraceLoggingString(RedactAbsolutePathsForTelemetry(error_message).c_str(), "errorMessage"), TraceLoggingString(ORT_CALLER_FRAMEWORK, "frameworkName")); } @@ -566,7 +567,7 @@ void WindowsTelemetry::LogRuntimeError(uint32_t session_id, const common::Status TraceLoggingUInt32(session_id, "sessionId"), TraceLoggingUInt32(status.Code(), "errorCode"), TraceLoggingUInt32(status.Category(), "errorCategory"), - TraceLoggingString(status.ErrorMessage().c_str(), "errorMessage"), + TraceLoggingString(RedactAbsolutePathsForTelemetry(status.ErrorMessage()).c_str(), "errorMessage"), TraceLoggingString(file, "file"), TraceLoggingString(function, "function"), TraceLoggingInt32(line, "line"), @@ -584,7 +585,7 @@ void WindowsTelemetry::LogRuntimeError(uint32_t session_id, const common::Status TraceLoggingUInt32(session_id, "sessionId"), TraceLoggingUInt32(status.Code(), "errorCode"), TraceLoggingUInt32(status.Category(), "errorCategory"), - TraceLoggingString(status.ErrorMessage().c_str(), "errorMessage"), + 
TraceLoggingString(RedactAbsolutePathsForTelemetry(status.ErrorMessage()).c_str(), "errorMessage"), TraceLoggingString(file, "file"), TraceLoggingString(function, "function"), TraceLoggingInt32(line, "line"), @@ -610,7 +611,7 @@ void WindowsTelemetry::LogRuntimeInferenceError(uint32_t session_id, const commo TraceLoggingUInt32(session_id, "sessionId"), TraceLoggingUInt32(status.Code(), "errorCode"), TraceLoggingUInt32(status.Category(), "errorCategory"), - TraceLoggingString(status.ErrorMessage().c_str(), "errorMessage"), + TraceLoggingString(RedactAbsolutePathsForTelemetry(status.ErrorMessage()).c_str(), "errorMessage"), TraceLoggingString(ep_versions.c_str(), "executionProviderVersions"), TraceLoggingString(ep_device_types.c_str(), "executionProviderDeviceTypes"), TraceLoggingString(ORT_VERSION, "runtimeVersion"), @@ -831,7 +832,7 @@ void WindowsTelemetry::LogModelLoadEnd(uint32_t session_id, const common::Status TraceLoggingBool(status.IsOK(), "isSuccess"), TraceLoggingUInt32(status.Code(), "errorCode"), TraceLoggingUInt32(status.Category(), "errorCategory"), - TraceLoggingString(status.IsOK() ? "" : status.ErrorMessage().c_str(), "errorMessage"), + TraceLoggingString((status.IsOK() ? std::string() : RedactAbsolutePathsForTelemetry(status.ErrorMessage())).c_str(), "errorMessage"), TraceLoggingString(ORT_CALLER_FRAMEWORK, "frameworkName")); } @@ -852,7 +853,7 @@ void WindowsTelemetry::LogSessionCreationEnd(uint32_t session_id, TraceLoggingBool(status.IsOK(), "isSuccess"), TraceLoggingUInt32(status.Code(), "errorCode"), TraceLoggingUInt32(status.Category(), "errorCategory"), - TraceLoggingString(status.IsOK() ? "" : status.ErrorMessage().c_str(), "errorMessage"), + TraceLoggingString((status.IsOK() ? 
std::string() : RedactAbsolutePathsForTelemetry(status.ErrorMessage())).c_str(), "errorMessage"), TraceLoggingString(ORT_CALLER_FRAMEWORK, "frameworkName")); } @@ -908,7 +909,7 @@ void WindowsTelemetry::LogRegisterEpLibraryEnd(const std::string& registration_n TraceLoggingBool(status.IsOK(), "isSuccess"), TraceLoggingUInt32(status.Code(), "errorCode"), TraceLoggingUInt32(status.Category(), "errorCategory"), - TraceLoggingString(status.IsOK() ? "" : status.ErrorMessage().c_str(), "errorMessage"), + TraceLoggingString((status.IsOK() ? std::string() : RedactAbsolutePathsForTelemetry(status.ErrorMessage())).c_str(), "errorMessage"), TraceLoggingString(ORT_CALLER_FRAMEWORK, "frameworkName")); } diff --git a/onnxruntime/test/platform/telemetry_redaction_test.cc b/onnxruntime/test/platform/telemetry_redaction_test.cc new file mode 100644 index 0000000000000..25226c7764d1b --- /dev/null +++ b/onnxruntime/test/platform/telemetry_redaction_test.cc @@ -0,0 +1,76 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +#include "core/platform/telemetry_redaction.h" + +#include + +#include "gtest/gtest.h" + +namespace onnxruntime { +namespace test { + +TEST(TelemetryRedactionTest, EmptyAndNoPath) { + EXPECT_EQ(RedactAbsolutePathsForTelemetry(""), ""); + EXPECT_EQ(RedactAbsolutePathsForTelemetry("no path here"), "no path here"); + EXPECT_EQ(RedactAbsolutePathsForTelemetry("error code 13"), "error code 13"); +} + +TEST(TelemetryRedactionTest, PosixAbsolutePathReducedToBasename) { + EXPECT_EQ(RedactAbsolutePathsForTelemetry("Load model from /home/alice/models/foo.onnx failed"), + "Load model from foo.onnx failed"); + // The username in the directory is dropped. 
+ EXPECT_EQ(RedactAbsolutePathsForTelemetry("/var/lib/onnxruntime/cache/x.bin").find("lib"), + std::string::npos); + EXPECT_EQ(RedactAbsolutePathsForTelemetry("/tmp/onnxruntime_telemetry_cache/db"), "db"); +} + +TEST(TelemetryRedactionTest, WindowsDrivePathReducedToBasename) { + EXPECT_EQ(RedactAbsolutePathsForTelemetry("Load C:\\Users\\bob\\m.onnx failed"), + "Load m.onnx failed"); + EXPECT_EQ(RedactAbsolutePathsForTelemetry("open D:/data/secret/model.onnx"), + "open model.onnx"); + // Username 'bob' must not survive. + EXPECT_EQ(RedactAbsolutePathsForTelemetry("C:\\Users\\bob\\m.onnx").find("bob"), + std::string::npos); +} + +TEST(TelemetryRedactionTest, UncPathReducedToBasename) { + EXPECT_EQ(RedactAbsolutePathsForTelemetry("from \\\\server\\share\\dir\\weights.bin done"), + "from weights.bin done"); +} + +TEST(TelemetryRedactionTest, UrlsArePreserved) { + EXPECT_EQ(RedactAbsolutePathsForTelemetry("see https://example.com/a/b/c for details"), + "see https://example.com/a/b/c for details"); + EXPECT_EQ(RedactAbsolutePathsForTelemetry("ftp://host/path/file"), "ftp://host/path/file"); +} + +TEST(TelemetryRedactionTest, RelativePathsAndSlashesPreserved) { + EXPECT_EQ(RedactAbsolutePathsForTelemetry("models/foo.onnx"), "models/foo.onnx"); + EXPECT_EQ(RedactAbsolutePathsForTelemetry("a/b/c"), "a/b/c"); + EXPECT_EQ(RedactAbsolutePathsForTelemetry("ratio 3/4 and and/or"), "ratio 3/4 and and/or"); +} + +TEST(TelemetryRedactionTest, QuotedPath) { + EXPECT_EQ(RedactAbsolutePathsForTelemetry("file \"/home/alice/x/y.onnx\" missing"), + "file \"y.onnx\" missing"); +} + +TEST(TelemetryRedactionTest, MultiplePaths) { + EXPECT_EQ(RedactAbsolutePathsForTelemetry("copy /home/u/a.onnx to /opt/cache/a.onnx"), + "copy a.onnx to a.onnx"); +} + +TEST(TelemetryRedactionTest, BareRootsArePreserved) { + EXPECT_EQ(RedactAbsolutePathsForTelemetry("at / root"), "at / root"); + EXPECT_EQ(RedactAbsolutePathsForTelemetry("scheme-relative //host/x"), "scheme-relative //host/x"); +} + 
+TEST(TelemetryRedactionTest, TrailingPunctuationAfterPath) { + EXPECT_EQ(RedactAbsolutePathsForTelemetry("missing /home/alice/models/foo.onnx, retry"), + "missing foo.onnx, retry"); +} + +} // namespace test +} // namespace onnxruntime From 6d5f2a796eb2741c7c6f82612fe428b63f6bff8e Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Thu, 25 Jun 2026 19:46:25 -0500 Subject: [PATCH 53/61] Harden telemetry path scrubber per code review Address gaps found while reviewing the errorMessage path scrubber: - A path ending at the user's home dir (/home/, /Users/, C:\\Users\\, /root) now reduces to '~' instead of emitting the bare username as the basename. - Paths containing spaces are kept whole while the path clearly continues, so 'C:\\Users\\First Last\\model.onnx' and 'C:\\Program Files\\...' reduce to the basename instead of leaking the username/layout after the first space. - Absolute paths glued to ':' ',' ';' (e.g. 'failed:/abs/path') are now redacted. - file:// URIs have their embedded local path redacted (http/https/ftp still preserved). Extend the unit tests for these cases. Validated with a standalone run of the header over all cases and an onnxruntime_common compile on Linux (gcc-13). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../core/platform/telemetry_redaction.h | 124 ++++++++++++++++-- .../test/platform/telemetry_redaction_test.cc | 33 +++++ 2 files changed, 144 insertions(+), 13 deletions(-) diff --git a/onnxruntime/core/platform/telemetry_redaction.h b/onnxruntime/core/platform/telemetry_redaction.h index 0c0ee6ad45d4e..655eb5a7f549f 100644 --- a/onnxruntime/core/platform/telemetry_redaction.h +++ b/onnxruntime/core/platform/telemetry_redaction.h @@ -17,10 +17,10 @@ inline bool IsPathSeparator(char c) { return c == '/' || c == '\\'; } -// Characters that cannot appear inside a path token in a free-form message, so they terminate it. 
-inline bool IsPathTokenTerminator(char c) { +// Characters that unconditionally end a path token. A space is handled separately (it only ends the +// token when the path does not clearly continue) so that paths containing spaces are not split. +inline bool IsHardTerminator(char c) { switch (c) { - case ' ': case '\t': case '\n': case '\r': @@ -48,8 +48,9 @@ inline bool IsPathTokenTerminator(char c) { } // The character immediately before an absolute-path token must be one of these for the run to be -// treated as a path start. This prevents rewriting embedded slashes such as "a/b", a URL's "://", -// or a drive-like "X:" in the middle of another token. +// treated as a path start. This prevents rewriting embedded slashes such as "a/b" or a drive-like +// "X:" in the middle of another token, while still catching paths glued to ':' ',' ';' +// (e.g. "failed:/abs/path", "a.bin,/abs/path"). inline bool IsTokenStartDelimiter(char c) { switch (c) { case '\0': @@ -65,19 +66,73 @@ inline bool IsTokenStartDelimiter(char c) { case '[': case '{': case '=': + case ':': + case ',': + case ';': return true; default: return false; } } +inline bool EqualsAsciiCI(std::string_view a, std::string_view b) { + if (a.size() != b.size()) { + return false; + } + for (size_t k = 0; k < a.size(); ++k) { + char ca = a[k]; + char cb = b[k]; + if (ca >= 'A' && ca <= 'Z') { + ca = static_cast<char>(ca - 'A' + 'a'); + } + if (cb >= 'A' && cb <= 'Z') { + cb = static_cast<char>(cb - 'A' + 'a'); + } + if (ca != cb) { + return false; + } + } + return true; +} + +// True when the text already emitted ends with the "file:" URI scheme (delimited on the left). Unlike +// http/https/ftp, a file:// URI embeds a local path, so its path should still be redacted. 
+inline bool EndsWithFileScheme(const std::string& out) { + constexpr std::string_view kFile = "file:"; + if (out.size() < kFile.size()) { + return false; + } + if (!EqualsAsciiCI(std::string_view{out}.substr(out.size() - kFile.size()), kFile)) { + return false; + } + if (out.size() == kFile.size()) { + return true; + } + return IsTokenStartDelimiter(out[out.size() - kFile.size() - 1]); +} + +// Directory names whose immediate child component is a username; used to redact a path that ends at +// the user's home directory to "~" instead of emitting the bare username as the basename. +inline bool IsUserRootComponent(std::string_view comp) { + return EqualsAsciiCI(comp, "home") || EqualsAsciiCI(comp, "users") || + EqualsAsciiCI(comp, "Documents and Settings"); +} + } // namespace telemetry_detail // Reduce absolute filesystem paths embedded in a free-form string to their final component // (basename), so telemetry does not transmit usernames or local directory layout. This mirrors the // basename-only handling that LogRuntimeError already applies to its `file` field. Recognized -// absolute paths: POSIX ("/a/b/c"), Windows drive ("C:\\a\\b" or "C:/a/b") and UNC -// ("\\\\server\\share\\a"). Relative paths, URLs ("scheme://..."), and all other text are preserved. +// absolute paths: POSIX ("/a/b/c"), Windows drive ("C:\\a\\b" or "C:/a/b"), UNC +// ("\\\\server\\share\\a") and file:// URIs. A path that ends at the user's home directory is +// reduced to "~" rather than the bare username. Internal spaces are tolerated while the path clearly +// continues (so "C:\\Users\\First Last\\m.onnx" is fully reduced). Relative paths, http/https/ftp +// URLs, and all other text are preserved. +// +// Known limitation: a username that both contains a space and is the terminal path component with no +// trailing file (e.g. 
"C:\\Users\\First Last" with nothing after it) only has its first word redacted, +// because the end of a space-separated name cannot be told apart from following prose without +// over-consuming real error text. inline std::string RedactAbsolutePathsForTelemetry(std::string_view message) { using namespace telemetry_detail; @@ -92,10 +147,12 @@ inline std::string RedactAbsolutePathsForTelemetry(std::string_view message) { bool is_path_start = false; if (c == '/') { - // POSIX absolute path. Require a delimiter before '/' and exclude a following '/' so that - // scheme-relative ("//host") and "scheme://" URLs are left intact. + // POSIX absolute path. Require a delimiter before '/'. A following '/' marks a "scheme://" or + // scheme-relative URL, which is preserved -- except for file://, whose embedded local path is + // still redacted. const bool next_is_slash = (i + 1 < n) && message[i + 1] == '/'; - is_path_start = IsTokenStartDelimiter(prev) && !next_is_slash; + const bool protected_url = next_is_slash && !EndsWithFileScheme(out); + is_path_start = IsTokenStartDelimiter(prev) && !protected_url; } else if (IsAsciiLetter(c) && i + 2 < n && message[i + 1] == ':' && IsPathSeparator(message[i + 2])) { // Windows drive-letter path "X:\" or "X:/". is_path_start = IsTokenStartDelimiter(prev); @@ -110,18 +167,59 @@ inline std::string RedactAbsolutePathsForTelemetry(std::string_view message) { continue; } - // Consume the whole path token, then keep only its basename. + // Consume the path token. A space ends it only when the run after the space has no path + // separator, so paths with spaces ("C:\Program Files\x", "C:\Users\First Last\m.onnx") stay whole. 
size_t j = i; - while (j < n && !IsPathTokenTerminator(message[j])) { + while (j < n) { + const char cj = message[j]; + if (cj == ' ') { + size_t k = j; + while (k < n && message[k] == ' ') { + ++k; + } + size_t m = k; + bool run_has_separator = false; + while (m < n && message[m] != ' ' && !IsHardTerminator(message[m])) { + if (IsPathSeparator(message[m])) { + run_has_separator = true; + } + ++m; + } + if (k < n && run_has_separator) { + j = m; + continue; + } + break; + } + if (IsHardTerminator(cj)) { + break; + } ++j; } + std::string_view path = message.substr(i, j - i); while (!path.empty() && IsPathSeparator(path.back())) { path.remove_suffix(1); } + const size_t last_sep = path.find_last_of("/\\"); - const std::string_view basename = + std::string_view basename = (last_sep == std::string_view::npos) ? path : path.substr(last_sep + 1); + + // If the path ends at the user's home directory, the basename is the username; emit "~" instead. + if (last_sep != std::string_view::npos) { + const std::string_view parent_dir = path.substr(0, last_sep); + const size_t parent_sep = parent_dir.find_last_of("/\\"); + const std::string_view parent = + (parent_sep == std::string_view::npos) ? parent_dir : parent_dir.substr(parent_sep + 1); + if (IsUserRootComponent(parent)) { + basename = "~"; + } + } + if (EqualsAsciiCI(path, "/root")) { + basename = "~"; + } + if (basename.empty()) { // The token was a bare root ("/", "\\", "C:\\"); keep it verbatim rather than emit nothing. 
out.append(message.data() + i, j - i); diff --git a/onnxruntime/test/platform/telemetry_redaction_test.cc b/onnxruntime/test/platform/telemetry_redaction_test.cc index 25226c7764d1b..76d8ee849e9b5 100644 --- a/onnxruntime/test/platform/telemetry_redaction_test.cc +++ b/onnxruntime/test/platform/telemetry_redaction_test.cc @@ -72,5 +72,38 @@ TEST(TelemetryRedactionTest, TrailingPunctuationAfterPath) { "missing foo.onnx, retry"); } +TEST(TelemetryRedactionTest, HomeDirectoryReducedToTilde) { + // A path that ends at the user's home directory must not emit the bare username. + EXPECT_EQ(RedactAbsolutePathsForTelemetry("/home/alice"), "~"); + EXPECT_EQ(RedactAbsolutePathsForTelemetry("/Users/alice"), "~"); + EXPECT_EQ(RedactAbsolutePathsForTelemetry("C:\\Users\\bob"), "~"); + EXPECT_EQ(RedactAbsolutePathsForTelemetry("could not access /home/alice"), "could not access ~"); + EXPECT_EQ(RedactAbsolutePathsForTelemetry("/root"), "~"); + EXPECT_EQ(RedactAbsolutePathsForTelemetry("C:\\Users\\bob").find("bob"), std::string::npos); +} + +TEST(TelemetryRedactionTest, PathsWithSpacesAreFullyReduced) { + // The username and directory layout are dropped even when the path contains spaces. 
+ EXPECT_EQ(RedactAbsolutePathsForTelemetry("Load C:\\Users\\First Last\\model.onnx failed"), + "Load model.onnx failed"); + EXPECT_EQ(RedactAbsolutePathsForTelemetry("C:\\Program Files\\foo\\bar.dll"), "bar.dll"); + EXPECT_EQ(RedactAbsolutePathsForTelemetry("/Users/bob/Library/Application Support/x/m.onnx"), + "m.onnx"); + EXPECT_EQ(RedactAbsolutePathsForTelemetry("Load C:\\Users\\First Last\\model.onnx failed").find("First"), + std::string::npos); +} + +TEST(TelemetryRedactionTest, PathsGluedToPunctuation) { + EXPECT_EQ(RedactAbsolutePathsForTelemetry("input:/home/alice/secret/m.onnx"), "input:m.onnx"); + EXPECT_EQ(RedactAbsolutePathsForTelemetry("paths /a/b/c.txt,/x/y/z.txt done"), + "paths c.txt,z.txt done"); +} + +TEST(TelemetryRedactionTest, FileUriRedactedButHttpPreserved) { + EXPECT_EQ(RedactAbsolutePathsForTelemetry("file:///home/alice/secret/model.onnx"), "file:model.onnx"); + EXPECT_EQ(RedactAbsolutePathsForTelemetry("see https://example.com/a/b for x"), + "see https://example.com/a/b for x"); +} + } // namespace test } // namespace onnxruntime From 08f62700e697b452082fa0a399b632a338a05443 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Fri, 26 Jun 2026 02:56:08 -0500 Subject: [PATCH 54/61] Fix telemetry scrubber over-redacting files under unrelated home/users dirs Material review found the home-directory redaction fired whenever a path's parent component was literally named 'home'/'users', regardless of position, destroying a real file under an unrelated directory: '/usr/home/config.txt' and '/opt/users/data.bin' both became '~'. Restrict the '~' reduction to paths whose home marker is the first component ('/home/X', '/Users/X', 'X:\\Users\\X'). Also drop the unreachable 'Documents and Settings' marker (the space-tolerant consumer never yields it as a component), which falsely implied legacy-profile coverage. Add tests for the over-redaction guard. 
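Illustration only, not part of the change: a minimal standalone sketch of the position check described above, assuming the same parent/marker logic as the diff below. The names EndsAtHomeDirSketch, IsHomeMarkerSketch and ToLowerAsciiSketch are hypothetical and do not exist in onnxruntime.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: lower-case a single ASCII character.
inline char ToLowerAsciiSketch(char c) {
  return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Hypothetical helper: case-insensitive match against "home" or "users".
inline bool IsHomeMarkerSketch(std::string_view comp) {
  for (std::string_view marker : {std::string_view("home"), std::string_view("users")}) {
    if (comp.size() != marker.size()) continue;
    bool same = true;
    for (size_t k = 0; k < comp.size(); ++k) {
      if (ToLowerAsciiSketch(comp[k]) != marker[k]) same = false;
    }
    if (same) return true;
  }
  return false;
}

// True only for paths shaped like "/home/X", "/Users/X" or "X:\Users\X", i.e. the
// marker is the first path component and the basename would be a username.
inline bool EndsAtHomeDirSketch(std::string_view path) {
  const size_t last_sep = path.find_last_of("/\\");
  if (last_sep == std::string_view::npos) return false;
  const std::string_view parent_dir = path.substr(0, last_sep);
  const size_t parent_sep = parent_dir.find_last_of("/\\");
  if (parent_sep == std::string_view::npos) return false;
  const std::string_view parent = parent_dir.substr(parent_sep + 1);
  const bool marker_at_root = (parent_sep == 0);
  const bool drive_letter = (parent_dir[0] >= 'A' && parent_dir[0] <= 'Z') ||
                            (parent_dir[0] >= 'a' && parent_dir[0] <= 'z');
  const bool marker_after_drive =
      (parent_sep == 2 && drive_letter && parent_dir[1] == ':');
  return (marker_at_root || marker_after_drive) && IsHomeMarkerSketch(parent);
}
```

With this shape, "/home/alice" and "C:\Users\bob" qualify for the '~' reduction, while "/usr/home/config.txt" and "/opt/users/data.bin" keep their basename.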
Validated on Linux (gcc-13): onnxruntime_common compiles and the standalone scrubber passes all cases. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/telemetry_redaction.h | 15 +++++++++++---- .../test/platform/telemetry_redaction_test.cc | 8 ++++++++ 2 files changed, 19 insertions(+), 4 deletions(-) diff --git a/onnxruntime/core/platform/telemetry_redaction.h b/onnxruntime/core/platform/telemetry_redaction.h index 655eb5a7f549f..6b6afff28f805 100644 --- a/onnxruntime/core/platform/telemetry_redaction.h +++ b/onnxruntime/core/platform/telemetry_redaction.h @@ -112,10 +112,11 @@ inline bool EndsWithFileScheme(const std::string& out) { } // Directory names whose immediate child component is a username; used to redact a path that ends at -// the user's home directory to "~" instead of emitting the bare username as the basename. +// the user's home directory to "~" instead of emitting the bare username as the basename. Only +// matched when the marker is the first path component (see RedactAbsolutePathsForTelemetry), so a +// real file under an unrelated directory of the same name is not over-redacted. inline bool IsUserRootComponent(std::string_view comp) { - return EqualsAsciiCI(comp, "home") || EqualsAsciiCI(comp, "users") || - EqualsAsciiCI(comp, "Documents and Settings"); + return EqualsAsciiCI(comp, "home") || EqualsAsciiCI(comp, "users"); } } // namespace telemetry_detail @@ -207,12 +208,18 @@ inline std::string RedactAbsolutePathsForTelemetry(std::string_view message) { (last_sep == std::string_view::npos) ? path : path.substr(last_sep + 1); // If the path ends at the user's home directory, the basename is the username; emit "~" instead. + // Only when the home marker is the first path component ("/home/X", "/Users/X", "X:\Users\X") so + // that a real file under an unrelated directory named home/users (e.g. "/usr/home/config.txt") is + // not over-redacted. 
if (last_sep != std::string_view::npos) { const std::string_view parent_dir = path.substr(0, last_sep); const size_t parent_sep = parent_dir.find_last_of("/\\"); const std::string_view parent = (parent_sep == std::string_view::npos) ? parent_dir : parent_dir.substr(parent_sep + 1); - if (IsUserRootComponent(parent)) { + const bool marker_at_root = (parent_sep == 0); + const bool marker_after_drive = (parent_sep == 2 && parent_dir.size() >= 3 && + IsAsciiLetter(parent_dir[0]) && parent_dir[1] == ':'); + if ((marker_at_root || marker_after_drive) && IsUserRootComponent(parent)) { basename = "~"; } } diff --git a/onnxruntime/test/platform/telemetry_redaction_test.cc b/onnxruntime/test/platform/telemetry_redaction_test.cc index 76d8ee849e9b5..6029bf96881bd 100644 --- a/onnxruntime/test/platform/telemetry_redaction_test.cc +++ b/onnxruntime/test/platform/telemetry_redaction_test.cc @@ -82,6 +82,14 @@ TEST(TelemetryRedactionTest, HomeDirectoryReducedToTilde) { EXPECT_EQ(RedactAbsolutePathsForTelemetry("C:\\Users\\bob").find("bob"), std::string::npos); } +TEST(TelemetryRedactionTest, DoesNotOverRedactUnrelatedHomeUsersDirs) { + // A real file under a directory merely named home/users (not the first path component) is kept. + EXPECT_EQ(RedactAbsolutePathsForTelemetry("/usr/home/config.txt"), "config.txt"); + EXPECT_EQ(RedactAbsolutePathsForTelemetry("/opt/users/data.bin"), "data.bin"); + EXPECT_EQ(RedactAbsolutePathsForTelemetry("/var/lib/users/cache.db"), "cache.db"); + EXPECT_EQ(RedactAbsolutePathsForTelemetry("/home/alice/models/foo.onnx"), "foo.onnx"); +} + TEST(TelemetryRedactionTest, PathsWithSpacesAreFullyReduced) { // The username and directory layout are dropped even when the path contains spaces. 
EXPECT_EQ(RedactAbsolutePathsForTelemetry("Load C:\\Users\\First Last\\model.onnx failed"), From 803b16d763dcff45506f2010a504316924c68d8c Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Fri, 26 Jun 2026 03:30:07 -0500 Subject: [PATCH 55/61] Align POSIX/Windows telemetry error scrubbing with onnxruntime-genai For cross-repo consistency with onnxruntime-genai's telemetry pipeline: - Replace the basename path scrubber with genai's LooksLikePath/ScrubErrorMessage, which replaces each path-like whitespace token (POSIX multi-segment, Windows drive/UNC, '~/', and URLs) with a '[path]' placeholder. - Cap the scrubbed errorMessage at 256 bytes (kMaxTelemetryErrorMessageLength), matching genai's per-call-site length guard. - POSIX 1DS Initialize: set config['enableIpScrubbing']=true for collector-side client-IP obfuscation, and honor an ORT_TELEMETRY_ENABLED=0/false environment opt-out (skips creating the uploader), mirroring genai's enableIpScrubbing and ORTGENAI_TELEMETRY_ENABLED. Rewrite the unit tests for the [path] behaviour and length cap. Validated on Linux (gcc-13): onnxruntime_common compiles and the standalone scrubber passes. 
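Illustration only, not part of the change: a rough sketch of the token-wise scrubbing described above, assuming the message is split on whitespace, each path-like token becomes '[path]', and the result is truncated to 256 bytes. LooksLikePathSketch follows the LooksLikePath helper in the diff below; ScrubErrorMessageSketch and kSketchMaxLength are hypothetical names, and the real ScrubErrorMessage may differ in detail.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Assumed length cap, taken from the 256-byte figure in the commit message.
constexpr size_t kSketchMaxLength = 256;

// Modeled on the LooksLikePath helper added by this patch.
inline bool LooksLikePathSketch(std::string_view token) {
  if (token.find('\\') != std::string_view::npos) {
    return true;  // any backslash: Windows path / UNC
  }
  if (token.size() >= 3 && std::isalpha(static_cast<unsigned char>(token[0])) &&
      token[1] == ':' && token[2] == '/') {
    return true;  // drive-letter prefix with forward slash: C:/
  }
  if (token.size() >= 2 && token[0] == '~' && token[1] == '/') {
    return true;  // home-relative: ~/
  }
  int segments = 0;  // count "/x" runs; 2+ indicates a multi-segment path
  for (size_t k = 0; k + 1 < token.size(); ++k) {
    if (token[k] == '/' && token[k + 1] != '/') {
      ++segments;
    }
  }
  return segments >= 2;
}

// Hypothetical scrubber: whitespace is copied verbatim, path-like tokens are
// replaced by "[path]", and the result is capped at kSketchMaxLength bytes.
inline std::string ScrubErrorMessageSketch(std::string_view message) {
  std::string out;
  out.reserve(message.size());
  size_t i = 0;
  while (i < message.size()) {
    if (std::isspace(static_cast<unsigned char>(message[i]))) {
      out.push_back(message[i]);
      ++i;
      continue;
    }
    size_t j = i;
    while (j < message.size() && !std::isspace(static_cast<unsigned char>(message[j]))) {
      ++j;
    }
    const std::string_view token = message.substr(i, j - i);
    if (LooksLikePathSketch(token)) {
      out += "[path]";
    } else {
      out.append(token.data(), token.size());
    }
    i = j;
  }
  if (out.size() > kSketchMaxLength) {
    out.resize(kSketchMaxLength);
  }
  return out;
}
```

Under these assumptions "Load model from /home/alice/models/foo.onnx failed" becomes "Load model from [path] failed", while a single-slash token such as "3/4" is left alone.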
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/telemetry.cc | 24 +- .../core/platform/telemetry_redaction.h | 248 +++--------------- .../core/platform/windows/telemetry.cc | 14 +- .../test/platform/telemetry_redaction_test.cc | 121 +++------ 4 files changed, 100 insertions(+), 307 deletions(-) diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index 4c015790c75b5..05797313ff69e 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -291,6 +291,17 @@ void PosixTelemetry::LogEventAsync(Microsoft::Applications::Events::EventPropert void PosixTelemetry::Initialize() { std::unique_lock lock(mutex_); + // Environment opt-out: ORT_TELEMETRY_ENABLED=0/false disables telemetry at runtime without + // recompiling and skips creating the 1DS uploader entirely. Mirrors onnxruntime-genai's + // ORTGENAI_TELEMETRY_ENABLED opt-out. + if (const char* env = std::getenv("ORT_TELEMETRY_ENABLED"); env != nullptr) { + const std::string value(env); + if (value == "0" || value == "false" || value == "FALSE") { + enabled_.store(false, std::memory_order_release); + return; + } + } + // NOTE: On Android, the Java layer must be initialized before calling this: // System.loadLibrary("maesdk"); // new HttpClient(getApplicationContext()); @@ -303,6 +314,7 @@ void PosixTelemetry::Initialize() { auto& config = *config_; config[CFG_STR_COLLECTOR_URL] = "https://mobile.events.data.microsoft.com/OneCollector/1.0"; + config["enableIpScrubbing"] = true; // collector-side client-IP obfuscation config[CFG_INT_TRACE_LEVEL_MASK] = 0; // Disable SDK internal logging config[CFG_INT_SDK_MODE] = SdkModeTypes::SdkModeTypes_CS; // Common Schema 4.0 mode config[CFG_INT_MAX_TEARDOWN_TIME] = 10; // 10 seconds max for shutdown @@ -764,7 +776,7 @@ void PosixTelemetry::LogCompileModelComplete( .AddBool("success", success) .AddUInt32("errorCode", 
error_code) .AddUInt32("errorCategory", error_category) - .AddString("errorMessage", RedactAbsolutePathsForTelemetry(error_message)) + .AddString("errorMessage", ScrubErrorMessage(error_message)) .Build(); LogEventAsync(std::move(event)); @@ -791,7 +803,7 @@ void PosixTelemetry::LogRuntimeError( .AddUInt32("sessionId", session_id) .AddInt32("errorCode", static_cast(status.Code())) .AddInt32("errorCategory", static_cast(status.Category())) - .AddString("errorMessage", RedactAbsolutePathsForTelemetry(status.ErrorMessage())) + .AddString("errorMessage", ScrubErrorMessage(status.ErrorMessage())) .AddString("file", std::string(file_view)) .AddString("function", function ? function : "") .AddUInt32("line", line) @@ -813,7 +825,7 @@ void PosixTelemetry::LogRuntimeInferenceError(uint32_t session_id, const common: .AddUInt32("sessionId", session_id) .AddInt32("errorCode", static_cast(status.Code())) .AddInt32("errorCategory", static_cast(status.Category())) - .AddString("errorMessage", RedactAbsolutePathsForTelemetry(status.ErrorMessage())) + .AddString("errorMessage", ScrubErrorMessage(status.ErrorMessage())) .AddString("executionProviderVersions", ep_versions) .AddString("executionProviderDeviceTypes", ep_device_types) .AddString("runtimeVersion", ORT_VERSION) @@ -925,7 +937,7 @@ void PosixTelemetry::LogModelLoadEnd(uint32_t session_id, const common::Status& .AddBool("isSuccess", status.IsOK()) .AddInt32("errorCode", static_cast(status.Code())) .AddInt32("errorCategory", static_cast(status.Category())) - .AddString("errorMessage", RedactAbsolutePathsForTelemetry(status.ErrorMessage())) + .AddString("errorMessage", ScrubErrorMessage(status.ErrorMessage())) .Build(); LogEventAsync(std::move(event)); @@ -943,7 +955,7 @@ void PosixTelemetry::LogSessionCreationEnd(uint32_t session_id, const common::St .AddBool("isSuccess", status.IsOK()) .AddInt32("errorCode", static_cast(status.Code())) .AddInt32("errorCategory", static_cast(status.Category())) - .AddString("errorMessage", 
RedactAbsolutePathsForTelemetry(status.ErrorMessage())) + .AddString("errorMessage", ScrubErrorMessage(status.ErrorMessage())) .Build(); LogEventAsync(std::move(event)); @@ -1013,7 +1025,7 @@ void PosixTelemetry::LogRegisterEpLibraryEnd(const std::string& registration_nam .AddBool("isSuccess", status.IsOK()) .AddInt32("errorCode", static_cast(status.Code())) .AddInt32("errorCategory", static_cast(status.Category())) - .AddString("errorMessage", RedactAbsolutePathsForTelemetry(status.ErrorMessage())) + .AddString("errorMessage", ScrubErrorMessage(status.ErrorMessage())) .Build(); LogEventAsync(std::move(event)); diff --git a/onnxruntime/core/platform/telemetry_redaction.h b/onnxruntime/core/platform/telemetry_redaction.h index 6b6afff28f805..a46c0e362d083 100644 --- a/onnxruntime/core/platform/telemetry_redaction.h +++ b/onnxruntime/core/platform/telemetry_redaction.h @@ -3,239 +3,75 @@ #pragma once +#include #include #include namespace onnxruntime { namespace telemetry_detail { -inline bool IsAsciiLetter(char c) { - return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); -} - -inline bool IsPathSeparator(char c) { - return c == '/' || c == '\\'; -} - -// Characters that unconditionally end a path token. A space is handled separately (it only ends the -// token when the path does not clearly continue) so that paths containing spaces are not split. -inline bool IsHardTerminator(char c) { - switch (c) { - case '\t': - case '\n': - case '\r': - case '\f': - case '\v': - case '"': - case '\'': - case '<': - case '>': - case '|': - case '*': - case '?': - case '(': - case ')': - case '[': - case ']': - case '{': - case '}': - case ',': - case ';': - return true; - default: - return false; +// Returns true if a whitespace-delimited token looks like a filesystem path. Used to redact paths +// (which embed usernames / directory layout) from error text. Mirrors onnxruntime-genai's +// Generators::LooksLikePath so both telemetry pipelines scrub error messages identically. 
+inline bool LooksLikePath(std::string_view token) { + if (token.find('\\') != std::string_view::npos) { + return true; // any backslash: Windows path / UNC } -} - -// The character immediately before an absolute-path token must be one of these for the run to be -// treated as a path start. This prevents rewriting embedded slashes such as "a/b" or a drive-like -// "X:" in the middle of another token, while still catching paths glued to ':' ',' ';' -// (e.g. "failed:/abs/path", "a.bin,/abs/path"). -inline bool IsTokenStartDelimiter(char c) { - switch (c) { - case '\0': - case ' ': - case '\t': - case '\n': - case '\r': - case '\f': - case '\v': - case '"': - case '\'': - case '(': - case '[': - case '{': - case '=': - case ':': - case ',': - case ';': - return true; - default: - return false; + if (token.size() >= 3 && std::isalpha(static_cast(token[0])) && token[1] == ':' && + (token[2] == '\\' || token[2] == '/')) { + return true; // drive-letter prefix: C:\ or C:/ } -} - -inline bool EqualsAsciiCI(std::string_view a, std::string_view b) { - if (a.size() != b.size()) { - return false; + if (token.size() >= 2 && token[0] == '~' && (token[1] == '/' || token[1] == '\\')) { + return true; // home-relative: ~/ or ~\. } - for (size_t k = 0; k < a.size(); ++k) { - char ca = a[k]; - char cb = b[k]; - if (ca >= 'A' && ca <= 'Z') { - ca = static_cast(ca - 'A' + 'a'); + int segments = 0; // count "/x" runs; 2+ indicates a multi-segment POSIX path + for (size_t k = 0; k + 1 < token.size(); ++k) { + if (token[k] == '/' && token[k + 1] != '/') { + ++segments; } - if (cb >= 'A' && cb <= 'Z') { - cb = static_cast(cb - 'A' + 'a'); - } - if (ca != cb) { - return false; - } - } - return true; -} - -// True when the text already emitted ends with the "file:" URI scheme (delimited on the left). Unlike -// http/https/ftp, a file:// URI embeds a local path, so its path should still be redacted. 
-inline bool EndsWithFileScheme(const std::string& out) { - constexpr std::string_view kFile = "file:"; - if (out.size() < kFile.size()) { - return false; - } - if (!EqualsAsciiCI(std::string_view{out}.substr(out.size() - kFile.size()), kFile)) { - return false; } - if (out.size() == kFile.size()) { - return true; - } - return IsTokenStartDelimiter(out[out.size() - kFile.size() - 1]); -} - -// Directory names whose immediate child component is a username; used to redact a path that ends at -// the user's home directory to "~" instead of emitting the bare username as the basename. Only -// matched when the marker is the first path component (see RedactAbsolutePathsForTelemetry), so a -// real file under an unrelated directory of the same name is not over-redacted. -inline bool IsUserRootComponent(std::string_view comp) { - return EqualsAsciiCI(comp, "home") || EqualsAsciiCI(comp, "users"); + return segments >= 2; } } // namespace telemetry_detail -// Reduce absolute filesystem paths embedded in a free-form string to their final component -// (basename), so telemetry does not transmit usernames or local directory layout. This mirrors the -// basename-only handling that LogRuntimeError already applies to its `file` field. Recognized -// absolute paths: POSIX ("/a/b/c"), Windows drive ("C:\\a\\b" or "C:/a/b"), UNC -// ("\\\\server\\share\\a") and file:// URIs. A path that ends at the user's home directory is -// reduced to "~" rather than the bare username. Internal spaces are tolerated while the path clearly -// continues (so "C:\\Users\\First Last\\m.onnx" is fully reduced). Relative paths, http/https/ftp -// URLs, and all other text are preserved. -// -// Known limitation: a username that both contains a space and is the terminal path component with no -// trailing file (e.g. 
"C:\\Users\\First Last" with nothing after it) only has its first word redacted, -// because the end of a space-separated name cannot be told apart from following prose without -// over-consuming real error text. -inline std::string RedactAbsolutePathsForTelemetry(std::string_view message) { - using namespace telemetry_detail; +// Maximum transmitted error-message length, applied after scrubbing to bound telemetry payload size. +inline constexpr size_t kMaxTelemetryErrorMessageLength = 256; + +// Scrub filesystem paths out of a free-text error string before transmission and cap its length. +// Each whitespace-delimited token that looks like a path is replaced with a "[path]" placeholder, so +// load/runtime exceptions don't ship the user's config/model path (e.g. C:\Users\\... or +// /home//...) and thereby the username and directory layout. Mirrors onnxruntime-genai's +// Generators::ScrubErrorMessage so both pipelines redact identically; the trailing length cap matches +// the 256-byte guard genai applies at its call sites. +inline std::string ScrubErrorMessage(std::string_view msg) { + using telemetry_detail::LooksLikePath; std::string out; - out.reserve(message.size()); + out.reserve(msg.size()); - const size_t n = message.size(); size_t i = 0; - while (i < n) { - const char c = message[i]; - const char prev = out.empty() ? '\0' : out.back(); - bool is_path_start = false; - - if (c == '/') { - // POSIX absolute path. Require a delimiter before '/'. A following '/' marks a "scheme://" or - // scheme-relative URL, which is preserved -- except for file://, whose embedded local path is - // still redacted. - const bool next_is_slash = (i + 1 < n) && message[i + 1] == '/'; - const bool protected_url = next_is_slash && !EndsWithFileScheme(out); - is_path_start = IsTokenStartDelimiter(prev) && !protected_url; - } else if (IsAsciiLetter(c) && i + 2 < n && message[i + 1] == ':' && IsPathSeparator(message[i + 2])) { - // Windows drive-letter path "X:\" or "X:/". 
- is_path_start = IsTokenStartDelimiter(prev); - } else if (c == '\\' && i + 1 < n && message[i + 1] == '\\') { - // Windows UNC path "\\server\share\...". - is_path_start = IsTokenStartDelimiter(prev); - } - - if (!is_path_start) { - out.push_back(c); + while (i < msg.size()) { + if (std::isspace(static_cast(msg[i]))) { + out.push_back(msg[i]); ++i; continue; } - - // Consume the path token. A space ends it only when the run after the space has no path - // separator, so paths with spaces ("C:\Program Files\x", "C:\Users\First Last\m.onnx") stay whole. - size_t j = i; - while (j < n) { - const char cj = message[j]; - if (cj == ' ') { - size_t k = j; - while (k < n && message[k] == ' ') { - ++k; - } - size_t m = k; - bool run_has_separator = false; - while (m < n && message[m] != ' ' && !IsHardTerminator(message[m])) { - if (IsPathSeparator(message[m])) { - run_has_separator = true; - } - ++m; - } - if (k < n && run_has_separator) { - j = m; - continue; - } - break; - } - if (IsHardTerminator(cj)) { - break; - } - ++j; - } - - std::string_view path = message.substr(i, j - i); - while (!path.empty() && IsPathSeparator(path.back())) { - path.remove_suffix(1); - } - - const size_t last_sep = path.find_last_of("/\\"); - std::string_view basename = - (last_sep == std::string_view::npos) ? path : path.substr(last_sep + 1); - - // If the path ends at the user's home directory, the basename is the username; emit "~" instead. - // Only when the home marker is the first path component ("/home/X", "/Users/X", "X:\Users\X") so - // that a real file under an unrelated directory named home/users (e.g. "/usr/home/config.txt") is - // not over-redacted. - if (last_sep != std::string_view::npos) { - const std::string_view parent_dir = path.substr(0, last_sep); - const size_t parent_sep = parent_dir.find_last_of("/\\"); - const std::string_view parent = - (parent_sep == std::string_view::npos) ? 
parent_dir : parent_dir.substr(parent_sep + 1); - const bool marker_at_root = (parent_sep == 0); - const bool marker_after_drive = (parent_sep == 2 && parent_dir.size() >= 3 && - IsAsciiLetter(parent_dir[0]) && parent_dir[1] == ':'); - if ((marker_at_root || marker_after_drive) && IsUserRootComponent(parent)) { - basename = "~"; - } - } - if (EqualsAsciiCI(path, "/root")) { - basename = "~"; + const size_t start = i; + while (i < msg.size() && !std::isspace(static_cast(msg[i]))) { + ++i; } - - if (basename.empty()) { - // The token was a bare root ("/", "\\", "C:\\"); keep it verbatim rather than emit nothing. - out.append(message.data() + i, j - i); + const std::string_view token = msg.substr(start, i - start); + if (LooksLikePath(token)) { + out += "[path]"; } else { - out.append(basename.data(), basename.size()); + out.append(token.data(), token.size()); } - i = j; } + if (out.size() > kMaxTelemetryErrorMessageLength) { + out.resize(kMaxTelemetryErrorMessageLength); + } return out; } diff --git a/onnxruntime/core/platform/windows/telemetry.cc b/onnxruntime/core/platform/windows/telemetry.cc index e4debaa0c39a8..2080ee0a56b82 100644 --- a/onnxruntime/core/platform/windows/telemetry.cc +++ b/onnxruntime/core/platform/windows/telemetry.cc @@ -544,7 +544,7 @@ void WindowsTelemetry::LogCompileModelComplete(uint32_t session_id, TraceLoggingBool(success, "success"), TraceLoggingUInt32(error_code, "errorCode"), TraceLoggingUInt32(error_category, "errorCategory"), - TraceLoggingString(RedactAbsolutePathsForTelemetry(error_message).c_str(), "errorMessage"), + TraceLoggingString(ScrubErrorMessage(error_message).c_str(), "errorMessage"), TraceLoggingString(ORT_CALLER_FRAMEWORK, "frameworkName")); } @@ -567,7 +567,7 @@ void WindowsTelemetry::LogRuntimeError(uint32_t session_id, const common::Status TraceLoggingUInt32(session_id, "sessionId"), TraceLoggingUInt32(status.Code(), "errorCode"), TraceLoggingUInt32(status.Category(), "errorCategory"), - 
TraceLoggingString(RedactAbsolutePathsForTelemetry(status.ErrorMessage()).c_str(), "errorMessage"), + TraceLoggingString(ScrubErrorMessage(status.ErrorMessage()).c_str(), "errorMessage"), TraceLoggingString(file, "file"), TraceLoggingString(function, "function"), TraceLoggingInt32(line, "line"), @@ -585,7 +585,7 @@ void WindowsTelemetry::LogRuntimeError(uint32_t session_id, const common::Status TraceLoggingUInt32(session_id, "sessionId"), TraceLoggingUInt32(status.Code(), "errorCode"), TraceLoggingUInt32(status.Category(), "errorCategory"), - TraceLoggingString(RedactAbsolutePathsForTelemetry(status.ErrorMessage()).c_str(), "errorMessage"), + TraceLoggingString(ScrubErrorMessage(status.ErrorMessage()).c_str(), "errorMessage"), TraceLoggingString(file, "file"), TraceLoggingString(function, "function"), TraceLoggingInt32(line, "line"), @@ -611,7 +611,7 @@ void WindowsTelemetry::LogRuntimeInferenceError(uint32_t session_id, const commo TraceLoggingUInt32(session_id, "sessionId"), TraceLoggingUInt32(status.Code(), "errorCode"), TraceLoggingUInt32(status.Category(), "errorCategory"), - TraceLoggingString(RedactAbsolutePathsForTelemetry(status.ErrorMessage()).c_str(), "errorMessage"), + TraceLoggingString(ScrubErrorMessage(status.ErrorMessage()).c_str(), "errorMessage"), TraceLoggingString(ep_versions.c_str(), "executionProviderVersions"), TraceLoggingString(ep_device_types.c_str(), "executionProviderDeviceTypes"), TraceLoggingString(ORT_VERSION, "runtimeVersion"), @@ -832,7 +832,7 @@ void WindowsTelemetry::LogModelLoadEnd(uint32_t session_id, const common::Status TraceLoggingBool(status.IsOK(), "isSuccess"), TraceLoggingUInt32(status.Code(), "errorCode"), TraceLoggingUInt32(status.Category(), "errorCategory"), - TraceLoggingString((status.IsOK() ? std::string() : RedactAbsolutePathsForTelemetry(status.ErrorMessage())).c_str(), "errorMessage"), + TraceLoggingString((status.IsOK() ? 
std::string() : ScrubErrorMessage(status.ErrorMessage())).c_str(), "errorMessage"), TraceLoggingString(ORT_CALLER_FRAMEWORK, "frameworkName")); } @@ -853,7 +853,7 @@ void WindowsTelemetry::LogSessionCreationEnd(uint32_t session_id, TraceLoggingBool(status.IsOK(), "isSuccess"), TraceLoggingUInt32(status.Code(), "errorCode"), TraceLoggingUInt32(status.Category(), "errorCategory"), - TraceLoggingString((status.IsOK() ? std::string() : RedactAbsolutePathsForTelemetry(status.ErrorMessage())).c_str(), "errorMessage"), + TraceLoggingString((status.IsOK() ? std::string() : ScrubErrorMessage(status.ErrorMessage())).c_str(), "errorMessage"), TraceLoggingString(ORT_CALLER_FRAMEWORK, "frameworkName")); } @@ -909,7 +909,7 @@ void WindowsTelemetry::LogRegisterEpLibraryEnd(const std::string& registration_n TraceLoggingBool(status.IsOK(), "isSuccess"), TraceLoggingUInt32(status.Code(), "errorCode"), TraceLoggingUInt32(status.Category(), "errorCategory"), - TraceLoggingString((status.IsOK() ? std::string() : RedactAbsolutePathsForTelemetry(status.ErrorMessage())).c_str(), "errorMessage"), + TraceLoggingString((status.IsOK() ? 
std::string() : ScrubErrorMessage(status.ErrorMessage())).c_str(), "errorMessage"), TraceLoggingString(ORT_CALLER_FRAMEWORK, "frameworkName")); } diff --git a/onnxruntime/test/platform/telemetry_redaction_test.cc b/onnxruntime/test/platform/telemetry_redaction_test.cc index 6029bf96881bd..54ff94a369824 100644 --- a/onnxruntime/test/platform/telemetry_redaction_test.cc +++ b/onnxruntime/test/platform/telemetry_redaction_test.cc @@ -11,106 +11,51 @@ namespace onnxruntime { namespace test { TEST(TelemetryRedactionTest, EmptyAndNoPath) { - EXPECT_EQ(RedactAbsolutePathsForTelemetry(""), ""); - EXPECT_EQ(RedactAbsolutePathsForTelemetry("no path here"), "no path here"); - EXPECT_EQ(RedactAbsolutePathsForTelemetry("error code 13"), "error code 13"); + EXPECT_EQ(ScrubErrorMessage(""), ""); + EXPECT_EQ(ScrubErrorMessage("no path here"), "no path here"); + EXPECT_EQ(ScrubErrorMessage("error code 13"), "error code 13"); } -TEST(TelemetryRedactionTest, PosixAbsolutePathReducedToBasename) { - EXPECT_EQ(RedactAbsolutePathsForTelemetry("Load model from /home/alice/models/foo.onnx failed"), - "Load model from foo.onnx failed"); - // The username in the directory is dropped. - EXPECT_EQ(RedactAbsolutePathsForTelemetry("/var/lib/onnxruntime/cache/x.bin").find("lib"), - std::string::npos); - EXPECT_EQ(RedactAbsolutePathsForTelemetry("/tmp/onnxruntime_telemetry_cache/db"), "db"); -} - -TEST(TelemetryRedactionTest, WindowsDrivePathReducedToBasename) { - EXPECT_EQ(RedactAbsolutePathsForTelemetry("Load C:\\Users\\bob\\m.onnx failed"), - "Load m.onnx failed"); - EXPECT_EQ(RedactAbsolutePathsForTelemetry("open D:/data/secret/model.onnx"), - "open model.onnx"); - // Username 'bob' must not survive. 
- EXPECT_EQ(RedactAbsolutePathsForTelemetry("C:\\Users\\bob\\m.onnx").find("bob"), - std::string::npos); -} - -TEST(TelemetryRedactionTest, UncPathReducedToBasename) { - EXPECT_EQ(RedactAbsolutePathsForTelemetry("from \\\\server\\share\\dir\\weights.bin done"), - "from weights.bin done"); -} - -TEST(TelemetryRedactionTest, UrlsArePreserved) { - EXPECT_EQ(RedactAbsolutePathsForTelemetry("see https://example.com/a/b/c for details"), - "see https://example.com/a/b/c for details"); - EXPECT_EQ(RedactAbsolutePathsForTelemetry("ftp://host/path/file"), "ftp://host/path/file"); -} - -TEST(TelemetryRedactionTest, RelativePathsAndSlashesPreserved) { - EXPECT_EQ(RedactAbsolutePathsForTelemetry("models/foo.onnx"), "models/foo.onnx"); - EXPECT_EQ(RedactAbsolutePathsForTelemetry("a/b/c"), "a/b/c"); - EXPECT_EQ(RedactAbsolutePathsForTelemetry("ratio 3/4 and and/or"), "ratio 3/4 and and/or"); +TEST(TelemetryRedactionTest, PosixPathReplacedWithPlaceholder) { + EXPECT_EQ(ScrubErrorMessage("Load model from /home/alice/models/foo.onnx failed"), + "Load model from [path] failed"); + // The username must not survive. 
+ EXPECT_EQ(ScrubErrorMessage("/home/alice/models/foo.onnx").find("alice"), std::string::npos); } -TEST(TelemetryRedactionTest, QuotedPath) { - EXPECT_EQ(RedactAbsolutePathsForTelemetry("file \"/home/alice/x/y.onnx\" missing"), - "file \"y.onnx\" missing"); +TEST(TelemetryRedactionTest, WindowsDriveAndUncReplaced) { + EXPECT_EQ(ScrubErrorMessage("Load C:\\Users\\bob\\m.onnx failed"), "Load [path] failed"); + EXPECT_EQ(ScrubErrorMessage("open D:/data/secret/model.onnx"), "open [path]"); + EXPECT_EQ(ScrubErrorMessage("from \\\\server\\share\\dir\\weights.bin done"), "from [path] done"); + EXPECT_EQ(ScrubErrorMessage("Load C:\\Users\\bob\\m.onnx failed").find("bob"), std::string::npos); } -TEST(TelemetryRedactionTest, MultiplePaths) { - EXPECT_EQ(RedactAbsolutePathsForTelemetry("copy /home/u/a.onnx to /opt/cache/a.onnx"), - "copy a.onnx to a.onnx"); -} - -TEST(TelemetryRedactionTest, BareRootsArePreserved) { - EXPECT_EQ(RedactAbsolutePathsForTelemetry("at / root"), "at / root"); - EXPECT_EQ(RedactAbsolutePathsForTelemetry("scheme-relative //host/x"), "scheme-relative //host/x"); -} - -TEST(TelemetryRedactionTest, TrailingPunctuationAfterPath) { - EXPECT_EQ(RedactAbsolutePathsForTelemetry("missing /home/alice/models/foo.onnx, retry"), - "missing foo.onnx, retry"); -} - -TEST(TelemetryRedactionTest, HomeDirectoryReducedToTilde) { - // A path that ends at the user's home directory must not emit the bare username. 
- EXPECT_EQ(RedactAbsolutePathsForTelemetry("/home/alice"), "~"); - EXPECT_EQ(RedactAbsolutePathsForTelemetry("/Users/alice"), "~"); - EXPECT_EQ(RedactAbsolutePathsForTelemetry("C:\\Users\\bob"), "~"); - EXPECT_EQ(RedactAbsolutePathsForTelemetry("could not access /home/alice"), "could not access ~"); - EXPECT_EQ(RedactAbsolutePathsForTelemetry("/root"), "~"); - EXPECT_EQ(RedactAbsolutePathsForTelemetry("C:\\Users\\bob").find("bob"), std::string::npos); -} - -TEST(TelemetryRedactionTest, DoesNotOverRedactUnrelatedHomeUsersDirs) { - // A real file under a directory merely named home/users (not the first path component) is kept. - EXPECT_EQ(RedactAbsolutePathsForTelemetry("/usr/home/config.txt"), "config.txt"); - EXPECT_EQ(RedactAbsolutePathsForTelemetry("/opt/users/data.bin"), "data.bin"); - EXPECT_EQ(RedactAbsolutePathsForTelemetry("/var/lib/users/cache.db"), "cache.db"); - EXPECT_EQ(RedactAbsolutePathsForTelemetry("/home/alice/models/foo.onnx"), "foo.onnx"); +TEST(TelemetryRedactionTest, PathsWithSpacesDoNotLeakUsername) { + // Both halves of a spaced path contain a backslash, so each is replaced; no username leaks. + EXPECT_EQ(ScrubErrorMessage("Load C:\\Users\\First Last\\model.onnx failed"), + "Load [path] [path] failed"); + EXPECT_EQ(ScrubErrorMessage("Load C:\\Users\\First Last\\model.onnx failed").find("First"), + std::string::npos); } -TEST(TelemetryRedactionTest, PathsWithSpacesAreFullyReduced) { - // The username and directory layout are dropped even when the path contains spaces. 
- EXPECT_EQ(RedactAbsolutePathsForTelemetry("Load C:\\Users\\First Last\\model.onnx failed"), - "Load model.onnx failed"); - EXPECT_EQ(RedactAbsolutePathsForTelemetry("C:\\Program Files\\foo\\bar.dll"), "bar.dll"); - EXPECT_EQ(RedactAbsolutePathsForTelemetry("/Users/bob/Library/Application Support/x/m.onnx"), - "m.onnx"); - EXPECT_EQ(RedactAbsolutePathsForTelemetry("Load C:\\Users\\First Last\\model.onnx failed").find("First"), - std::string::npos); +TEST(TelemetryRedactionTest, MultiSegmentRelativeAndUrlReplaced) { + // Matches onnxruntime-genai: a token with 2+ "/x" segments (incl. URLs) is treated as a path. + EXPECT_EQ(ScrubErrorMessage("a/b/c"), "[path]"); + EXPECT_EQ(ScrubErrorMessage("see https://example.com/a/b/c for details"), "see [path] for details"); + EXPECT_EQ(ScrubErrorMessage("input:/home/alice/secret/m.onnx"), "[path]"); + EXPECT_EQ(ScrubErrorMessage("file:///home/alice/secret/model.onnx"), "[path]"); + EXPECT_EQ(ScrubErrorMessage("~/.config/app/x"), "[path]"); } -TEST(TelemetryRedactionTest, PathsGluedToPunctuation) { - EXPECT_EQ(RedactAbsolutePathsForTelemetry("input:/home/alice/secret/m.onnx"), "input:m.onnx"); - EXPECT_EQ(RedactAbsolutePathsForTelemetry("paths /a/b/c.txt,/x/y/z.txt done"), - "paths c.txt,z.txt done"); +TEST(TelemetryRedactionTest, SingleSegmentAndNonPathSlashesKept) { + EXPECT_EQ(ScrubErrorMessage("models/foo.onnx"), "models/foo.onnx"); + EXPECT_EQ(ScrubErrorMessage("ratio 3/4 and and/or"), "ratio 3/4 and and/or"); } -TEST(TelemetryRedactionTest, FileUriRedactedButHttpPreserved) { - EXPECT_EQ(RedactAbsolutePathsForTelemetry("file:///home/alice/secret/model.onnx"), "file:model.onnx"); - EXPECT_EQ(RedactAbsolutePathsForTelemetry("see https://example.com/a/b for x"), - "see https://example.com/a/b for x"); +TEST(TelemetryRedactionTest, LengthIsCappedAfterScrub) { + const std::string long_msg(300, 'x'); + EXPECT_EQ(ScrubErrorMessage(long_msg).size(), kMaxTelemetryErrorMessageLength); + EXPECT_LE(ScrubErrorMessage("short").size(), 
kMaxTelemetryErrorMessageLength); } } // namespace test From 32400570e186fd406f6b1138fc5191b4a4330b26 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Fri, 26 Jun 2026 03:53:28 -0500 Subject: [PATCH 56/61] Address multi-agent review: robust telemetry env opt-out + honest IP-scrub note - ORT_TELEMETRY_ENABLED opt-out now lowercases the value and accepts the same set as onnxruntime-genai (0/false/off/no/disabled/n), so ORT_TELEMETRY_ENABLED=False (Python str(False)), =off, =no no longer silently leave telemetry on. As a privacy control an unrecognized value previously failed in the unsafe direction. Adds <cctype> for std::tolower. - enableIpScrubbing is set only for parity with onnxruntime-genai; the bundled cpp_client_telemetry SDK does not consume this key (client-IP obfuscation is a OneCollector tenant-side setting), so the comment no longer claims an effect it does not deliver. Validated on Linux (gcc-13): onnxruntime_common compiles. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/telemetry.cc | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index 05797313ff69e..e2e44f0c37e40 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -29,6 +29,7 @@ #include #endif +#include <cctype> #include #include #include @@ -291,12 +292,16 @@ void PosixTelemetry::LogEventAsync(Microsoft::Applications::Events::EventPropert void PosixTelemetry::Initialize() { std::unique_lock lock(mutex_); - // Environment opt-out: ORT_TELEMETRY_ENABLED=0/false disables telemetry at runtime without - // recompiling and skips creating the 1DS uploader entirely. Mirrors onnxruntime-genai's - // ORTGENAI_TELEMETRY_ENABLED opt-out.
+ // Environment opt-out: ORT_TELEMETRY_ENABLED set to a disabled value (0/false/off/no/disabled/n, + // case-insensitive) disables telemetry at runtime without recompiling and skips creating the 1DS + // uploader entirely. Accepts the same value set as onnxruntime-genai's ORTGENAI_TELEMETRY_ENABLED. if (const char* env = std::getenv("ORT_TELEMETRY_ENABLED"); env != nullptr) { - const std::string value(env); - if (value == "0" || value == "false" || value == "FALSE") { + std::string value(env); + for (char& ch : value) { + ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch))); + } + if (value == "0" || value == "false" || value == "off" || value == "no" || + value == "disabled" || value == "n") { enabled_.store(false, std::memory_order_release); return; } @@ -314,7 +319,10 @@ void PosixTelemetry::Initialize() { auto& config = *config_; config[CFG_STR_COLLECTOR_URL] = "https://mobile.events.data.microsoft.com/OneCollector/1.0"; - config["enableIpScrubbing"] = true; // collector-side client-IP obfuscation + // Set for parity with onnxruntime-genai. Client-IP obfuscation is enforced as a OneCollector + // tenant/server-side setting; the bundled cpp_client_telemetry SDK version does not consume this + // key, so the flag is an inert no-op kept only to mirror genai's configuration. + config["enableIpScrubbing"] = true; config[CFG_INT_TRACE_LEVEL_MASK] = 0; // Disable SDK internal logging config[CFG_INT_SDK_MODE] = SdkModeTypes::SdkModeTypes_CS; // Common Schema 4.0 mode config[CFG_INT_MAX_TEARDOWN_TIME] = 10; // 10 seconds max for shutdown From 063c52e0c273ebecae8e9e8cc5d0f0fd8c8ec91a Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Fri, 26 Jun 2026 10:40:29 -0500 Subject: [PATCH 57/61] telemetry: drop inert enableIpScrubbing flag; non-blocking teardown (genai parity) - Remove config['enableIpScrubbing']=true: the bundled cpp_client_telemetry SDK does not consume this key (client-IP obfuscation is a OneCollector tenant-side setting), so it was an inert no-op.
IP scrubbing stays enabled by default server-side. - Set CFG_INT_MAX_TEARDOWN_TIME=0 (was 10) so Shutdown does not block process exit to upload; persisted events are sent on the next run. Matches onnxruntime-genai and avoids adding exit latency to host apps. Validated by an Android arm64 cross-compile (NDK r29 / Clang 21, FetchContent 1DS SDK): onnxruntime_common incl. posix/telemetry.cc compiles and links. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/telemetry.cc | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index e2e44f0c37e40..8bc301bfaaf67 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -319,13 +319,11 @@ void PosixTelemetry::Initialize() { auto& config = *config_; config[CFG_STR_COLLECTOR_URL] = "https://mobile.events.data.microsoft.com/OneCollector/1.0"; - // Set for parity with onnxruntime-genai. Client-IP obfuscation is enforced as a OneCollector - // tenant/server-side setting; the bundled cpp_client_telemetry SDK version does not consume this - // key, so the flag is an inert no-op kept only to mirror genai's configuration. - config["enableIpScrubbing"] = true; config[CFG_INT_TRACE_LEVEL_MASK] = 0; // Disable SDK internal logging config[CFG_INT_SDK_MODE] = SdkModeTypes::SdkModeTypes_CS; // Common Schema 4.0 mode - config[CFG_INT_MAX_TEARDOWN_TIME] = 10; // 10 seconds max for shutdown + // Do not block process teardown to upload; persisted events are sent on the next run. 0 keeps + // Shutdown non-blocking and avoids adding exit latency to host apps (matches onnxruntime-genai). 
+ config[CFG_INT_MAX_TEARDOWN_TIME] = 0; // Configure cache for offline scenarios — use same directory as device ID storage { From 7d9d4e1d23b5149f2cf5eb5e0b93adb1a199fb33 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Fri, 26 Jun 2026 16:20:19 -0500 Subject: [PATCH 58/61] telemetry: unify the opt-out env var as ORT_TELEMETRY_DISABLED Replace ORT_TELEMETRY_ENABLED (which disabled on 0/false/...) with ORT_TELEMETRY_DISABLED, set to a truthy value (1/true/yes/on/y, case-insensitive) to disable telemetry. A single opt-out variable honored by both ONNX Runtime and onnxruntime-genai (which previously used ORTGENAI_TELEMETRY_ENABLED). Validated on Linux (gcc-13) and Android arm64 (NDK r29 / Clang 21): onnxruntime_common compiles. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- onnxruntime/core/platform/posix/telemetry.cc | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index 8bc301bfaaf67..80b62cae6a30f 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -292,16 +292,15 @@ void PosixTelemetry::LogEventAsync(Microsoft::Applications::Events::EventPropert void PosixTelemetry::Initialize() { std::unique_lock lock(mutex_); - // Environment opt-out: ORT_TELEMETRY_ENABLED set to a disabled value (0/false/off/no/disabled/n, + // Environment opt-out: ORT_TELEMETRY_DISABLED set to a truthy value (1/true/yes/on/y, // case-insensitive) disables telemetry at runtime without recompiling and skips creating the 1DS - // uploader entirely. Accepts the same value set as onnxruntime-genai's ORTGENAI_TELEMETRY_ENABLED. - if (const char* env = std::getenv("ORT_TELEMETRY_ENABLED"); env != nullptr) { + // uploader entirely. A single opt-out variable honored by both ONNX Runtime and onnxruntime-genai. 
+ if (const char* env = std::getenv("ORT_TELEMETRY_DISABLED"); env != nullptr) { std::string value(env); for (char& ch : value) { ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch))); } - if (value == "0" || value == "false" || value == "off" || value == "no" || - value == "disabled" || value == "n") { + if (value == "1" || value == "true" || value == "yes" || value == "on" || value == "y") { enabled_.store(false, std::memory_order_release); return; } From 9abeed9aea7cf111b575f33bb28fb966f2e01e09 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Fri, 26 Jun 2026 17:54:15 -0500 Subject: [PATCH 59/61] telemetry: use a named 1DS LogManager host; fix shared-SDK comment - Create the POSIX LogManager with an explicit factory host name (CreateLogManager('OnnxRuntime', true, ...)) instead of the default/unnamed host, matching onnxruntime-genai (which uses 'OnnxRuntimeGenAI'). With distinct named hosts, onnxruntime and onnxruntime-genai run independent LogManagers even when they share one libmat.so, and neither collides on the SDK's default host. - Correct the onnxruntime_TELEMETRY_SHARED_SDK comment: the shared libmat.so shares the SDK code/TLS-HTTP stack, not the LogManager. Each binary still runs its own named LogManager, so the option does not merge them into one. Validated on Linux (gcc-13): onnxruntime_common compiles. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- cmake/CMakeLists.txt | 5 +++-- onnxruntime/core/platform/posix/telemetry.cc | 2 +- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/cmake/CMakeLists.txt b/cmake/CMakeLists.txt index 0ba9b0fa3728d..8520b8a0b9432 100644 --- a/cmake/CMakeLists.txt +++ b/cmake/CMakeLists.txt @@ -150,8 +150,9 @@ set(onnxruntime_1DS_TENANT_TOKEN "" CACHE STRING "Override the compiled-in 1DS t # When building non-Windows telemetry, optionally build the 1DS SDK (cpp-client-telemetry) as a shared # library (libmat.so) instead of statically linking it into libonnxruntime.
This lets several binaries # that link the same SDK (for example onnxruntime and onnxruntime-genai shipped together) share a single -# copy of the SDK and its transitive TLS/HTTP stack, paying its footprint once instead of per-binary, and -# avoids running two independent 1DS LogManager singletons in one process. The SDK's own dependencies +# copy of the SDK and its transitive TLS/HTTP stack, paying its footprint once instead of per-binary. Each +# binary still runs its own named 1DS LogManager (onnxruntime and onnxruntime-genai use distinct factory +# hosts), so the shared library shares only the SDK code, not telemetry state. The SDK's own dependencies # (OpenSSL/curl/sqlite3/zlib) stay static inside libmat.so so it remains self-contained. Requires the # vcpkg cpp-client-telemetry port. Off by default: a standalone onnxruntime is smaller/simpler fully static. cmake_dependent_option(onnxruntime_TELEMETRY_SHARED_SDK "Build the non-Windows 1DS telemetry SDK as a shared library so multiple binaries can share one copy" OFF "onnxruntime_USE_TELEMETRY;NOT WIN32" OFF) diff --git a/onnxruntime/core/platform/posix/telemetry.cc b/onnxruntime/core/platform/posix/telemetry.cc index 80b62cae6a30f..cb31c40ead60e 100644 --- a/onnxruntime/core/platform/posix/telemetry.cc +++ b/onnxruntime/core/platform/posix/telemetry.cc @@ -344,7 +344,7 @@ void PosixTelemetry::Initialize() { // Create log manager via LogManagerProvider (recommended for production use, // per LogManager_Creation_and_Lifecycle_Management.md). 
status_t status; - log_manager_ = LogManagerProvider::CreateLogManager(*config_, status); + log_manager_ = LogManagerProvider::CreateLogManager("OnnxRuntime", true, *config_, status); if (status != STATUS_SUCCESS || !log_manager_) { ORT_TELEMETRY_WARN("Failed to create telemetry LogManager, status: " << status); config_.reset(); From ea128d08fce8baff0dc80b6af82c89111de1a595 Mon Sep 17 00:00:00 2001 From: Bhagirath Mehta Date: Fri, 26 Jun 2026 18:02:46 -0500 Subject: [PATCH 60/61] docs: document the ORT_TELEMETRY_DISABLED runtime opt-out in Privacy.md Add a 'Disabling Telemetry' section covering the three opt-out paths (omit --use_telemetry at build, the ORT_TELEMETRY_DISABLED=1 runtime env var, and the on/off API), and update the 'only implemented for Windows' statement to note the optional cross-platform 1DS provider built with --use_telemetry. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- docs/Privacy.md | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/docs/Privacy.md b/docs/Privacy.md index fcc8468b7fa9f..26d63eed039a5 100644 --- a/docs/Privacy.md +++ b/docs/Privacy.md @@ -11,7 +11,7 @@ No data collection is performed when using your private builds built from source ### Official Builds ONNX Runtime does not maintain any independent telemetry collection mechanisms outside of what is provided by the platforms it supports. However, where applicable, ONNX Runtime will take advantage of platform-supported telemetry systems to collect trace events with the goal of improving product quality. -Currently telemetry is only implemented for Windows builds and is turned **ON** by default in the official builds distributed in their respective package management repositories ([see here](../README.md#binaries)). This may be expanded to cover other platforms in the future. Data collection is implemented via 'Platform Telemetry' per vendor platform providers (see [telemetry.h](../onnxruntime/core/platform/telemetry.h)). 
+Telemetry is turned **ON** by default in the official Windows builds distributed in their respective package management repositories ([see here](../README.md#binaries)), where it is implemented with the platform ETW provider. Builds for other platforms can additionally be compiled with the cross-platform 1DS telemetry provider by configuring with `--use_telemetry`; this is **not** enabled in the default builds. Data collection is implemented via 'Platform Telemetry' per vendor platform providers (see [telemetry.h](../onnxruntime/core/platform/telemetry.h)).

 #### Technical Details
 The Windows provider uses the [TraceLogging](https://docs.microsoft.com/en-us/windows/win32/tracelogging/trace-logging-about) API for its implementation. This enables ONNX Runtime trace events to be collected by the operating system, and based on user consent, this data may be periodically sent to Microsoft servers following GDPR and privacy regulations for anonymity and data access controls.
@@ -19,3 +19,11 @@ The Windows provider uses the [TraceLogging](https://docs.microsoft.com/en-us/wi
 Windows ML and onnxruntime C APIs allow Trace Logging to be turned on/off (see [API pages](../README.md#api-documentation) for details).
 For information on how to enable and disable telemetry, see [C API: Telemetry](./C_API.md#telemetry).
 There are equivalent APIs in the C#, Python, and Java language bindings as well.
+
+### Disabling Telemetry
+
+Telemetry can be disabled in any of these ways:
+
+- **Don't build it in.** Telemetry is only compiled when configuring with `--use_telemetry` (`onnxruntime_USE_TELEMETRY=OFF` is the default), so a build without that flag collects no data.
+- **At runtime, via environment variable.** Set `ORT_TELEMETRY_DISABLED=1` (also accepts `true`/`yes`/`on`/`y`, case-insensitive) before ONNX Runtime initializes. On the non-Windows 1DS provider this prevents the telemetry uploader from being created. The same variable is also honored by ONNX Runtime GenAI.
+- **At runtime, via the API.** The C API (and the C#, Python, and Java bindings) expose calls to turn telemetry on/off. On Windows, ETW is passive — events are only emitted when an external trace session is collecting.
From d8301b79820de4f2fb93a7330280d188442ae3b3 Mon Sep 17 00:00:00 2001
From: Bhagirath Mehta
Date: Fri, 26 Jun 2026 18:12:27 -0500
Subject: [PATCH 61/61] docs: replace the dangling C_API.md telemetry link in Privacy.md

C_API.md no longer exists in the repo, so the [C API: Telemetry](./C_API.md#telemetry) reference was a broken link. Point it at the new in-doc 'Disabling Telemetry' section instead.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 docs/Privacy.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/Privacy.md b/docs/Privacy.md
index 26d63eed039a5..eca47bed82b9b 100644
--- a/docs/Privacy.md
+++ b/docs/Privacy.md
@@ -17,7 +17,7 @@ Telemetry is turned **ON** by default in the official Windows builds distributed
 The Windows provider uses the [TraceLogging](https://docs.microsoft.com/en-us/windows/win32/tracelogging/trace-logging-about) API for its implementation. This enables ONNX Runtime trace events to be collected by the operating system, and based on user consent, this data may be periodically sent to Microsoft servers following GDPR and privacy regulations for anonymity and data access controls.

 Windows ML and onnxruntime C APIs allow Trace Logging to be turned on/off (see [API pages](../README.md#api-documentation) for details).
-For information on how to enable and disable telemetry, see [C API: Telemetry](./C_API.md#telemetry).
+For the ways to disable telemetry, see the [Disabling Telemetry](#disabling-telemetry) section below.
 There are equivalent APIs in the C#, Python, and Java language bindings as well.

 ### Disabling Telemetry
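
The `ORT_TELEMETRY_DISABLED` rule documented in PATCH 60/61 (truthy values `1`/`true`/`yes`/`on`/`y`, matched case-insensitively) could be checked roughly as sketched below. This is an illustration under stated assumptions, not the code from `telemetry.cc`: the helper names `IsTruthy` and `TelemetryDisabledByEnv` are hypothetical, and the patch does not say whether surrounding whitespace is trimmed, so this sketch does not trim.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper (not from the patch): returns true when `value` is one
// of the truthy spellings listed in Privacy.md, compared case-insensitively.
// A null pointer (variable unset) and any other string count as "not set".
bool IsTruthy(const char* value) {
  if (value == nullptr) return false;
  std::string v(value);
  // Lower-case a copy; the cast to unsigned char avoids undefined behavior
  // in std::tolower for negative char values.
  std::transform(v.begin(), v.end(), v.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return v == "1" || v == "true" || v == "yes" || v == "on" || v == "y";
}

// Hypothetical wrapper: reads the environment variable named in Privacy.md.
// It would need to run before the telemetry provider is initialized.
bool TelemetryDisabledByEnv() {
  return IsTruthy(std::getenv("ORT_TELEMETRY_DISABLED"));
}
```

With such a check, a provider's initialization path could return early when `TelemetryDisabledByEnv()` is true, which matches the documented behavior that the uploader is never created on the non-Windows 1DS provider.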