Add set_default_logger_callback Python API for custom ORT logging#28457
Add set_default_logger_callback Python API for custom ORT logging#28457Copilot wants to merge 8 commits into
Conversation
This implements the feature requested in the issue by: 1. Adding PythonCallbackSink class that wraps a Python callable and can be dynamically updated without recreating the LoggingManager singleton 2. Modifying CreateOrtEnv to install the PythonCallbackSink at init time 3. Exposing set_default_logger_callback() in the Python module 4. Exporting it in onnxruntime/__init__.py 5. Adding test_set_default_logger_callback to the test file Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/6698b9af-fdda-49b0-a406-f7fb340b39e2 Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/6698b9af-fdda-49b0-a406-f7fb340b39e2 Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
…backSink and singleton guard Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/6698b9af-fdda-49b0-a406-f7fb340b39e2 Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
|
@tianleiwu Any news on the topic? :-) |
tianleiwu
left a comment
There was a problem hiding this comment.
Review summary
The approach is sound — installing a PythonCallbackSink at module import and swapping the callable in-place (rather than rebuilding the LoggingManager singleton) is the right design, and the comments explaining the destroy → recreate sequence are excellent. Test coverage is good too.
There are two high-priority correctness/thread-safety issues around the lifecycle of the stored py::object that can affect all Python users (not just callers of the new API), plus a few suggestions. Details inline.
High-Priority
g_user_logging_callbackdefaults to a null handle, notpy::none().is_none()isptr() == Py_None, which isfalsefor a default-constructed (null)py::object. Since the sink is installed at import time, every ORT log at/above WARNING emitted before the firstset_default_logger_callback()call takes the callback branch and invokes a null callable → null-pointer deref. Initialize the stored callback topy::none()(under the GIL) at sink creation.- Refcount manipulation without the GIL in
SendImpl. ORT logs from internal worker threads that don't hold the GIL.py::object cb = g_user_logging_callback;doesPy_INCREFbefore the GIL is acquired, andcb's destructor doesPy_DECREFafter thegil_scoped_acquirescope ends — both unsynchronized → data race. Hold the GIL around the copy → invoke → destroy sequence.
Suggestions
set_default_logger_callback(None)always rewrites the logger severity (default WARNING), silently undoing a priorset_default_logger_severity(0). Only update severity when a callback is provided, or document it.err_msgin the exception handler is built but unused.- The callback is stored in a file-scope global rather than a sink member; consider a member for clarity.
Nitpicks
onnxruntime_pybind_module_functions.his missing a trailing newline.set_default_logger_callbackimport in__init__.pyis out of alphabetical order (lint bot already flagged this block).
There was a problem hiding this comment.
Pull request overview
This PR adds a new Python API (onnxruntime.set_default_logger_callback) to allow Python callers to intercept ONNX Runtime log messages via a Python callable, integrating ORT logging with application logging without needing to rebuild sessions or manipulate file descriptors.
Changes:
- Add a
PythonCallbackSink(ISink) implementation in the Python bindings and exposeset_default_logger_callback(callback, severity=2). - Replace the default
LoggingManagerduring Python module initialization to route logs through the Python-aware sink. - Export the new API from
onnxruntime/__init__.pyand add a Python test covering basic usage and invalid inputs.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| onnxruntime/python/onnxruntime_pybind_state.cc | Adds PythonCallbackSink and the set_default_logger_callback pybind entry point. |
| onnxruntime/python/onnxruntime_pybind_module.cc | Replaces the initial LoggingManager at import time to install the Python-aware sink. |
| onnxruntime/python/onnxruntime_pybind_module_functions.h | Declares the factory for creating/registering the Python callback sink. |
| onnxruntime/init.py | Re-exports set_default_logger_callback in the public Python namespace. |
| onnxruntime/test/python/onnxruntime_test_python.py | Adds a new unit test for the callback API surface and basic behavior. |
- Initialize the global callback to py::none() in the sink factory so the no-callback fast path is taken before any callback is set (a null handle is not None, so the previous code could invoke an empty callable). - Restructure PythonCallbackSink::SendImpl to acquire the GIL before copying or destroying the py::object, avoiding Python refcount mutation on non-Python worker threads. Lock order is GIL -> mutex consistently. - Only update default logger severity when installing a callback; resetting with None no longer silently overwrites a user-set severity. - Drop the unused err_msg in the exception handler. - Skip replacing the LoggingManager when a process-wide OrtEnv already existed (created outside Python) to avoid invalidating by existing sessions. - Assert the callback is actually invoked and validate record types in the test. - Reorder __init__.py import alphabetically; add trailing newline to header.
- Store the user logging callback as a raw PyObject* (intentionally leaked at shutdown) instead of a global py::object, so its reference is never DECREF'd during static destruction / module unload after interpreter finalization. - Do all INCREF/DECREF under the GIL; keep the refcount-free pre-check under the mutex alone and clarify the lock-order comment. - Restore the default (Warning) logger severity in the Python test so the Verbose level it sets does not leak into the rest of the suite. - Add trailing newline in the set_default_logger_callback docstring.
…est model I/O names - Extract InstallPythonCallbackLoggingSink() into the shared onnxruntime_pybind_state.cc so both the inference and training pybind entry points install the PythonCallbackSink. The training module's CreateOrtEnv() previously registered set_default_logger_callback (via the shared addGlobalMethods) but never installed the sink, so the API threw 'g_python_callback_sink != nullptr was false' on training builds. - Fix test_set_default_logger_callback to use mul_1.onnx's actual single input 'X' and output 'Y' instead of the nonexistent inputs X/Y and output Z, which caused 'Invalid input name: Y' on macOS/CPU builds.
Description
Exposes a new
onnxruntime.set_default_logger_callback(callback, severity=2)API that lets Python users route ORT log messages to any Python callable.New function signature:
Example:
Implementation:
onnxruntime_pybind_state.cc: AddsPythonCallbackSink— a thread-safeISinksubclass that wraps a swappable Python callable and falls back to the platform sink when none is set. Callback exceptions are caught and swallowed (falling back to platform sink) to prevent crashes propagating into ORT. Addsset_default_logger_callback()inaddGlobalMethodswhich updates the sink's callable in-place; noLoggingManagerreconstruction needed.onnxruntime_pybind_module.cc: ModifiesCreateOrtEnv()to replace the initialLoggingManagerwith one backed byPythonCallbackSinkat module import time via a safe two-phase replace (destroy → recreate), valid because no sessions or threads exist yet.onnxruntime_pybind_module_functions.h: DeclaresCreateAndRegisterPythonCallbackSink()factory so the module init can call it.onnxruntime/__init__.py: Exportsset_default_logger_callbackin the public namespace.onnxruntime_test_python.py: Addstest_set_default_logger_callbackcovering normal use,Nonereset, inference stability, boundary severities, invalid-input rejection, and raising-callback safety.Motivation and Context
The Python binding hard-codes
nullptras the ORT logging function at environment creation time, making it impossible to redirect ORT log output from Python. The C++ API already supports this viaOrtApi::CreateEnvWithCustomLogger. This change brings equivalent capability to the Python API so users can integrate ORT logging with their application logger without resorting to file-descriptor manipulation.