XRT Application Capture and Replay #9828
Open
stsoe wants to merge 31 commits into
@sonals. This is a monster PR, but I think it is safe since capture is non-intrusive and disabled by default. I need to do a lot more testing, but I would like to get these changes off my hands for now. You may want to browse to the replay.md file in this PR; the md file was written by Claude. The code was written by me, but reviewed by Claude.
578200c to 950a4d6
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
When capturing frames, artifacts used by captured objects must be saved to disk. Add an artifacts dumper class that dumps data to a file, using a non-cryptographic checksum to avoid dumping the same data twice. Collisions would be bad, but are unlikely. Adjust internal APIs for data mining. Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Frames is an array of frame objects, where a frame is an array of run objects. If a frame has multiple run objects, it implies the application is using an xrt::runlist, and the frame will be replayed as an xrt::runlist. Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
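To make the frame layout concrete, here is a minimal C++ sketch of the frames/frame/run nesting and the run-versus-runlist decision. The types and functions (`run_record`, `frame`, `frames`, `replay_kind`, `classify`, `count_runlist_frames`) are hypothetical illustrations, not the PR's replay code, and no XRT calls are made.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one captured xrt::run. The real capture
// records kernel, arguments and BO references in the replay json.
struct run_record {
  std::string kernel;  // kernel name (illustrative only)
};

// A frame is an array of run objects; frames is an array of frames.
using frame = std::vector<run_record>;
using frames = std::vector<frame>;

enum class replay_kind { single_run, runlist };

// A frame with more than one run implies the application used an
// xrt::runlist, so it is replayed as one; otherwise as a single xrt::run.
inline replay_kind classify(const frame& f) {
  return f.size() > 1 ? replay_kind::runlist : replay_kind::single_run;
}

// Count how many frames in a capture would be replayed as runlists.
inline std::size_t count_runlist_frames(const frames& all) {
  std::size_t n = 0;
  for (const auto& f : all)
    if (classify(f) == replay_kind::runlist)
      ++n;
  return n;
}
```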
Capture tracks BOs at the individual run level. If the application uses an xrt::runlist, each run in the runlist is captured separately but is reflected as a frame with multiple runs in the replay json. When a frame is initialized, it should initialize each BO at most once, so double-check that identical BOs in a frame refer to the same captured data (file). Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
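A sketch of that double-check, assuming each captured run lists its BO arguments as a (BO id, artifact file) pair. `bo_ref`, `run_bos`, and `frame_bos_consistent` are hypothetical names; the real capture may identify BOs differently.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical record of one BO argument captured for a run:
// an identifier for the BO and the artifact file holding its data.
struct bo_ref {
  std::uint64_t bo_id;
  std::string file;
};

// Each run in a frame (e.g. each run of an xrt::runlist) is captured
// separately, so the same BO can appear in several runs.
using run_bos = std::vector<bo_ref>;

// Returns true when every occurrence of a BO within the frame refers
// to the same captured file, so it is safe to initialize that BO once.
inline bool frame_bos_consistent(const std::vector<run_bos>& frame) {
  std::unordered_map<std::uint64_t, std::string> seen;
  for (const auto& run : frame) {
    for (const auto& ref : run) {
      auto [it, inserted] = seen.emplace(ref.bo_id, ref.file);
      if (!inserted && it->second != ref.file)
        return false;  // same BO, different captured data
    }
  }
  return true;
}
```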
Problem solved by the commit
Applications using XRT often involve complex, multi-layered software stacks (VAIML, PyTorch, TensorFlow) that make debugging, performance analysis, and regression testing difficult. There was no mechanism to capture the low-level XRT API interactions and replay them independently of the application stack. This commit introduces a capture and replay system that records XRT execution at the frame level (xrt::run::start() or xrt::runlist::execute() boundaries) and enables replay through a standalone executable.
How the problem was solved, alternative solutions (if any), and why they were rejected
The solution uses a frame-based capture approach with buffer invalidation tracking. Key design decisions:
Capture Strategy: Uses polymorphic wrapper classes (run_impl_debug, runlist_impl_debug) to intercept XRT API calls with minimal performance impact when disabled. The XRT_REPLAY_CAPTURE macro with an XRT_UNLIKELY branch hint ensures near-zero overhead when capture is disabled (capture_frames=0).
Buffer Management: Implements BO invalidation: buffer data is only dumped when xrt::bo::sync(XCL_BO_SYNC_BO_TO_DEVICE) is called. Run objects track which buffers are valid, avoiding redundant dumps when the same data is used across multiple frames.
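One way to model the invalidation scheme, sketched under the assumption that each BO carries a version counter bumped on every sync to the device and that each run remembers the version it last dumped. `bo_versions` and `run_capture` are hypothetical; the PR's actual bookkeeping may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical BO version table. A BO's version is bumped whenever the
// application calls xrt::bo::sync(XCL_BO_SYNC_BO_TO_DEVICE); only then
// can its contents have changed from the device's point of view.
class bo_versions {
  std::unordered_map<std::uint64_t, std::uint64_t> version_;
public:
  void on_sync_to_device(std::uint64_t bo) { ++version_[bo]; }
  std::uint64_t version(std::uint64_t bo) const {
    auto it = version_.find(bo);
    return it == version_.end() ? 0 : it->second;
  }
};

// Per-run record of which BO versions have already been dumped.
// A BO is "valid" for the run if its dumped version is current.
class run_capture {
  std::unordered_map<std::uint64_t, std::uint64_t> dumped_;
public:
  // Returns true if the BO must be dumped for this frame, and marks it
  // valid so the same data is not dumped again in later frames.
  bool needs_dump(std::uint64_t bo, const bo_versions& v) {
    auto current = v.version(bo);
    auto it = dumped_.find(bo);
    if (it != dumped_.end() && it->second == current)
      return false;  // still valid: no sync since the last dump
    dumped_[bo] = current;
    return true;
  }
};
```

Keying validity on a version number rather than a single dirty flag lets several runs share one BO: each run decides independently whether its last dump is stale.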
Artifact Deduplication: Uses an FNV-1a hash to deduplicate xclbins, ELF files, and buffer data, preventing duplicate storage of identical artifacts.
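The following sketch combines a 64-bit FNV-1a hash with a hash-keyed dumper, so identical content is written once and later requests return the existing file. The class `artifact_dumper` and the `<hash>.bin` file naming are assumptions for illustration; the constants are the standard FNV-1a 64-bit offset basis and prime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// 64-bit FNV-1a: non-cryptographic and fast; good enough for dedup.
inline std::uint64_t fnv1a64(const void* data, std::size_t size) {
  auto p = static_cast<const unsigned char*>(data);
  std::uint64_t hash = 14695981039346656037ULL;  // offset basis
  for (std::size_t i = 0; i < size; ++i) {
    hash ^= p[i];
    hash *= 1099511628211ULL;  // FNV prime
  }
  return hash;
}

// Hypothetical dumper: writes each distinct blob (xclbin, ELF, BO data)
// once, names the file after its hash, and returns the file path.
class artifact_dumper {
  std::filesystem::path dir_;
  std::unordered_map<std::uint64_t, std::filesystem::path> dumped_;
public:
  explicit artifact_dumper(std::filesystem::path dir) : dir_(std::move(dir)) {}

  std::filesystem::path dump(const void* data, std::size_t size) {
    auto hash = fnv1a64(data, size);
    if (auto it = dumped_.find(hash); it != dumped_.end())
      return it->second;  // identical content already on disk

    char name[32];
    std::snprintf(name, sizeof(name), "%016llx.bin",
                  static_cast<unsigned long long>(hash));
    auto path = dir_ / name;
    std::ofstream out(path, std::ios::binary);
    out.write(static_cast<const char*>(data),
              static_cast<std::streamsize>(size));
    if (!out)
      throw std::runtime_error("failed to write " + path.string());
    dumped_.emplace(hash, path);
    return path;
  }

  std::size_t unique_count() const { return dumped_.size(); }
};
```

As the commit message notes, a hash collision would silently alias two different artifacts; a stricter variant could compare sizes or bytes before reusing a file.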
Wait Synchronization: Captures xrt::run::wait() and xrt::runlist::wait() calls and associates them with the current active frame. Replay executes waits in the same order, correctly handling asynchronous execution.
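A minimal sketch of the wait bookkeeping, assuming capture keeps an ordered event log and treats the most recently started frame as the active one. `event`, `wait_recorder`, and the log layout are hypothetical and are not the replay json schema.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical event: a frame starts at xrt::run::start() or
// xrt::runlist::execute(); a wait is recorded against the frame that
// is active when xrt::run::wait() or xrt::runlist::wait() is called.
struct event {
  enum class kind { start, wait };
  kind type;
  std::size_t frame;  // index of the frame the event refers to
};

class wait_recorder {
  std::vector<event> log_;
  std::optional<std::size_t> active_;
  std::size_t next_frame_ = 0;
public:
  void on_start() {
    active_ = next_frame_++;
    log_.push_back({event::kind::start, *active_});
  }
  void on_wait() {
    if (!active_)
      throw std::logic_error("wait without an active frame");
    log_.push_back({event::kind::wait, *active_});
  }
  // Replay walks this log in order, so waits occur at the same points
  // relative to frame starts as in the captured application.
  const std::vector<event>& log() const { return log_; }
};
```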
Risks (if any) associated with the changes in the commit
Performance when enabled: Capture adds file I/O overhead proportional to buffer sync frequency. Measured impact is acceptable for debug/analysis workflows but should not be enabled in production.
Thread safety: The frames singleton uses std::mutex to protect shared state. All capture functions are thread-safe, but heavy contention under high-frequency API calls from multiple threads could cause slowdown.
Memory growth: Long-running captures with many unique ELF files will accumulate memory until process exit. Acceptable for typical debug sessions but could be problematic for extended captures.
Breaking changes: None. Capture is disabled by default and requires explicit xrt.ini configuration.
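The thread-safety risk can be illustrated with a mutex-guarded singleton in which every capture entry point takes one lock. `frames_registry` and its members are hypothetical stand-ins for the PR's frames singleton.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical process-wide capture registry. Every entry point takes
// the same lock, which keeps shared state consistent but serializes
// callers; that serialization is the source of the contention risk.
class frames_registry {
  std::mutex mutex_;
  std::vector<std::string> entries_;  // stand-in for captured frame data
  frames_registry() = default;
public:
  frames_registry(const frames_registry&) = delete;
  frames_registry& operator=(const frames_registry&) = delete;

  // Function-local static: initialization is thread-safe since C++11.
  static frames_registry& instance() {
    static frames_registry registry;
    return registry;
  }

  void record(std::string entry) {
    std::lock_guard<std::mutex> lock(mutex_);
    entries_.push_back(std::move(entry));
  }

  std::size_t size() {
    std::lock_guard<std::mutex> lock(mutex_);
    return entries_.size();
  }
};
```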
What has been tested and how, request additional testing if necessary
Tested:
Recommended additional testing:
Documentation impact (if any)
New documentation added in src/runtime_src/core/common/runner/replay.md covering:
Configuration changes in xrt.ini: