XRT Application Capture and Replay #9828

Open
stsoe wants to merge 31 commits into Xilinx:master from stsoe:capture

@stsoe stsoe commented May 25, 2026

Problem solved by the commit

Applications using XRT often involve complex, multi-layered software stacks (VAIML, PyTorch, TensorFlow) that make debugging, performance analysis, and regression testing difficult. There was no mechanism to capture the low-level XRT API interactions and replay them independently of the application stack. This commit introduces a capture and replay system that records XRT execution at the frame level (xrt::run::start() or xrt::runlist::execute() boundaries) and enables replay through a standalone executable.

How problem was solved, alternative solutions (if any) and why they were rejected

The solution uses a frame-based capture approach with buffer invalidation tracking. Key design decisions:

Capture Strategy: Uses polymorphic wrapper classes (run_impl_debug, runlist_impl_debug) to intercept XRT API calls. The XRT_REPLAY_CAPTURE macro guards each capture call with an XRT_UNLIKELY branch hint, which keeps overhead near zero when capture is disabled (capture_frames=0).
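
A minimal sketch of the branch-hinted guard described above. The real definitions of XRT_UNLIKELY and XRT_REPLAY_CAPTURE are not shown in this description, so SKETCH_UNLIKELY, SKETCH_REPLAY_CAPTURE, capture_enabled, and record_start are hypothetical stand-ins, and the flag is assumed to be set once from xrt.ini.

```cpp
#include <atomic>
#include <cstddef>

// Portable branch hint; a hypothetical stand-in for XRT_UNLIKELY.
#if defined(__GNUC__) || defined(__clang__)
# define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
# define SKETCH_UNLIKELY(x) (x)
#endif

namespace sketch {

// Assumed to be set once from xrt.ini (capture_frames > 0 enables capture).
std::atomic<bool> capture_enabled{false};

// Counts recorded calls so the effect of the guard is observable.
std::size_t captured_calls = 0;

// Hypothetical capture hook; real capture would record arguments and BOs.
void record_start() { ++captured_calls; }

} // namespace sketch

// Hypothetical stand-in for XRT_REPLAY_CAPTURE: evaluates `call` only when
// capture is enabled, so the disabled path costs one rarely-taken branch.
#define SKETCH_REPLAY_CAPTURE(call)                                         \
  do {                                                                      \
    if (SKETCH_UNLIKELY(                                                    \
          sketch::capture_enabled.load(std::memory_order_relaxed))) {       \
      call;                                                                 \
    }                                                                       \
  } while (0)

// Stand-in for an API entry point such as xrt::run::start().
void run_start()
{
  SKETCH_REPLAY_CAPTURE(sketch::record_start());
  // ... the normal start path would continue here
}
```

With the flag left at its default, run_start() never reaches record_start(); storing true into capture_enabled makes every later call pass through the hook.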

Buffer Management: Implements BO invalidation - buffer data is only dumped when xrt::bo::sync(XCL_BO_SYNC_BO_TO_DEVICE) is called. Run objects track which buffers are valid, avoiding redundant dumps when the same data is used across multiple frames.
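
The invalidation idea can be sketched as a per-run set of buffers that were synced to the device since their last dump. This is an illustration under assumptions, not the PR's code: bo_id, run_bo_tracker, on_sync_to_device, and on_frame are hypothetical names, and the actual file dump is reduced to a counter.

```cpp
#include <cstddef>
#include <unordered_set>

namespace sketch {

using bo_id = int; // hypothetical buffer handle

class run_bo_tracker
{
  // BOs synced to the device since their data was last dumped
  std::unordered_set<bo_id> m_dirty;
  std::size_t m_dumps = 0;

public:
  // Assumed hook for xrt::bo::sync(XCL_BO_SYNC_BO_TO_DEVICE):
  // the previously captured data for this BO is no longer valid.
  void on_sync_to_device(bo_id bo) { m_dirty.insert(bo); }

  // Assumed hook at a frame boundary for each BO argument of the run.
  // Returns true when the BO data would be dumped for this frame.
  bool on_frame(bo_id bo)
  {
    if (m_dirty.erase(bo) == 0)
      return false; // data unchanged since last dump, reuse earlier file
    ++m_dumps;      // a real implementation would write the BO data here
    return true;
  }

  std::size_t dumps() const { return m_dumps; }
};

} // namespace sketch
```

A BO used in three consecutive frames with a single sync before the first frame is dumped once; a second sync before the third frame triggers a second dump.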

Artifact Deduplication: Uses FNV-1a hash to deduplicate xclbins, ELF files, and buffer data, preventing duplicate storage of identical artifacts.
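
For reference, a sketch of 64-bit FNV-1a with a hash-keyed store on top of it. The hash uses the standard FNV-1a offset basis and prime; whether the PR uses the 32-bit or 64-bit variant is not stated, so the 64-bit choice is an assumption, and artifact_store and its file naming are hypothetical. Like any checksum-only scheme, this treats a hash collision as identical data.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

namespace sketch {

// 64-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
inline std::uint64_t fnv1a_64(const std::string& data)
{
  std::uint64_t hash = 14695981039346656037ULL; // FNV-1a 64-bit offset basis
  for (unsigned char byte : data) {
    hash ^= byte;
    hash *= 1099511628211ULL; // FNV 64-bit prime
  }
  return hash;
}

// Hypothetical store: maps a content hash to the file that holds the data.
class artifact_store
{
  std::unordered_map<std::uint64_t, std::string> m_files;

public:
  // Returns the file name for the data and whether a new file is needed.
  std::pair<std::string, bool> add(const std::string& data)
  {
    const std::uint64_t hash = fnv1a_64(data);
    auto result =
      m_files.emplace(hash, "artifact_" + std::to_string(hash) + ".bin");
    return std::make_pair(result.first->second, result.second);
  }

  std::size_t size() const { return m_files.size(); }
};

} // namespace sketch
```

Adding the same bytes twice returns the same file name, and only the first call reports that a new file must be written.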

Wait Synchronization: Captures xrt::run::wait() and xrt::runlist::wait() calls and associates them with the current active frame. Replay executes waits in the same order, correctly handling asynchronous execution.
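
One way to picture the wait association is an ordered event log in which each wait is tagged with the most recently started frame. This sketch assumes that "current active frame" means the last frame started; recorder, event, and event_kind are hypothetical names, and a wait seen before any frame is simply ignored here.

```cpp
#include <cstddef>
#include <vector>

namespace sketch {

enum class event_kind { start, wait };

struct event
{
  event_kind kind;
  std::size_t frame; // index of the frame the event belongs to
};

class recorder
{
  std::vector<event> m_events; // events in the order the application issued them
  std::size_t m_frames = 0;    // number of frames started so far

public:
  // Assumed hook for xrt::run::start() or xrt::runlist::execute().
  std::size_t on_start()
  {
    m_events.push_back(event{event_kind::start, m_frames});
    return m_frames++;
  }

  // Assumed hook for xrt::run::wait() or xrt::runlist::wait().
  void on_wait()
  {
    if (m_frames == 0)
      return; // no frame started yet, nothing to associate with
    m_events.push_back(event{event_kind::wait, m_frames - 1});
  }

  const std::vector<event>& events() const { return m_events; }
};

} // namespace sketch
```

Replaying the log front to back issues starts and waits in the captured order, so two queued frames followed by one wait are replayed as start, start, wait.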

Risks (if any) associated with the changes in the commit

  1. Performance when enabled: Capture adds file I/O overhead proportional to buffer sync frequency. Measured impact is acceptable for debug/analysis workflows but should not be enabled in production.

  2. Thread safety: The frames singleton uses std::mutex to protect shared state. All capture functions are thread-safe, but heavy contention under high-frequency API calls from multiple threads could cause slowdown.

  3. Memory growth: Long-running captures with many unique ELF files will accumulate memory until process exit. Acceptable for typical debug sessions but could be problematic for extended captures.

  4. Breaking changes: None. Capture is disabled by default and requires explicit xrt.ini configuration.
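
The thread-safety note in item 2 corresponds to a pattern like the following sketch: a function-local static singleton whose shared state is guarded by a std::mutex. Only the name frames and the use of std::mutex come from the description; the members add_run and size, and the stored run ids, are hypothetical.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

namespace sketch {

class frames
{
  mutable std::mutex m_mutex;   // protects m_run_ids
  std::vector<int> m_run_ids;   // hypothetical shared capture state

  frames() = default;

public:
  frames(const frames&) = delete;
  frames& operator=(const frames&) = delete;

  // Function-local static: initialization is thread-safe since C++11.
  static frames& instance()
  {
    static frames s_instance;
    return s_instance;
  }

  // Every access to shared state takes the lock, so concurrent callers
  // serialize here; this is where contention would show up.
  void add_run(int run_id)
  {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_run_ids.push_back(run_id);
  }

  std::size_t size() const
  {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_run_ids.size();
  }
};

} // namespace sketch
```

Because all capture calls funnel through one mutex, many threads issuing API calls at high frequency would queue on this lock, which is the slowdown the risk item refers to.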

What has been tested and how, request additional testing if necessary

Tested:

  • Basic capture/replay with single run per frame
  • Runlist capture/replay with multiple runs per frame
  • Buffer invalidation with repeated xrt::bo::sync() calls
  • ELF flow (programs) and xclbin flow
  • Artifact deduplication (same xclbin/buffer used multiple times)
  • Wait synchronization across multiple queued frames
  • Stream seeking fix for elf_ctor (istream constructor)

Recommended additional testing:

  • Stress test: 1000+ frames, 100+ runs per frame
  • Large buffers (>1GB) to validate mmap efficiency
  • Multi-threaded applications with concurrent API calls
  • Edge cases: run started before set_arg, runlist with no runs
  • Performance regression test with capture disabled (should be <1% overhead)
  • Validation: compare output buffers from original vs replayed execution

Documentation impact (if any)

New documentation added in src/runtime_src/core/common/runner/replay.md covering:

  • Quick start guide with xrt.ini configuration
  • Replay executable usage and command-line options
  • JSON schema specification
  • Use cases (debugging, performance analysis, regression testing)
  • Troubleshooting guide
  • Advanced topics (artifact deduplication, ELF vs xclbin flow, module caching)

Configuration changes in xrt.ini:

[Runtime]
capture_frames=<num>          # Number of frames to capture (0=disabled, default)
capture_output_dir=<path>     # Output directory (default: ./)
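
As a concrete illustration of the two keys above, an xrt.ini that captures ten frames into a dedicated directory might look like this. The values are examples only, and the directory path is hypothetical.

```ini
[Runtime]
capture_frames=10
capture_output_dir=/tmp/xrt_capture
```
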

New executable: xrt-replay for replaying captured execution.

stsoe commented May 25, 2026

@sonals, this is a monster PR, but I think it is safe since capture is non-intrusive and disabled by default. I need to do a lot more testing, but feel like getting these changes off my hands for now. You may want to browse to the replay.md file in this PR; the md file was written by Claude. The code was written by me, but reviewed by Claude.

@github-actions github-actions Bot left a comment

clang-tidy made some suggestions

There were too many comments to post at once. Showing the first 25 out of 39. Check the log or trigger a new build to see more.

Comment thread src/runtime_src/core/common/api/xrt_bo.cpp
Comment thread src/runtime_src/core/common/api/xrt_bo.cpp
Comment thread src/runtime_src/core/common/api/xrt_elf.cpp
Comment thread src/runtime_src/core/common/api/xrt_elf.cpp
Comment thread src/runtime_src/core/common/api/xrt_elf.cpp
Comment thread src/runtime_src/core/common/runner/capture.cpp
Comment thread src/runtime_src/core/common/runner/capture.cpp
Comment thread src/runtime_src/core/common/runner/capture.cpp Outdated
Comment thread src/runtime_src/core/common/runner/capture.cpp
Comment thread src/runtime_src/core/common/runner/capture.h

@github-actions github-actions Bot left a comment

clang-tidy made some suggestions

Comment thread src/runtime_src/core/common/runner/capture.cpp Outdated
Comment thread src/runtime_src/core/common/runner/detail/capture_artifacts.h Outdated
Comment thread src/runtime_src/core/common/runner/detail/capture_artifacts.h Outdated
Comment thread src/runtime_src/core/common/runner/detail/capture_artifacts.h
Comment thread src/runtime_src/core/common/runner/detail/capture_artifacts.h Outdated
Comment thread src/runtime_src/core/common/runner/detail/streambuf.h Outdated
Comment thread src/runtime_src/core/common/runner/replay.cpp Outdated
Comment thread src/runtime_src/core/common/runner/replay.cpp Outdated
Comment thread src/runtime_src/core/common/runner/replay.cpp
Comment thread src/runtime_src/core/common/runner/replay.cpp

@github-actions github-actions Bot left a comment

clang-tidy made some suggestions

Comment thread src/runtime_src/core/common/runner/capture.cpp Outdated
@stsoe stsoe force-pushed the capture branch 2 times, most recently from 578200c to 950a4d6 May 25, 2026 04:48
@stsoe stsoe requested review from sonals and removed request for uday610 May 25, 2026 05:27

@github-actions github-actions Bot left a comment

clang-tidy made some suggestions

Comment thread src/runtime_src/core/common/runner/capture.cpp Outdated

@github-actions github-actions Bot left a comment

clang-tidy made some suggestions

Comment thread src/runtime_src/core/common/runner/capture.cpp Outdated

@github-actions github-actions Bot left a comment

clang-tidy made some suggestions

Comment thread src/runtime_src/core/common/api/xrt_kernel.cpp
Comment thread src/runtime_src/core/common/api/xrt_kernel.cpp
Comment thread src/runtime_src/core/common/api/xrt_kernel.cpp
Comment thread src/runtime_src/core/common/api/xrt_kernel.cpp
Comment thread src/runtime_src/core/common/api/xrt_kernel.cpp
Comment thread src/runtime_src/core/common/api/xrt_kernel.cpp
Comment thread src/runtime_src/core/common/api/xrt_kernel.cpp
Comment thread src/runtime_src/core/common/api/xrt_kernel.cpp
Comment thread src/runtime_src/core/common/api/xrt_kernel.cpp
stsoe added 13 commits May 27, 2026 09:08
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
When capturing frames, artifacts used by captured objects must
be saved to disk.

Add artifacts dumper class to dump data to a file, with
non-cryptographic checksum to prevent dumping same data twice.
Collisions would be bad, but are unlikely.

Adjust internal APIs for data mining.

Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
stsoe added 18 commits May 27, 2026 09:08
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
A frames object is an array of frame objects, where a frame is an array
of run objects.  If a frame has multiple run objects, it implies the
application is using an xrt::runlist.  The frame will be replayed
as an xrt::runlist.

Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
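
The frames layout described in the commit message above can be sketched with plain containers. The type names run, frame, frames, and replay_mode and the helper mode_for are hypothetical; the sketch also assumes every captured frame holds at least one run.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical captured run; real entries would carry arguments and BOs.
struct run
{
  std::string kernel_name;
};

using frame = std::vector<run>;    // one frame is an array of runs
using frames = std::vector<frame>; // the capture is an array of frames

enum class replay_mode { single_run, runlist };

// A frame with more than one run implies the application used an
// xrt::runlist, so it is replayed as a runlist.
inline replay_mode mode_for(const frame& f)
{
  assert(!f.empty() && "sketch assumes at least one run per frame");
  return f.size() > 1 ? replay_mode::runlist : replay_mode::single_run;
}

} // namespace sketch
```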
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren.Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Capture tracks BOs at the individual run level. If the application uses
an xrt::runlist, then each run in the runlist is captured separately
but reflected as a frame with multiple runs in the replay JSON.

When a frame is initialized, it should initialize each BO at most
once, so double check that identical BOs in a frame refer to the same
captured data (file).

Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
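
The double check described in the last commit message can be sketched as a single pass over the BO references of a frame. The bo_ref record and the function frame_bos_consistent are hypothetical; the sketch assumes each reference carries a BO identifier and the name of its captured data file.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

namespace sketch {

// Hypothetical record: one BO argument of one run in a frame.
struct bo_ref
{
  int bo;           // BO identifier
  std::string file; // captured data file backing this BO
};

// Returns true when every BO in the frame maps to exactly one file,
// so the frame can initialize each BO at most once.
inline bool frame_bos_consistent(const std::vector<bo_ref>& refs)
{
  std::unordered_map<int, std::string> first_seen;
  for (const auto& ref : refs) {
    auto result = first_seen.emplace(ref.bo, ref.file);
    if (!result.second && result.first->second != ref.file)
      return false; // same BO seen earlier with a different file
  }
  return true;
}

} // namespace sketch
```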