Windows: Implement WOW64/ARM64EC offline compiler backend#5502
Conversation
808e7d8 to
cbbd654
Compare
This allows mapping the code directly into memory for execution.
…ation This ensures that any code buffer memory owned by the MappedResource is invalidated before being deallocated.
The lazy code loading refactor replaced LoadData with the new LoadCache/EnableLoadedSection API but left the Windows path as TODOs. Implement the wiring: LoadAOTImages now calls LoadCache + RegisterMappedCodeBuffer for each mapped cache file, and HandleImageMap calls EnableLoadedSection (with nullptr thread since lazy mapping is not yet implemented on Windows).
A significant proportion of time with the prior approach was spent in the unbuffered write calls used once per code page/entrypoint; taking upwards of 15 seconds to serialize a 700MB P3R cache. Switching to an mmap approach reduces the CPU time there to less than a second, leaving I/O to dominate. A two-pass approach is used: 1. Calculate the required size for the cache file, issue a callback to frontend to map a cache file of that size. 2. Write cache data into the buffer provided by that callback.
Merges and sorts relocations from all provided threads before serializing, enabling multi-threaded offline code cache generation.
This reverts commit 44a6bfd.
Implements an offline JIT compiler backend that compiles x86 code blocks from a given code map into an ARM64 code cache on Windows. Separate binaries are built for WOW64 (32-bit) and ARM64EC (64-bit) targets.
cbbd654 to
4e78fc7
Compare
|
|
||
| // Generic information associated with an executable file. | ||
| struct ExecutableFileInfo { | ||
| ~ExecutableFileInfo(); |
There was a problem hiding this comment.
Could you figure out what's going on here and fix it properly instead? There can be no reason this change should be needed, and I had flagged this before already.
There was a problem hiding this comment.
This is a proper fix, it's required to enable the use of named init which is much nicer. It's not required in the sense I could do something else, but it would make the user code more ugly? And I don't really sae a point when this works fine
There was a problem hiding this comment.
This is a proper fix, it's required to enable the use of named init
Nope, why would using designated initializers require this?
There was a problem hiding this comment.
Because it deletes the default ctor that's required for designated initialisers. If you try to compile without it you'll see the error
neobrain
left a comment
There was a problem hiding this comment.
Was there a striking argument to make this a separate executable instead of incorporating it into the existing FEXOfflineCompiler? Looking at the diff it seems largely unnecessary to have it separate, when integrating it would more easily enable a future option for automatic cache generation.
I'm also not super thrilled about potentially maintaining the same logic twice, especially with subtle things like atomic cache replacement and stale file deletion going on etc.
| std::optional<FEX::Windows::InvalidationTracker> InvalidationTracker; | ||
| std::optional<FEX::Windows::ImageTracker> ImageTracker; | ||
| std::optional<FEX::Windows::OvercommitTracker> OvercommitTracker; |
|
|
||
| const auto CacheDir = std::filesystem::path(FEX::Config::GetCacheDirectory()); | ||
| const auto ReadyDir = CacheDir / "codemap" / "ready"; | ||
| const auto MetadataPath = ImageCacheDir / "metadata"; |
There was a problem hiding this comment.
Unused variable. Do you not have compiler warnings enabled?
It needs a bunch of windows specific things, executable loading is entirely different, and we need two separate binaries targeting different architectures. There's also divergence in steamstub using bcrypt. Arm64ec handling requires all sorts of new relocation stuff windows gives us for free, we don't need to deal with flock etc. there's probably more. Atomic cache replacement inherently is different on windows and Linux too so not sure point of unifying there. I genuinely don't have time to change the fundamental approach here or argue more in either direction. I personally think there's strong justification for doing it this way and it's literally less than 1000 lines, it's not like the windows frontend or invalidation tracker reuses FEX Linux, half of that would need to be reimplemented. In terms of paths forward, I won't be able to meaningfully contribute enough time to refactor this in such a direction regardless of the outcome of such a discussion. So this so can either go with this design or will be left for someone else to implement. Idm. But I do think that this works perfect rn so |
Seems reasonable enough to keep it as a separate binary to handle Windows specific things. |
FTR since you're saying your time constraints don't allow for any sort of iteration, I've integrated this PR into the existing FEXOfflineCompiler and wired up the rest of the code caching infra myself instead. There's actually decent overlap especially with future PE support on the Linux side, just running more end-to-end-testing to ensure it's on par with Linux before PRing this. |
this would have been nice to know before I spent my weekend cleaning this up. Presumably your PR is at complete parity with this implementation, including non-executable range handling, volatile metadata handling, arm64x handling, jit guard page handling, multithreading and steam DRM support? as would be strictly required for claiming of decent overlap, for one could not reasonably assess the overlap without full feature parity. |
Includes many prereq changes, but I think submitting this all at once will probably be easier for review.
This is able to reuse both the WOW64 and ARM64EC image loading APIs by building specialised binaries. This offline compiler is to be invoked once per main image exe, and will compile all prereq libraries under the correct config context into the tagrte codemap dir