
Native Windows (MSVC) support on develop / LLVM 20.1.7#823

Open
avijitbhuin21 wants to merge 27 commits into exaloop:develop from avijitbhuin21:develop

@avijitbhuin21


Native Windows (MSVC) support — Codon on develop / LLVM 20.1.7

Adds first-class native Windows support to Codon, rebased onto upstream
develop (v0.19.6) with the exaloop LLVM 20.1.7 fork. All changes are
_WIN32 / isWinMSVC()-guarded so Linux/macOS are untouched.

Internal review PR — base is our fork's develop, not exaloop/codon.
The upstream PR is held pending go-ahead (and the CLA).

Status

  • CI green on 4 platforms: Linux + macOS arm64 + macOS x86_64 + Windows MSVC.
  • Tests: CIRCoreTest 84/84, CoreTests/SeqTest 17/17 (debug + release),
    CNumericsTests 4/4 — full regression clean.
  • Features on Windows: codon run (JIT) + codon build (AOT), WinEH
    try/except/finally, OpenMP @par, numpy core + linalg (Highway + OpenBLAS),
    Jupyter kernel, rich AOT PDB backtraces, compiler-rt float16/128-bit builtins.

What's new in this batch (6 commits)

| Commit | Fix |
| --- | --- |
| reaching-defs LLP64 | The `1UL` shift base is 32-bit on Win64 → corrupted reaching-defs in functions with >32 assignments; use `uint64_t(1)`. Fixes `arithmetic`. |
| JIT/WinEH correctness | `finally` double-exec on propagate-through; out-of-range `.xdata` IMGREL to `seq_exc_filter` under ASLR (in-module thunk + 64-bit ptr); deterministic in-window JIT slab placement. Fixes `exceptions`, `bltin`. |
| `__windows__` + file iter | New compile-time constant; `File._iter` reads via `_C.fgetc` (cross-CRT `FILE*` handoff faulted). |
| math.frexp guard | MSVC `frexp(±inf)` → nan; match CPython. Fixes `numerics`. |
| getline + zlib/bz2 exports | Export for the JIT symbol resolver. Fixes `serialization`, `bltin`. |
| CLI driver into codonc | Unify the `llvm::cl` registry so codon flags (`-release`, `--numerics`, …) register on Windows. |

Out of scope / follow-ups

Jupyter packaging + dedicated CI job; JIT __ImageBase anchor cleanup
(code-health only — the ASLR hole is closed); CLA before any upstream PR.

avijitbhuin21 and others added 27 commits June 4, 2026 19:16
Lean workflow_dispatch pipeline that builds upstream develop
(0.19.6 / LLVM 20.1.7) for linux-x86_64 + darwin-arm64 + darwin-x86_64
using upstream's own Docker/prebuilt-LLVM path, with zero Windows
changes. Baselines the harness before the native-Windows transplant.

workflow_dispatch alone needs the workflow on the default branch; a push
trigger fires from the pushed ref so the baseline runs on this branch.

…numpy (R2)

Re-apply the native-Windows MSVC build-system port onto upstream develop
(0.19.6 / LLVM 20.1.7), all WIN32-guarded so Linux/macOS are unaffected:

cmake/deps.cmake:
- zlib-ng HAVE_OFF64_T OFF on Windows
- guard xz/xzdec EXCLUDE_FROM_ALL with if(TARGET)
- Windows bdwgc variant: enable_parallel_mark OFF (avoids loader-lock
  deadlock during codonrt.dll init), disable_handle_fork, single-obj,
  GC_BUILTIN_ATOMIC (no libatomic_ops)
- skip libbacktrace sources on Windows (re-enabled later, R4b)
- carve out OpenBLAS + enable_language(Fortran) on Windows (numpy deferred)

CMakeLists.txt:
- clang DLL CRT link flags (-fms-runtime-lib=dll + CRT import libs), GC_NOT_DLL
- WINDOWS_EXPORT_ALL_SYMBOLS on codonrt/codonc/codon_jupyter; CODON_COMPILER_BUILD
- remove numpy runtime sources (sort/loops/zmath) from codonrt on Windows
- /WHOLEARCHIVE static-dep linking; drop omp/backtrace from codonrt link
- bypass mandatory CODON_SYSTEM_LIBRARIES FATAL_ERROR on Windows
- drop Unix `find -exec rm` from headers target; Windows libs-copy via TARGET_FILE
- link ${LLVM_LIBS} into codon.exe; skip libomp/gfortran installs; exclude codon_test

numpy on Windows is deferred (documented follow-up); OpenMP + libbacktrace
re-enablement tracked separately. Verified: cmake configure succeeds on
Windows/MSVC against LLVM 20.1.7.

…ilds (R3)

Re-apply the runtime Windows port onto develop and handle new Unix-isms that
upstream added since 0.16.3 (so beyond the original reference patch):

exc.cpp: SEH throw/filter/TLS path (RaiseException + seq_exc_current/
  seq_exc_filter), Itanium personality machinery under #ifndef _WIN32, Windows
  windows.h/NOMINMAX/CODON_SEH_CODE block, no-op libbacktrace stubs.
lib.cpp: guard <unistd.h> and the newly-added <dlfcn.h>; NOMINMAX before gc.h
  (its windows.h was leaking the max macro into fast_float); localtime_r/
  gmtime_r -> localtime_s/gmtime_s; environ -> _environ; __kmpc_set_gc_callbacks
  empty stub; GC_remove_roots is absent on Win32 bdwgc (#if !MSWIN32) so pass
  nullptr in seq_init and make seq_gc_remove_roots a no-op on Windows.
common.cpp: PATH_MAX fallback + replace '\' -> '.' in module names.
capture.cpp: disambiguating (const Func*) cast. error.h: CODON_API dllexport on
  the four ErrorInfo `ID` data members (WINDOWS_EXPORT_ALL_SYMBOLS skips data).
CMakeLists: link zlibstatic normally (its embedded .res breaks lld /WHOLEARCHIVE).

Verified: codonrt.dll + codonrt.lib build and link on Windows/MSVC + LLVM 20.1.7.

Transplant the compiler-core Windows port onto develop and fix LLVM-20 +
develop-drift compile breaks so the whole compiler builds & links on MSVC:

memory_manager.cpp/.h: near-image JIT allocator + Win64SEHRegistrationPlugin
  (JITLink .pdata RtlAddFunctionTable registration) — compiles clean vs LLVM 20
  ORC/JITLink (the dominant API-break risk; survived 17->20).
engine.cpp/.h: defineImageBase/__ImageBase + SEH plugin wiring re-derived for
  develop's new LLJITBuilder-based Engine (was a manual ExecutionSession setup).
llvisitor.cpp: guard Unix includes (sys/wait.h/unistd.h) behind _WIN32 (process.h/
  windows.h); executeCommand _spawnvp branch; executable_path().string().

develop-drift Windows fixes (new since 0.16.3, so beyond the reference patch):
  base.h: global id_t on Windows (POSIX supplies one Linux-side; std::hash uses it
    unqualified). jit.cpp: strndup shim. Pervasive std::filesystem::path -> string:
    on Windows path uses wchar_t so no implicit std::string conversion — add
    explicit .string() across common.cpp/plugins.cpp/compiler.cpp/doc.cpp/
    translate.cpp. common.cpp: library_path() Windows branch (GetModuleFileNameA),
    get_absolute_path() _fullpath.

Verified: codonc.dll (50MB) + codon.exe (11MB) build and link on Windows/MSVC +
LLVM 20.1.7. EH/ABI runtime-codegen transplant in llvisitor still pending (R5).

…unclet EH

Transplant the llvisitor codegen Windows port onto develop/LLVM 20:
- isWinMSVC()/win64IsIndirect() helpers; makePersonalityFunc -> __C_specific_handler.
- run() LLJIT: addWin64SEHRegistration(*L); CodeModel::Large; define __ImageBase
  absolute symbol; skip the GDB JIT-debug registrar on Windows (needs an ORC-runtime
  symbol absent there).
- makeLLVMFunction + visit(CallInstr): Win64 struct ABI (>8B aggregates -> sret/byval)
  so @C calls match the MSVC-compiled runtime DLL.
- visit(Module) proxy-main + visit(TryCatchFlow): catchswitch/catchpad[seq_exc_filter]
  funclet EH (recover TLS exc via seq_exc_current, catchret to normal type-id dispatch);
  unwindResume -> seq_throw re-raise (resume invalid under WinEH). LLVM-20: getInt8PtrTy
  -> getPtrTy.

Critical develop-drift fix: Windows source paths embedded in mangled function names had
backslashes, which LLVM-IR quoted-identifier parsing unescapes (`\\` -> `\`), mismatching the
post-link getFunction lookup ("function not linked in"). Normalize paths feeding names to
forward slashes via generic_string() (compiler.cpp/common.cpp/translate.cpp/doc.cpp);
identical to .string() on Linux.
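A minimal sketch of the separator normalization, using a plain string stand-in. On Windows, `std::filesystem::path::generic_string()` renders a path with `/` separators; on POSIX a backslash is an ordinary filename character and is left alone, which is why the change is a no-op on Linux. The helper `to_generic_separators` is hypothetical and rewrites backslashes explicitly so it behaves the same on any host.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: rewrite every '\' separator to '/' so a source path can be
// embedded in a mangled function name without backslash escapes. On Windows,
// std::filesystem::path::generic_string() yields the same '/'-separated form.
std::string to_generic_separators(std::string path) {
  std::replace(path.begin(), path.end(), '\\', '/');
  return path;
}
```

Unlike `generic_string()`, this helper also rewrites backslashes on POSIX hosts, where they are legal filename characters, so it only stands in for the Windows behavior.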

VERIFIED on native Windows + LLVM 20.1.7: `codon run hello.py` and `codon run fib.py`
(recursion, f-strings, str(int) via sret, dict/list/float) both exit 0 with correct output.
try/except JIT EH (exc.py) still crashing; WinEH funclet/.pdata unwinding is WIP.

writeToExecutable: on Windows use `clang -fuse-ld=lld`, drop Unix-only link args
(-lomp/-lpthread/-ldl/-lz/-lm/-lc -> just -lcodonrt), and guard -Wl,-rpath / -no-pie
behind #ifndef _WIN32. -lcodonrt resolves codonrt.lib via the -L library search path
(../lib/codon).

VERIFIED (install tree, LLVM 20.1.7): `codon build hello.py`/`fib.py` produce .exe that
run correct + exit 0. So both `codon run` (JIT) AND `codon build` (AOT) work end-to-end
on native Windows for non-exception programs.

Known issue: exc.py (try/except) crashes in writeToObjectFile (LLVM WinEHPrepare/X86
codegen) under BOTH JIT and AOT -> the visit(TryCatchFlow) funclet IR is the culprit
(proxy-main catchswitch is fine since hello/fib codegen ok). Needs funclet-IR debugging.

… crash)

The finally-routing exception block (tc.finallyExceptionBlock) was still emitting an
Itanium landingpad while being used as an invoke unwind target — invalid under WinEH
(every unwind target must begin with an EH pad), which crashed LLVM-20 WinEHPrepare/X86
codegen during writeToObjectFile for any try with a finally. Convert it to the same
catchswitch/catchpad[seq_exc_filter] funclet form as the main catch pad: recover the TLS
exception, set rethrow state, catchret into the (normal-code) finally block.

Result: exc.py now CODEGENS without crashing (both JIT and AOT). Remaining: the raised
exception escapes the inner try/except to the proxy-main catch-all under BOTH JIT and AOT
(identical) -> a WinEH SEH catchpad-filter/dispatch semantic that differs on LLVM 20
(NOT .pdata registration, since AOT auto-registers and fails the same). hello/fib still
run correctly JIT + AOT.

…AOT)

The Itanium personality sets the landingpad selector to the type-id of the first
catch clause the thrown object is-an-instance-of; Codon's dispatch switches on it.
The WinEH funclet path had no personality computing that selector and hardcoded it
to 0, so every typed `except` fell through to rethrow (the exception reached the
proxy-main catch-all -> terminate).

Fix: new runtime seq_exc_match(unwindExc, ids[], n) reproduces the personality's
isinstance walk (RTTIObject->type->{id,parent_ids}) over the clause type-ids and
returns the first match. The catch funclet emits a constant array of the clause
type-ids, calls seq_exc_match in normal code after catchret, and stores the result
as the pad's index-1 selector so the existing type-id switch matches.
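The matching step can be sketched as below. This is an illustrative model, not Codon's runtime code: `TypeInfo`, `is_instance`, and `exc_match_sketch` are hypothetical names, `parent_ids` is assumed to list every ancestor type (a flattened hierarchy), and 0 is assumed to be the "no clause matched" selector that falls through to rethrow.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the per-type record (RTTIObject->type->{id, parent_ids}).
struct TypeInfo {
  int id;
  std::vector<int> parent_ids;  // assumed: ids of all ancestor types
};

// isinstance check: exact type match or any ancestor match.
bool is_instance(const TypeInfo &t, int clause_id) {
  if (t.id == clause_id) return true;
  for (int pid : t.parent_ids)
    if (pid == clause_id) return true;
  return false;
}

// Walk the catch clauses in source order and return the type-id of the first
// one the thrown object is an instance of; 0 means no clause matched.
int exc_match_sketch(const TypeInfo &thrown, const int *ids, std::size_t n) {
  assert(ids != nullptr || n == 0);
  for (std::size_t i = 0; i < n; ++i)
    if (is_instance(thrown, ids[i])) return ids[i];
  return 0;
}
```

Returning the type-id rather than a clause index is an assumption chosen so the result can feed the existing type-id switch directly.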

VERIFIED on native Windows + LLVM 20.1.7 — hello.py, fib.py, exc.py (try/except/
finally) ALL run correct + exit 0 under BOTH `codon run` (JIT) AND `codon build`
(AOT). The Win64 SEH/funclet EH + ABI milestone is re-proven on 0.19.6/LLVM 20.

Adds a windows-latest job that builds the exaloop LLVM fork tag codon-20.1.7 from
source (X86, RTTI ON, tools OFF, MSVC triple; cached key windows-msvc-llvm-20-1-7),
builds codon with clang/lld, packages DLLs into bin, and smoke-tests codon run +
codon build (incl. try/except). Linux/macOS jobs stay as regression guards that the
WIN32-guarded port didn't touch other platforms.

Replace the no-op Windows backtrace stubs with a real implementation built on
dbghelp (CaptureStackBackTrace + SymFromAddr + SymGetLineFromAddr64), exposing
libbacktrace's exact API contract (backtrace_create_state/full/simple) so the rest
of the runtime is unchanged. dbghelp reads CodeView/PDB info — the right choice for
MSVC/clang-link output, vs libbacktrace's DWARF-on-ELF assumptions. Link dbghelp.

Verified: codonrt links with dbghelp; exc.py still passes; uncaught exceptions report
correctly. (Rich symbolized native frames additionally need PDB emission in the AOT
link driver — a follow-up; codon's own "Raised from" header already gives source loc.)
All WIN32-guarded; Linux/macOS unchanged.

- gtest codon_test: statically links the compiler sources into the test (MSVC
  can't auto-import codonc.dll's static data members), kept EXCLUDE_FROM_ALL.
- OpenMP @par: imported `omp` target (GC-patched libomp installed alongside
  LLVM); codonrt links it + copies/installs libomp.dll.
- numpy core: un-carve runtime/numpy/{sort,loops,zmath}.cpp (Highway, no Fortran);
  link hwy/hwy_contrib whole-archived.
- numpy.linalg: deps.cmake downloads the OpenBLAS project's self-contained prebuilt
  Windows binary; imported `openblas` target; codonrt links it (+/INCLUDE: to load
  the DLL for JIT) and installs libopenblas.{dll,lib}.

seq_init now wires the real __kmpc_set_gc_callbacks from the GC-patched libomp
(was a no-op stub on Windows). Registers OpenMP worker threads with bdwgc so
parallel allocation is safe. del_roots stays nullptr on Win32 (GC_remove_roots
absent); the libomp patch null-checks it.
All WIN32-guarded.

- AOT symbolized backtraces: emit the CodeView module flag on Windows targets and,
  with -g, pass /debug + /pdb to lld so the dbghelp runtime backtrace can symbolize
  C-level frames (function names + file:line).
- AOT link driver: add -llibomp (@par) and -llibopenblas (numpy.linalg), resolved
  via the existing -L../lib/codon search path.
- JIT near-image fix: anchor __ImageBase 3.5GB below __C_specific_handler
  (handler - 0xE0000000) instead of its 4GB-aligned floor. The narrow [floor,handler]
  window got congested once libomp loaded, so the slab fell back below the floor and
  the .xdata Pointer32NB relocs underflowed -- this had started breaking codon run for
  EVERY program. Offset kept in sync across allocateNearImage, the SEH .pdata
  registration plugin, and the __ImageBase symbol in both JIT paths.
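The anchor arithmetic can be checked with a few lines of C++. This sketch assumes the `.xdata` fixups are unsigned 32-bit offsets from `__ImageBase`; only the 0xE0000000 offset comes from the commit message, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Offset from the text: anchor __ImageBase 0xE0000000 bytes (3.5 GiB) below
// __C_specific_handler.
constexpr std::uint64_t kAnchorOffset = 0xE0000000ULL;

// Old scheme (for comparison): the 4 GiB-aligned floor below the handler.
std::uint64_t floor_anchor(std::uint64_t handler) {
  return handler & ~0xFFFFFFFFULL;
}

// New scheme: a fixed 3.5 GiB below the handler.
std::uint64_t image_base_anchor(std::uint64_t handler) {
  assert(handler >= kAnchorOffset);  // avoid wrapping below address 0
  return handler - kAnchorOffset;
}

// A 32-bit image-relative fixup stores (target - anchor) as an unsigned
// 32-bit value, so the target must lie in [anchor, anchor + 4 GiB).
bool fits_addr32nb(std::uint64_t target, std::uint64_t anchor) {
  return target >= anchor && target - anchor <= 0xFFFFFFFFULL;
}
```

For example, with a hypothetical handler address of 0x7FFA12345678, the 4 GiB-aligned floor is 0x7FFA00000000, so the old [floor, handler] window was only 0x12345678 bytes (about 291 MiB). The new anchor, 0x7FF932345678, leaves 3.5 GiB below the handler for JIT slabs and 512 MiB of 32-bit reach above it.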
On Windows each SeqTest case re-execs the test binary in a --run-case child (params
via temp file, stdout over a pipe) instead of fork()+pipe()+waitpid(), preserving
per-case isolation. CRLF normalized in output/expect comparison. <windows.h> included
after the LLVM headers with NOMINMAX/NOGDI. NumPy/GPU suites skipped on Windows.
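The CRLF normalization can be as simple as dropping each `\r` that immediately precedes `\n`. A minimal sketch with a hypothetical helper name; the actual test harness may normalize differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: turn every "\r\n" into "\n" so output captured from a
// child process on Windows compares equal to LF-only expected text.
// A lone '\r' that is not followed by '\n' is kept.
std::string normalize_crlf(const std::string &s) {
  std::string out;
  out.reserve(s.size());
  for (std::size_t i = 0; i < s.size(); ++i) {
    if (s[i] == '\r' && i + 1 < s.size() && s[i + 1] == '\n')
      continue;  // drop the '\r'; the '\n' is copied on the next iteration
    out.push_back(s[i]);
  }
  return out;
}
```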
Builds the xeus/xeus-zmq/libzmq stack on MSVC (OpenSSL+sodium via vcpkg, all deps
forced to /MD, ZMQ_STATIC). Patches: xeus.patch CRLF->LF; xeus-win.patch uses the
Win32 GUID backend (ole32/rpcrt4) instead of libuuid for clang-on-Windows;
xeus-zmq.patch skips the spurious libsodium requirement on Windows. Also fixes
jupyter.cpp's drift vs the 0.19.6 API (JIT from Options, options->capture, getErrors)
and links fmt+LLVM directly (codonc.dll doesn't re-export them on MSVC).

autocrlf would CRLF-ify the jupyter xeus patches on checkout, making git apply
fail with 'corrupt patch' during the plugin build on Windows.

Windows job: build the GC-patched libomp standalone with clang-cl and install it
into the cached LLVM tree (enables @par); package libomp.dll + libopenblas.dll; and
extend the smoke test to cover @par and numpy core+linalg on both JIT and AOT.
OpenBLAS needs no CI step -- deps.cmake downloads the prebuilt.

clang emits compiler-rt builtins for float16/bf16 conversions (__truncdfhf2 etc.)
and 128-bit ints (Int[128]/UInt[128] div/mod + int<->fp: __divti3/__modti3/
__floattidf/...). On Linux/macOS these come from libgcc/compiler-rt; on Windows the
MSVC CRT lacks them, so programs using float16 or Int[128] failed at JIT/AOT with
'Symbols not found'.

Vendor them (Apache-2.0-with-LLVM-exception, like the existing floatlib) and compile
them as codonrt's OWN sources on Windows -- NOT via whole-archived codonfloat, whose
symbols WINDOWS_EXPORT_ALL_SYMBOLS does not export. As codonrt's own objects they're
exported, so the JIT resolves them through the process symbol table and AOT via
codonrt's import lib. clang supports __int128 even on windows-msvc. Verified: Int[128]
div/mod/float and float16 ops work (JIT + AOT); hello/numpy/linalg/@par unregressed.
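For context, a small C++ sketch of the kind of code that pulls in these builtins. The mapping in the comments reflects how clang typically lowers 128-bit operations on x86-64; exact lowering can vary (for instance, constant divisors may be folded away), and the function names are hypothetical.

```cpp
#include <cassert>

// Functions whose bodies clang on x86-64 typically lowers to helper calls
// instead of inline instructions:
//   signed 128-bit division   -> __divti3
//   signed 128-bit remainder  -> __modti3
//   signed 128-bit to double  -> __floattidf
// Without a library that defines these helpers, linking (AOT) or JIT symbol
// lookup fails with undefined-symbol errors.
__int128 div128(__int128 a, __int128 b) {
  assert(b != 0);
  return a / b;
}

__int128 mod128(__int128 a, __int128 b) {
  assert(b != 0);
  return a % b;
}

double to_double(__int128 a) { return static_cast<double>(a); }
```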
Build codon_test (EXCLUDE_FROM_ALL on Windows) and run the fast in-process IR suite
(CIRCoreTest, 84 cases) as a regression gate. SeqTests are left out -- slow, and they
surface separate Windows runtime correctness gaps.

`BitSet` stores 64-bit words but built its bit masks with `1UL << (bit % 64)`.
`unsigned long` is 32-bit under LLP64 (Win64), so for `bit % 64 >= 32` the
shift is UB and bits 32..63 alias onto 0..31 -- silently corrupting
reaching-definitions in any function with >32 tracked assignments. This
mis-folded the post-loop `saw_digit` guard in `_parse_int`, making
`int(str)` / `Int[N](str)` raise spuriously (the `arithmetic` SeqTest).

Use `uint64_t(1)` as the shift base in set()/get(). No-op on LP64
(Linux/macOS), correct on every data model.
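The fix is easiest to see in a stripped-down bit set. This is a minimal sketch, not Codon's actual `BitSet` (the class layout and member names are assumptions); it only illustrates using a 64-bit shift base for 64-bit words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class BitSet {
public:
  explicit BitSet(std::size_t nbits) : words_((nbits + 63) / 64, 0), nbits_(nbits) {}

  void set(std::size_t bit) {
    assert(bit < nbits_);
    // uint64_t(1) is 64 bits wide on every data model, so shifts of 32..63
    // are well defined. With 1UL (32-bit under LLP64) they would be UB.
    words_[bit / 64] |= std::uint64_t(1) << (bit % 64);
  }

  bool get(std::size_t bit) const {
    assert(bit < nbits_);
    return (words_[bit / 64] & (std::uint64_t(1) << (bit % 64))) != 0;
  }

private:
  std::vector<std::uint64_t> words_;
  std::size_t nbits_;
};
```

With `1UL` as the base on Win64, `set(35)` shifts a 32-bit value by 35, which is undefined; in practice x86 shift instructions on 32-bit operands use only the low 5 bits of the count, so the shift becomes 35 % 32 = 3 and bit 35 aliases bit 3.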
Three Windows-only exception-handling bugs surfaced by the SeqTest suite
(`exceptions`, `bltin`), all behind `isWinMSVC()`/`_WIN32`:

1. `finally` double-execution on propagate-through (llvisitor.cpp,
   visit(TryCatchFlow)). The `unwindResumeBlock` re-raise (`call(seq_throw)`,
   since `resume` is illegal under WinEH) ran while this tc's *finally* entry
   was still on the finally-stack, so call() routed the re-raise back into the
   tc's own finally pad and the body ran twice. Temporarily pop the finally
   entry around the re-raise so it propagates to the enclosing handler
   (matches rethrowBlock). The Itanium path uses `resume` and is unaffected.

2. Out-of-range `.xdata` IMGREL to `seq_exc_filter` under ASLR. The catchpad
   scope table emits the filter as a Pointer32NB reloc (filter - anchor), but
   `seq_exc_filter` lives in codonrt.dll, which ASLR can load >4GB from the
   anchor -> the 32-bit fixup overflows and JIT materialization aborts. New
   `makeWinEHFilter()` emits an in-module thunk (compiled into the in-window
   JIT slab) that reaches the real far filter through a 64-bit pointer; the 3
   catchpad sites use it. AOT resolves the pointer to the in-image import thunk.

3. JIT near-image slab placement (memory_manager.cpp, allocateNearImage).
   `llvm::sys::Memory`'s NearBlock hint silently fell back to an unconstrained
   mapping for large/late modules, landing outside the 4GB window and
   overflowing ADDR32NB relocs. Replaced with a deterministic VirtualQuery /
   VirtualAlloc scan that commits an exact in-window base and reuses freed slots.
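A simplified model of the placement scan: instead of calling VirtualQuery/VirtualAlloc, it takes a list of free regions (sorted by ascending base, an assumption of this sketch) and returns the lowest aligned base in the window [lo, hi) where the slab fits. `FreeRegion` and `find_in_window` are hypothetical names. Because the result is a pure function of the current address-space state, the same layout always yields the same base, and regions released by earlier slabs show up as free again and can be reused.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a MEM_FREE region as VirtualQuery would report it.
struct FreeRegion {
  std::uint64_t base;
  std::uint64_t size;
};

// Return the lowest `align`-aligned base in [lo, hi) at which `size` bytes fit
// inside a single free region, or std::nullopt if nothing fits. A real
// allocator would then reserve and commit exactly that base.
std::optional<std::uint64_t> find_in_window(const std::vector<FreeRegion> &free_regions,
                                            std::uint64_t lo, std::uint64_t hi,
                                            std::uint64_t size, std::uint64_t align) {
  assert(align != 0 && size != 0);
  for (const FreeRegion &r : free_regions) {
    std::uint64_t start = r.base < lo ? lo : r.base;
    start = (start + align - 1) / align * align;  // round up to the alignment
    std::uint64_t end = r.base + r.size;
    if (end > hi) end = hi;  // the whole slab must stay inside the window
    if (start < end && end - start >= size) return start;
  }
  return std::nullopt;
}
```

On Windows the alignment would typically be the allocation granularity reported by `GetSystemInfo` (usually 64 KiB).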
Add a `__windows__` compile-time constant (mirrors `__apple__`) in
Compiler::getEarlyDefines() and the doc visitor, so stdlib can branch on the
platform.

Use it to fix a crash in `File._iter` / `_iter_trim_newline`: those delegated
to codonrt's `getline`, but in the JIT the `_C.fopen` FILE* and codonrt's
`fgetc` resolve to different CRT FILE pools, so handing the FILE* across faults
(`RtlEnterCriticalSection`). On Windows, read lines via `_C.fgetc` into a GC
`List[u8]` (same CRT as the rest of File's `_C.*` calls). Linux/macOS keep the
native `getline` path unchanged.

MSVC's `frexp(±inf)` returns nan for the mantissa (and an unspecified
exponent); glibc already returns the CPython-contract values. Add a
non-finite guard returning `(x, 0)` so `frexp(±inf)=(±inf,0)` and
`frexp(nan)=(nan,0)` on every platform (the `numerics` SeqTest). No-op on
glibc.
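The guard itself lives in Codon's stdlib `math.frexp`; the same logic rendered in C++ as a minimal sketch (the helper name `frexp_guarded` is hypothetical) looks like this:

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Non-finite inputs return (x, 0), matching CPython's math.frexp, instead of
// whatever the C library's frexp produces for them. Finite inputs go through
// std::frexp unchanged.
std::pair<double, int> frexp_guarded(double x) {
  if (!std::isfinite(x))
    return {x, 0};  // ±inf stays ±inf, nan stays nan; exponent is 0
  int exponent = 0;
  double mantissa = std::frexp(x, &exponent);
  return {mantissa, exponent};
}
```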
Two parts, both Windows-only:

- lib.cpp: provide an exported `getline()` (absent from the MSVC CRT) that
  codon's stdlib (file.codon, input) calls via `_C.getline`, so both AOT and
  the JIT's process symbol resolver find it.

- CMakeLists.txt: WINDOWS_EXPORT_ALL_SYMBOLS only scans codonrt's own objects,
  so the zlib `gz*` and bzip2 `BZ2_*` members of the static libs are not in
  codonrt.dll's export table and the JIT resolver can't find them. Force a
  reference (/INCLUDE, to pull in the archive member) and /EXPORT each. Fixes
  the `serialization` and `bltin` SeqTests.
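For the `getline()` part: the MSVC CRT has no `getline`, so the exported shim has to supply POSIX semantics itself. Below is a minimal sketch built on `fgetc` (the same primitive the Windows `File._iter` path uses), assuming POSIX `getline` behavior: the line, including its trailing `\n`, is stored NUL-terminated in a `realloc`-grown buffer, and -1 is returned at end of file. The name `getline_sketch` is hypothetical, and `std::ptrdiff_t` stands in for `ssize_t`, which MSVC does not define.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Reads one line into *lineptr (growing it with realloc as needed) and
// returns the number of characters read, or -1 on EOF/error.
std::ptrdiff_t getline_sketch(char **lineptr, std::size_t *n, std::FILE *stream) {
  if (!lineptr || !n || !stream) return -1;
  if (*lineptr == nullptr || *n == 0) {
    *n = 128;
    char *buf = static_cast<char *>(std::realloc(*lineptr, *n));
    if (!buf) return -1;
    *lineptr = buf;
  }
  std::size_t len = 0;
  int c;
  while ((c = std::fgetc(stream)) != EOF) {
    if (len + 2 > *n) {  // room for this char plus the terminating NUL
      std::size_t new_n = *n * 2;
      char *buf = static_cast<char *>(std::realloc(*lineptr, new_n));
      if (!buf) return -1;
      *lineptr = buf;
      *n = new_n;
    }
    (*lineptr)[len++] = static_cast<char>(c);
    if (c == '\n') break;
  }
  if (len == 0) return -1;  // EOF before any character was read
  (*lineptr)[len] = '\0';
  return static_cast<std::ptrdiff_t>(len);
}
```

The caller owns the buffer and releases it with `free()` once done, exactly as with POSIX `getline`.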
On Windows, LLVM is statically linked into both codon.exe and codonc.dll, so
each owns a separate `llvm::cl` GlobalParser registry. The flag globals
(options.cpp, numpy.cpp's npfuse-*) live in codonc, but main.cpp parsed in the
exe's registry -> every codon-specific flag (-release, --numerics, -fast-math,
-disable-exceptions, npfuse-*) was rejected as "Unknown command line argument".
The npfuse-* flags are read inside codonc, so moving flags to the exe could not
fix it -- the registry had to be unified.

Move the whole compile driver (run/build/doc/jit + processSource + every
ParseCommandLineOptions call) out of main.cpp into a new codon/app/cli.cpp
compiled into codonc, exported as `codon::cliMain()` (codon/app/cli.h).
main.cpp is now a thin shim: it handles `jupyter` (depends on codon_jupyter,
not codonc flags) and forwards everything else. Not _WIN32-guarded -- a uniform
app/lib boundary change, harmless on Linux/macOS (single registry there
regardless).
@cla-bot

cla-bot Bot commented Jun 9, 2026


Thank you for your pull request. We require contributors to agree to our Contributor License Agreement (https://exaloop.io/legal/cla), and we don't have @avijitbhuin21 on file. In order for us to review and merge your code, please email info@exaloop.io to get yourself added.

@arshajii

Contributor

@cla-bot check

cla-bot added the cla-signed label Jun 12, 2026
@cla-bot

cla-bot Bot commented Jun 12, 2026


The cla-bot has been summoned, and re-checked this pull request!
