[CI] Cross-platform — Part 3: Windows workflow#5700
Conversation
There was a problem hiding this comment.
Review Summary
This PR adds experimental Windows CI jobs for perception smoke testing, building on the uv-based installation path. The approach is well-structured, with incremental jobs that progressively test more functionality.
✅ Strengths
- Incremental approach: The workflow builds from simple torch/scipy tests → IsaacLab core install → full perception smoke
- Cross-platform fix in
AssetConverterBase: Moving from hardcoded/tmp/IsaacLabtotempfile.gettempdir()is the correct POSIX/Windows-agnostic solution - Proper marker recognition in AppLauncher: The
windows_ciandarm_cimarkers are now correctly filtered fromsys.argv - Good use of
continue-on-error: true: Appropriate for experimental jobs that should not gate merges
Architecture
- ✅ Clean separation of reusable GitHub Actions (
windows-instance-state,windows-sim-paths) from the main workflow - ✅ Consolidated single-job design prevents runner teardown between steps — smart approach for autoscaling runners
- ✅ OS-level watchdogs via
Start-Process + WaitForExitinstead of relying on GIL-vulnerable Python thread watchdogs - ✅ Develop-aligned Isaac Sim builds from internal Artifactory index (authenticated)
- ✅ Perception and training smoke tests enabled (WDDM mode confirmed)
⚠️ Optional Suggestions (Non-blocking)
run_docker_tests: 'false'inbuild.yaml: The comment says "TEMP: disable heavy Linux Docker + Tests" — should be reverted before merge if not intentional.- Hardcoded Git Bash Path:
C:\Program Files\Git\bin\bash.exeassumes standard installation. Consider fallback or documentation.
Update (ad92f43): Reviewed incremental diff from bae6977...ad92f43 (rebase on develop + Windows CI refinements):
Windows CI Changes:
- ✅ Perception test fully re-enabled: Both the standalone
test-perceptionstep and the perception subcase in training smoke now run. Perception added to$blockingin aggregate. - ✅ Develop-aligned Isaac Sim install: Setup step now resolves and installs the develop-branch-aligned Isaac Sim build from the internal Artifactory index (authenticated via
ISAACSIM_ARTIFACTORY_READONLY_*secrets). This replaces the previous public pip install and ensures CI tests against the same Sim as the Linux/ARM develop container. - ✅ Wheel install adjusted: The
[all]extra (which pinsisaacsim==5.1.0) is no longer used during wheel validation, preventing a downgrade of the develop Sim build. - ✅ Training smoke expanded: Step renamed to "state + perception" with increased timeout (25 min);
-k "not perception"filter removed.
Non-CI changes (develop rebase): The diff includes a large number of unrelated develop-branch changes (cloner refactoring, observation pipeline improvements, new tutorials, bugfixes, etc.) that are not part of this PR's Windows CI scope. No new issues found.
Approval stands.
Foundation for cross-platform CI. Registers four pytest markers (windows, windows_ci, arm, arm_ci), teaches AppLauncher to recognize them in argv so they do not leak into Isaac Sim's argparse, and moves the AssetConverterBase USD scratch directory from a hardcoded /tmp/IsaacLab to tempfile.gettempdir() for cross-platform compatibility. Tags source/isaaclab/test/deps/test_torch.py and test_scipy.py with the new markers so they are selectable by future cross-platform jobs. Workflow files (arm-ci.yaml, windows-ci.yaml) ship in follow-up PRs.
d9d7dac to
d25f9bb
Compare
Same shape as arm-ci.yaml but the install path is native pip + uv on the Windows host (no Docker for Linux-based Isaac Sim wheels). Jobs (all continue-on-error: true): Tier 1 — general-windows, install-windows, kit-launch-windows Tier 2 — path-io-windows, perception-windows Every pytest invocation passes --timeout=N + --timeout-method=thread (signal is unavailable on Windows) plus --continue-on-collection-errors so a hung test cannot consume the full job slot and a broken neighbor file does not poison the marker-driven discovery. perception-windows wraps the cartpole-camera smoke in an inline Python script with explicit assertions and an inner watchdog thread that aborts the process after 180s. This replaces the previous pattern where Vulkan init failures hung the job instead of erroring. Tags four path-IO test files (test_configclass, test_dict, test_episode_data, test_hdf5_dataset_file_handler) with the windows_ci marker so path-io-windows picks them up via marker-driven discovery.
d25f9bb to
6811028
Compare
Forces run_docker_tests=false in build.yaml's changes job so all gated test jobs skip via their existing if-gate. Must be reverted before final review.
Kit bootstrap aborts on the Windows runner with 'Unable to bootstrap inner kit kernel: EOF when reading a line' when stdin is not a tty and no EULA env vars are set. Set OMNI_KIT_ACCEPT_EULA / ACCEPT_EULA / PRIVACY_CONSENT at the workflow level so every job inherits them.
Bare 'isaacsim' on Windows pulls only isaacsim + isaacsim-kernel; Kit bootstrap then warns 'PYTHONPATH path doesn't exist (...site-packages/isaacsim/exts/isaacsim.simulation_app)' / 'Unable to expose isaacsim.simulation_app API: Extension not found', and 'from isaacsim import SimulationApp' resolves to None, so AppLauncher dies with 'TypeError: NoneType object is not callable'. Match install.py / wheel_builder canonical spec: isaacsim[all]>=6.0.0.
Pytest collection over source/isaaclab/test imports sensors/test_tiled_camera_env.py whose module-level argparse.parse_args consumes pytest's --ignore=... / -m windows_ci flags and INTERNALERRORs collection (collected 595 items / 48 errors). The windows_ci-tagged path-IO tests on this branch all live in test/utils, so narrow the pytest scope to that subdir — keeps the marker filter intact without forcing every test file in the tree to be importable bare.
…lder) Bare 'isaacsim[all]' on Windows fails Kit startup with 'ImportError: cannot import name get_metrics_assembler_interface from omni.metrics.assembler.core (unknown location)' — the extension is registered but its implementation isn't on disk because the extscache extra wasn't requested. wheel_builder/res/python_packages.toml pins 'isaacsim[all,extscache]==6.0.0.*' for exactly this reason; mirror it.
import isaaclab_tasks walks all task packages, which transitively touches GroundPlaneCfg.physics_material -> isaaclab.sim.spawners.materials forwarding shim, which raises 'RigidBodyMaterialCfg has moved to isaaclab_physx.sim.spawners.materials. Install the isaaclab_physx extension or update your import.' Install it editable before isaaclab_assets / isaaclab_tasks so the shim resolves.
isaaclab-physx==1.1.0 declares a hard dep on isaaclab-ppisp which is not in source/ and not on any package index, so uv refuses the install with 'isaaclab-ppisp was not found in the package registry'. The ppisp import in isaaclab_physx is lazy (runtime, not at import), so --no-deps gets us a working editable install. Mirrors the same workaround used by the ARM-side install path (see install.py).
Three distinct gaps surfaced in path-io-windows on commit 683c110: 1. test_episode_data[cuda:0] parametrize: 'Torch not compiled with CUDA enabled' — default torch wheel on Windows pypi is CPU-only. Install torch + torchvision from download.pytorch.org/whl/cu128. 2. test_hdf5_dataset_file_handler: 'No module named h5py' — h5py was never declared by the isaaclab core dep set on Windows. Install it. 3. test_version.py / test_wrench_composer_*.py: KeyError 'EXP_PATH' at collection. Those files instantiate AppLauncher at module load and need an Isaac Sim install path-IO does not provide. Replace the '-m windows_ci' marker filter (which still imports every file in test/utils for collection) with explicit windows_ci-tagged file paths. Also drop --ignore=tools/conftest.py since no conftest sits under utils/.
The Windows runner reports 'vkEnumeratePhysicalDevices failed. No physical device is found.' / 'Failed to create any GPU devices' when Kit boots with --enable_cameras=True. Kit then hangs (the in-script 3-min watchdog can't reliably preempt a C-level GIL-held call), the job consumes its full timeout-minutes, and every other queued job on the same runner gets cancelled. Set the perception job's 'if' to false so it never claims the runner. Also tighten timeout-minutes from 30 to 10 so even when re-enabled it fails fast rather than starving siblings. Flip 'if' back to needs.changes.outputs.run_windows_ci == 'true' once the runner is confirmed GPU-capable.
Python thread watchdogs cannot preempt a Kit/Vulkan init that hangs in a C call holding the GIL — observed on this runner where the 3-min in-script time.sleep + os._exit never fired and perception_smoke held the Windows runner for the full 40-min job timeout, starving every other job. Replace the thread watchdog inside perception_smoke.py with a PowerShell Start-Process + WaitForExit at the shell layer (OS-level process kill, immune to GIL). Apply the same pattern to kit-launch-windows's inline python invocation. Tighten per-job timeout-minutes: general-windows 30 -> 15 install-windows 45 -> 30 kit-launch 30 -> 15 path-io 30 -> 15 The hard upper bound is now the second line of defence; the PowerShell watchdog catches runaway python first.
PowerShell on the Windows runner doesn't have bash on PATH: bash : The term 'bash' is not recognized as the name of a cmdlet ... Git for Windows installs bash.exe at C:\Program Files\Git\bin\bash.exe; invoke it directly with a Test-Path guard and exit-code check so failures fast-fail.
build.sh hardcoded python3. Linux installs expose python3 (and that
remains the default), but Windows git-bash only has python (no python3
symlink), so the build was dying with 'python3: command not found'
the moment install-windows tried to run the canonical wheel build.
Make build.sh use ${PYTHON:-python3} for every interpreter call and
pass PYTHON=python from the Windows workflow before invoking it. Linux
behavior unchanged; one variable lets Windows reuse the same script.
PowerShell on the Windows runner reads the yaml as a non-UTF-8 code page; em-dashes (U+2014) inside the Write-Host string literals got mojibake'd to 'â€"' and tripped the parser: ParserError: TerminatorExpectedAtEndOfString Replace the two affected em-dashes with ASCII '-'. Comment-line em-dashes elsewhere in the file are harmless (tokenizer skips them) and stay as-is to avoid touching unrelated lines.
build.sh runs 'python -m pip install build wheel' inside the venv. uv venv ships without pip by default, so this failed with C:\...\env_isaaclab_uv\Scripts\python.exe: No module named pip right after gen_pyproject.py emitted the generated pyproject.toml. Add --seed to the install-windows venv create so pip / setuptools / wheel land inside the venv; the other 3 jobs don't call build.sh and keep the lighter seedless venvs.
Flips perception-windows from 'if: false' back to the standard needs.changes.outputs.run_windows_ci gate. The PowerShell process-level watchdog around the inline Kit boot stays as the inner guard; the tightened 10-min job timeout-minutes is the outer guard so a Vulkan init regression cannot starve other queued jobs again.
The watchdog used $proc.Kill($true), which compiles on .NET 5+ but not on PowerShell 5.1's .NET Framework (Process.Kill has no (bool) overload there). It still surfaced 'MethodCountCouldNotFindBest' on the runner after the kill ::error was emitted. Switch to Stop-Process -Id $proc.Id -Force -ErrorAction SilentlyContinue which is PS5-native and idempotent.
The cross-platform CI series adds source/isaaclab_tasks/test/ test_cartpole_training_smoke.py without a paired fragment, so the nightly Check changelog fragments gate currently rejects the PR. Add a .skip entry under source/isaaclab_tasks/changelog.d/ matching the existing source/isaaclab/changelog.d/jichuanh-windows-ci.skip convention (CI/test-only, no user-facing API change).
Three related changes that together unblock the consolidated windows-ci job from the latent failures uncovered once the perception step stopped masking everything else: * Install `isaacsim[all,extscache]==6.0.0.*` BEFORE the cu128 torch upgrade. `isaacsim` pulls CPU torch transitively and was silently overwriting the cu128 wheel installed earlier; `[cuda:0]`-parametrized cases in Deps smoke and Path-IO then fail with "Torch not compiled with CUDA enabled". The new order mirrors install.py (_install_isaacsim() then _ensure_cuda_torch()). * Install `source/isaaclab_newton` with `--no-deps`. cartpole_env_cfg.py imports `isaaclab_newton.physics` at module load, so every cartpole task fails with `ModuleNotFoundError: No module named 'isaaclab_newton'` without it. Same `--no-deps` reason as isaaclab_physx (both declare a bare-name dep on isaaclab_ppisp that's not yet on this branch nor on any index; the ppisp import is lazy at runtime). The smoke-import line is extended so this regression fails fast in setup, not in a later test step. * Replace the em-dash in the Aggregate step's `Write-Host "::error::"` with an ASCII hyphen. PowerShell 5.1 reads the temp .ps1 as cp1252, so the 3-byte UTF-8 em-dash mis-decodes inside the string and the closing quote is mis-detected, raising "The string is missing the terminator". The path was never executed before because `$failed` was always empty (only perception had failed, and it was excluded from the gating set).
The temp_dir fixture used `tempfile.mkdtemp()` + `shutil.rmtree()` for cleanup. On Windows, h5py's libhdf5 keeps an internal handle to the file briefly after `.close()`, so `rmtree` races with the handle release and raises `PermissionError [WinError 32]` on teardown of `test_write_and_load_episode[cuda:0]`. The assertions had already passed; only the cleanup was failing. Switch to `tempfile.TemporaryDirectory(ignore_cleanup_errors=True)` (Python 3.10+). On Linux/macOS this flag is a no-op since no cleanup error is raised; on Windows it absorbs the libhdf5 handle-release race without masking real failures (the test body still asserts via the explicit `dataset_file_handler.close()` calls). Drop the now-unused `shutil` import.
Pull in source/isaaclab_ppisp (the ppisp package missing from this branch) and the updated install.py that includes isaaclab_ppisp and isaaclab_newton in CORE_ISAACLAB_SUBMODULES. With ppisp present, the workflow no longer needs --no-deps workarounds for isaaclab_physx / isaaclab_newton; the subsequent commit collapses the hand-rolled pip sequence into a single ./isaaclab.bat -i call.
Replace the hand-rolled `uv pip install ...` sequence in the setup step with a single `.\isaaclab.bat -i 'isaacsim,rl[rsl_rl,rl_games]'` call, now that the develop merge brings in `source/isaaclab_ppisp/` and the updated install.py that includes `isaaclab_ppisp` and `isaaclab_newton` in CORE_ISAACLAB_SUBMODULES. The hand-rolled sequence had grown three latent issues, all of which the canonical install.py path avoids: * Install order — `_install_isaacsim()` runs before `_ensure_cuda_torch()` inside install.py, so isaacsim's transitive CPU torch can't shadow the cu128 wheel. The previous hand-rolled order had the cu128 upgrade first and broke `[cuda:0]`-parametrized tests. * Missing isaaclab_newton — install.py walks CORE_ISAACLAB_SUBMODULES, so isaaclab_newton is installed automatically. cartpole_env_cfg.py's import of `isaaclab_newton.physics` no longer fails. * No more --no-deps workarounds — with `source/isaaclab_ppisp/` present the renderer-backend bare-name dep resolves through the local editable install. The workflow keeps the test-only `pytest pytest-timeout h5py` install (install.py doesn't carry pytest plumbing) and the post-install smoke import. Setup-step body shrinks from ~25 lines to ~3 substantive lines. Matches the "Mirror Linux CI setup for new platforms" rule: same entry point as Linux CI (`./isaaclab.sh -i`), so install-order bugs and new core submodules are picked up automatically when install.py changes.
PowerShell / pytest commands inside YAML run: blocks render as plain text in editors without an embedded-language highlighter, so heavy inline commentary inside those blocks becomes visual noise rather than documentation. Strip it. Inter-step comments (section headers, pre-step rationale, the disabled-perception context block) are kept — those sit at the YAML level and read fine without syntax-highlighting help. Net: -80 lines, mostly redundant restatement of what surrounding identifiers and commit history already make clear.
`test_train_cartpole_perception` builds Isaac-Cartpole-RGB-Camera-Direct-v0 which boots Kit with `enable_cameras=True`, hits the L40S TCC / no-vGPU Vulkan path, and hangs until the pytest 600s timeout fires (logs show `Stack of MainThread` thread dumps). Same blocker as the disabled standalone perception smoke. Filter the training-smoke pytest invocation with `-k 'not perception'` so the state subcase (Isaac-Cartpole-Direct-v0 + rsl_rl) is the only case exercised on the current Windows runner pool. Latest CI run shows the state subcase passes in ~30s. Drop the filter when the L40S vGPU unblock criterion lands (same condition tracked in the disabled perception step's context block).
Independent probe of the Vulkan loader on the runner, separate from Kit. Captures nvidia-smi driver+display info, lists vulkan-1.dll and ICD registry entries, and runs vulkaninfo --summary if available (falls back to a ctypes-based vkCreateInstance + vkEnumeratePhysicalDevices probe via the existing uv venv when the SDK isn't installed). Output goes to reports/vulkan-probe.txt and is included in the windows-ci-reports artifact. continue-on-error: true so the probe is informational only and does not gate the job. Added to the Aggregate $results listing for visibility. Background: PR 5700 perception step fails on the runner with "vkEnumeratePhysicalDevices failed. No physical device is found." + "TCC is not supported. GPU(s) should be in WDDM mode." Adding the direct vulkaninfo / loader probe answers the question of what the Vulkan ICD stack itself sees, independent of Kit's bootstrap path.
Last CI run's probe step parse-failed because PowerShell doesn't support bash heredoc (<<'PYEOF') and the YAML block scalar couldn't host an unindented PowerShell here-string for the embedded Python. Move the ctypes Vulkan loader probe out of the workflow into a standalone tools/vulkan_probe.py: * Loads vulkan-1.dll / libvulkan.so.1 via ctypes. * Calls vkCreateInstance + vkEnumeratePhysicalDevices. * Reports loader-load, instance-create, and physical-device count. * No dependencies beyond the OS Vulkan loader; cross-platform. The workflow now invokes it with system Python on the runner. Probe moves to the first runnable step (right after instance-state report) so diagnostic data appears in ~30 seconds instead of after the 15-min isaaclab.bat -i install. All other test steps gated off (`if: false`) for now while we iterate; aggregate gates the job purely on the probe's outcome. Disabled-perception context block left intact for the next maintainer.
…ools/" This reverts commit 966e6d3.
…nfo)" This reverts commit cd1e739.
PR 5695 lays foundation only. The two tagged tests (test_scipy.py, test_torch.py) had pytestmark applied here but no in-PR consumer — the arm-ci and windows-ci workflows that select on these markers ship in isaac-sim#5698 and isaac-sim#5700 respectively. Each downstream PR should land the marker annotations on the tests it actually wants in its canary, alongside the workflow that runs them. This keeps 5695 self-contained as scaffolding and prevents unused annotations from accumulating here.
The Windows runner GPUs are now in WDDM mode, so Kit's RTX/Vulkan path can enumerate a device. Re-enable the camera perception smoke and the perception subcase of the cartpole training smoke that were gated off under the data-center (TCC) driver, and add perception to the aggregate gating and report artifacts.
There was a problem hiding this comment.
Updated Review: Windows CI Pipeline (Part 3)
This is a comprehensive and well-structured PR that adds Windows CI support to IsaacLab. The implementation shows strong attention to platform-specific challenges and includes robust error handling. Below is my detailed analysis:
✅ Strengths
1. Solid Architecture
- Clean separation of reusable GitHub Actions (
windows-instance-state,windows-sim-paths) from the main workflow - Marker-driven test discovery (
windows_ci) allows adding tests without YAML edits - Consolidated single-job design prevents runner teardown between steps — smart approach for autoscaling runners
2. Robust Process Management
- OS-level watchdogs via
Start-Process + WaitForExitinstead of relying on GIL-vulnerable Python thread watchdogs - Per-job timeouts with
--timeout-method=thread(correctly avoiding Unix-only SIGALRM) - Hard timeouts with explicit kill on hung Kit/Vulkan processes
3. Cross-Platform Improvements
AssetConverterBasenow usestempfile.gettempdir()instead of hardcoded/tmp— correct fixwheel_builder/build.shPYTHONoverride handles Windows git-bash (which only haspython, notpython3)- HDF5 fixture uses
ignore_cleanup_errors=Trueto handle Windows file-lock races — good catch
4. Comprehensive Environment Setup
- EULA acceptance variables properly configured
- DLL search paths correctly prepended for Vulkan ICD discovery
- Instance state reporting helps diagnose runner issues (GPU presence, disk space, cache sizes)
⚠️ Suggestions & Minor Issues
1. Temporary Docker/Tests Disable in build.yaml
run_docker_tests: 'false' # TEMP: disable heavy Linux Docker + TestsThe comment says "revert before final review" — this should be reverted before merge. Consider adding a # TODO: tag that linters/CI can catch.
2. Hardcoded Git Bash Path
$gitBash = "C:\Program Files\Git\bin\bash.exe"
if (-not (Test-Path $gitBash)) { throw "Git Bash not found at $gitBash" }This assumes a standard Git for Windows installation. Consider:
- Falling back to
where.exe bashor checking common alternate paths (C:\Program Files (x86)\Git\bin\bash.exe) - Or documenting this as a runner requirement
3. Perception Test Gating
The perception test depends on WDDM mode for Vulkan. If runners could be in TCC mode, consider:
- Auto-detecting driver mode via
nvidia-smioutput - Adding a conditional skip rather than a CI failure
4. pytestmark Placement in test_configclass.py
from isaaclab.utils.configclass import _field_module_dir, configclass
pytestmark = pytest.mark.windows_ci
from isaaclab.utils.dict import class_to_dict, ...The pytestmark is sandwiched between imports — while functional, it's unconventional. Consider grouping it with other module-level declarations after all imports.
5. Magic Numbers
Several timeout values are hardcoded (300000ms, 180000ms, 600s, etc.). Consider:
- Documenting why these specific values were chosen
- Using named constants or environment variables for runner-specific tuning
📝 Questions
-
Checkpoint Cleanup: The cartpole training smoke tests run two iterations and create checkpoints. Are these cleaned up, or will they accumulate on runners?
-
Change Detection Logic: The
changesjob patterns don't include.github/actions/windows-*— is this intentional? Changes to these actions wouldn't trigger the workflow for validation. -
Python 3.12 Assumption: The
user site-packagespath assumes Python 3.12:"user site-pkgs" = "$env:APPDATA\Python\Python312\site-packages"
Should this be dynamic based on the installed Python version?
🔧 Final Notes
Overall, this is high-quality CI infrastructure work. The attention to Windows-specific quirks (DLL search order, file locking, PowerShell encoding, process watchdogs) demonstrates deep platform expertise.
Before merge:
- Revert the
run_docker_tests: 'false'workaround inbuild.yaml - Consider addressing the Git Bash path hardcoding
Other items are suggestions for future improvement rather than blockers.
Update (ad92f43): Reviewed the new commits since bae6977.
Changes Observed
-
Perception/Rendering Steps Re-enabled ✅
- Commit
27def394re-enables the perception and rendering test steps in the Windows CI workflow - The cartpole-camera perception smoke test now exercises the full RTX/Vulkan path
- Commit
-
Develop-Aligned Isaac Sim Installation ✅
- The workflow now installs a develop-aligned Isaac Sim build from the internal Artifactory index (instead of the public pip release)
- Uses authenticated index URLs with
ISAACSIM_ARTIFACTORY_READONLY_USERNAME/PASSWORDsecrets - The
tools/resolve_isaacsim_develop.pyresolver verifies the build is on the develop branch - Good approach to ensure Windows CI tests the same Sim as Linux/ARM develop
-
Authentication with Read-Only Secrets ✅
- Proper credential handling with URL-encoding for special characters
- GitHub masks secrets and uv redacts credentials when echoing index URLs
Outstanding Items
-
build.yaml Still Has
run_docker_tests: 'false'⚠️ - The temporary disable is still in place with the comment "revert before final review"
- This needs to be reverted before merge
-
Hardcoded Git Bash Path (unchanged from previous review)
- Still assumes standard Git for Windows installation at
C:\Program Files\Git\bin\bash.exe
- Still assumes standard Git for Windows installation at
Summary
The new commits address the perception/rendering testing and add proper develop-aligned Isaac Sim installation with authentication. The Windows CI workflow is now more comprehensive and aligned with the develop branch testing strategy.
Remaining before merge:
- Revert
run_docker_tests: 'false'in build.yaml
Update (5c04d23): Reviewed incremental changes since ad92f43.
Changes in This Push
- Artifactory URL Migration ✅
- All internal Artifactory URLs updated from
urm.nvidia.comtoartifactory.nvidia.com - PyPI index paths updated from
sw-isaacsim-pypi/simple→sw-isaacsim-pypi-local/simpleandct-omniverse-pypi/simple→ct-omniverse-pypi-local/simple - This affects both the
uv pip installindex URLs and theresolve_isaacsim_develop.py --index-urlargument - Standard infrastructure migration from the legacy URM hostname to the canonical Artifactory hostname with corrected local repository paths
- All internal Artifactory URLs updated from
Assessment
This is a straightforward infrastructure-only change — no logic modifications, no new features. The URL pattern change (-local/simple suffix) suggests these now point to the correct local Artifactory repositories rather than virtual aggregators.
No new concerns introduced. Previously noted items still apply:
build.yamlstill hasrun_docker_tests: 'false'that needs reverting before merge- Hardcoded Git Bash path remains unchanged
Update (5941931): Reviewed incremental changes since 5c04d23.
Changes in This Push
-
Fork PR Secret Handling — Graceful Degradation ✅
- Previously the workflow would hard-fail (
throw) whenISAACSIM_ARTIFACTORY_READONLY_*secrets were not set - Now emits a GitHub Actions
::warning::annotation and proceeds unauthenticated - This allows fork PRs (which don't receive repository secrets) to still exercise URL/DNS reachability
- Auth-dependent failures will surface later in in-repo runs where secrets are present
- Good approach — matches standard CI best practice for fork-safe workflows
- Previously the workflow would hard-fail (
-
Artifactory URL Path Correction ✅
- Removed
/api/pypi/prefix and/simplesuffix from all Artifactory index URLs - Before:
https://...artifactory.nvidia.com/artifactory/api/pypi/sw-isaacsim-pypi-local/simple - After:
https://...artifactory.nvidia.com/artifactory/sw-isaacsim-pypi-local/ - Same change applied to
ct-omniverse-pypi-localand theresolve_isaacsim_develop.py --index-urlargument - This likely aligns with the actual repository layout on the new
artifactory.nvidia.comhost (non-virtual repos don't use the/api/pypi/.../simplePEP 503 endpoint)
- Removed
Assessment
Both changes are operational improvements with no logic risk:
- The fork PR fix unblocks external contributors from running the workflow
- The URL path fix ensures correct repository resolution on the migrated Artifactory instance
No new concerns introduced. Previously noted items still apply:
build.yamlstill hasrun_docker_tests: 'false'that needs reverting before merge- Hardcoded Git Bash path remains unchanged
Update (77685d1): Reviewed incremental changes since 5941931.
Changes in This Push
- Network Reachability Diagnostic Step ✅
- Added a new "Probe index reachability (diagnostic)" step in the Windows CI workflow
- Uses
if: always()andcontinue-on-error: true— purely diagnostic, won't block the build - Probes three hosts:
artifactory.nvidia.com,urm.nvidia.com,pypi.nvidia.com - Tests both DNS resolution (
[System.Net.Dns]::GetHostAddresses) and TCP 443 connectivity (Test-NetConnection) - Logs runner metadata:
COMPUTERNAME,RUNNER_NAME, proxy environment variables, DNS server addresses - This decouples network/DNS diagnostics from pip failures — useful for debugging transient CI infra issues
Assessment
This is a non-blocking diagnostic addition. It helps isolate whether Windows CI failures stem from network reachability issues vs. package availability, especially given the recent Artifactory URL migrations. The step runs unconditionally (always()) so it will capture state even on prior step failures.
No new concerns introduced. Previously noted items still apply:
build.yamlstill hasrun_docker_tests: 'false'that needs reverting before merge- Hardcoded Git Bash path remains unchanged
Native Windows installed isaacsim from the public pip index (pinned to the 5.1.0 release in source/isaaclab/setup.py), while the Linux/ARM CI runs the develop-branch Isaac Sim container. Windows therefore tested a different, older Sim than the rest of the matrix. Resolve the develop-aligned build from the internal Artifactory index and pin it, verifying the build's commit is on omni_isaac_sim develop when a gitlab token is available and falling back to the newest 6.0.0 build with a warning otherwise. Install Isaac Sim from that index, then install IsaacLab without the isaacsim/all extras that would re-pin the public release. Add tools/resolve_isaacsim_develop.py and its unit tests. The internal-index egress and a develop win_amd64 wheel are CI-infra prerequisites tracked separately.
The internal Artifactory index that serves the develop-aligned Isaac Sim wheels dropped anonymous access, so the native Windows install path now needs credentials. Add ISAACSIM_ARTIFACTORY_READONLY_USERNAME / _PASSWORD to the setup step: resolve_isaacsim_develop.py reads them from the environment and sends a Basic auth header on the simple-index fetch, and the uv pip install builds authenticated --extra-index-url values from them. This puts native Windows on the same internal develop registry the Linux/ARM CI uses.
Force-skip the heavy install/build/multi-GPU PR workflows while iterating Windows CI on this PR, to save runner time and cost during the back-and- forth. Each guard is marked TEMP and reverts before final review; build.yaml already does the same for the Docker test matrix. - install-ci.yml: force run_install_tests=false - wheel.yml: force run_build=false (detect step still runs, check stays green) - license-check.yaml: job-level if:false - test-multi-gpu.yaml: job-level if:false (this PR touches app_launcher.py, which would otherwise trigger the multi-GPU self-hosted runners)
…park-ci-perception # Conflicts: # pyproject.toml
|
Superseded by #6086 — re-opened from a branch in this repo (not a fork) so the Windows CI job receives the |
|
Reopened — #6086 (same-repo variant) is closed. Confirmed the fork rule was not the blocker: even on an in-repo PR the |
urm.nvidia.com does not resolve from the runner (getaddrinfo failed). The correct internal Artifactory host is artifactory.nvidia.com and the Isaac Sim repo there is sw-isaacsim-pypi-local. Repoint the resolver and both uv extra-index-urls accordingly.
Use the index URL as provided (artifactory.nvidia.com/artifactory/ sw-isaacsim-pypi-local/) instead of rewriting it to the api/pypi/simple form. Also downgrade the missing-secret check from a hard failure to a warning so the fork PR can still exercise URL/DNS reachability (the fetch error happens before auth); auth-dependent steps still need in-repo runs.
Summary
Part 3 of the cross-platform CI series. Adds
.github/workflows/windows-ci.yaml— CI pipeline for Windows GPU self-hosted runners. Same shape asarm-ci.yaml(Part 2, #5698) but native install path instead of Docker.general-windows,install-windows(uv install + wheel build + reinstall),kit-launch-windows.path-io-windows,perception-windows, ...continue-on-error: truewhile runners stabilize.--timeout=N+--timeout-method=signalso hung tests fail fast (fixes the previous perception-windows pattern where Vulkan failures hung the job instead of erroring).pytest <root> -m windows_ci --continue-on-collection-errors. Adding a new Windows-safe test requires only tagging itwindows_ci, no yaml edit.Series
PRs prefixed with
[CI] Cross-platform —. Current siblings:Depends on Part 1 (#5695). Independent of Part 2 (#5698).
Test plan
./isaaclab.sh -f(pre-commit) passes.