Skip to content

Fix daily compatibility result reporting#6103

Open
AntoineRichard wants to merge 1 commit into
isaac-sim:developfrom
AntoineRichard:antoiner/fix-daily-compat-results
Open

Fix daily compatibility result reporting#6103
AntoineRichard wants to merge 1 commit into
isaac-sim:developfrom
AntoineRichard:antoiner/fix-daily-compat-results

Conversation

@AntoineRichard

@AntoineRichard AntoineRichard commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Removed the stale daily compatibility workflow call to the deleted local combine-results action.
  • Report and upload the downloaded compatibility JUnit XML files directly with wildcard paths.
  • Added a static workflow test that catches missing local action metadata before runtime.

Test Plan

  • ./isaaclab.sh -p -m pytest .github/actions/test_workflow_local_actions.py .github/actions/run-package-tests/test_cleanup_docker_storage.py
  • PATH=/tmp/git-lfs-codex/git-lfs-3.7.1:$PATH ./isaaclab.sh -f
  • git diff --check HEAD~1..HEAD

@github-actions github-actions Bot added bug Something isn't working documentation Improvements or additions to documentation asset New asset feature or request isaac-sim Related to Isaac Sim team isaac-mimic Related to Isaac Mimic team infrastructure labels Jun 10, 2026

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

This PR merges the develop branch into main, carrying 250 commits worth of accumulated changes plus 5 new commits at the tip that constitute the actual fix:

  1. CI Workflow Fix (1e4634d) — Removes a stale local action reference (./.github/actions/combine-results) from the daily compatibility workflow and replaces it with direct JUnit XML glob consumption.
  2. MJWarp Reset Fixes (c6f4a27, 047320b, ca176cf, cca3053) — Addresses NaN propagation in the MuJoCo Warp solver after env reset.

Findings

1. 🟢 CI Workflow Fix — Correct and Well-Scoped

The core fix in .github/workflows/daily-compatibility.yml is clean:

  • Removes the checkout step + uses: ./.github/actions/combine-results (which referenced a non-existent action)
  • Replaces the single combined XML artifact with a glob reports/**/*.xml
  • The dorny/test-reporter action natively supports multiple JUnit files via glob, so this is the right approach

The new test file .github/actions/test_workflow_local_actions.py is a good preventive measure — it validates that all local action references in workflow files point to directories with action.yml/action.yaml. This would have caught the original breakage.

2. 🟡 MJWarp Reset: _reset_solver_internals Called with Potentially All-False Mask

In newton_manager.py, the hook is called unconditionally at the top of step():

cls._reset_solver_internals(cls._world_reset_mask)

This fires every step even when no environments were reset (all False). While mujoco_warp.reset_data with an all-false mask is a semantic no-op (it only resets flagged worlds), it still launches a Warp kernel. Consider guarding with wp.any(cls._world_reset_mask) or tracking a dirty flag to skip the kernel launch entirely on steps with no resets — this could save ~1-3% per-step overhead on large env counts.

Severity: Low (performance suggestion, not correctness)

3. 🟡 Mask Dtype Migration: Kamino Int32 Mirror Is a Maintenance Liability

The change from wp.int32 to wp.bool for _world_reset_mask is well-motivated (MJWarp's reset_data requires wp.array[bool]), but it introduces a separate _world_reset_mask_int32 mirror for Kamino with an explicit copy kernel (_convert_bool_mask_to_int32) every step. The code documents this is tracked upstream (newton-physics/newton#2932), which is good.

However, the _world_reset_mask_int32 allocation happens in _forward_kamino which is a @classmethod. The allocation guard (if cls._world_reset_mask_int32 is None) is missing — the mirror is allocated once via start_simulation but there's no explicit allocation path shown. If _forward_kamino is called before start_simulation sets up the mirror, it would fail.

Severity: Low (defensive coding suggestion; unlikely to hit in practice given lifecycle ordering)

4. 🟡 Device Scoping in MJWarp Reset: Correct but Fragile

The fix in cca3053 scopes mujoco_warp.reset_data to the MJWarp data device. The lazy allocation of _mjwarp_reset_mask uses:

import mujoco_warp
mjw_device = cls._solver.mjw_data.qpos.device

This assumes cls._solver.mjw_data.qpos is always allocated and on the correct device when _reset_solver_internals fires. Given the lifecycle (start_simulationstep), this should always hold, but a defensive check or early-out if cls._solver is None would be prudent.

Severity: Low (robustness suggestion)

5. 🟢 Test Coverage: Workflow Validation Is Adequate

The test_workflow_local_actions.py test uses a simple regex to find uses: ./.github/actions/... patterns and checks for action.yml/action.yaml. This is effective for catching the exact class of bug that caused the original failure.

However, the MJWarp reset changes (4 commits) do not appear to include new unit tests specific to the reset-data path. The changelog references that existing CI tests (CPU-parameterized contact-sensor tests) caught the device mismatch, and the NaN propagation fix is validated by existing RL training workflows. Explicit unit tests for _reset_solver_internals (e.g., verify solver internals are actually zeroed after reset) would strengthen the safety net for future refactors.

Severity: Low (test coverage gap, but existing integration tests cover the happy path)

6. 🟢 No Breaking Changes in the CI Fix

The artifact name change (daily-compat-*-combined-test-resultsdaily-compat-*-test-results) and path change are internal CI artifacts. No downstream consumers outside the workflow itself reference these by name, so this is safe.


Verdict

The PR is approve-worthy. The CI workflow fix is correct and addresses the immediate breakage. The MJWarp reset fixes are well-documented, carry appropriate changelog fragments, and reference upstream tracking issues. The code quality is high with thorough docstrings explaining the device-routing and dtype constraints.

Minor suggestions (non-blocking):

  • Add an early-out guard to avoid launching reset_data on steps with no resets
  • Consider a dedicated unit test for the MJWarp reset path

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Fix daily compatibility result reporting

Verdict: Approve

Summary

Clean, well-scoped fix that removes a fragile shell-based XML combiner in favor of native wildcard support in dorny/test-reporter and actions/upload-artifact. The addition of a static test to catch future dangling local-action references is a nice preventative measure.

What Changed

File Change
.github/workflows/daily-compatibility.yml Remove combine-results step, use reports/**/*.xml wildcards directly; upgrade actions (checkout@v6, upload-artifact@v7, download-artifact@v8, test-reporter@v2.6.0); add config job for image resolution; switch combine-compat-results and notify-compatibility-status to ubuntu-latest; add extra-pip-packages for ovrtx/ovphysx
.github/actions/combine-results/action.yml Deleted — was a composite action using sed to merge JUnit XMLs
.github/actions/test_workflow_local_actions.py New test that validates all uses: ./.github/actions/* references resolve to an existing action.yml

Analysis

  1. Correctness of the removal: The old combine-results action used sed '1d; s/^<testsuites>//; s/<\/testsuites>$//' which is fragile — it assumes the XML declaration is always line 1 and that <testsuites> tags appear on their own lines. dorny/test-reporter@v2.6.0 supports glob patterns natively (path: reports/**/*.xml) and handles multi-file JUnit aggregation correctly, making the manual merge redundant.

  2. Runner change is appropriate: combine-compat-results and notify-compatibility-status only download artifacts and generate markdown — no GPU or simulation needed. Moving them to ubuntu-latest reduces GPU queue pressure.

  3. if-no-files-found: warn: Good defensive choice. If upstream jobs fail before producing artifacts, the workflow logs a warning instead of failing hard, which preserves the ability to see partial results.

  4. Static test coverage: test_workflow_local_actions.py is a simple but effective guard against the exact class of bug this PR fixes (referencing a deleted local action). The regex ^\s*uses:\s*["']?(?P<path>\./\.github/actions/[^"'\s#]+) handles the common YAML patterns well.

Minor Observations (Non-blocking)

  • download-artifact@v8 + upload-artifact@v7 version mismatch: Not a functional issue — the download/upload actions are independent packages with independent versioning — but worth confirming v8 download is compatible with v7 upload artifacts from the same run. (It is; v8 just adds streaming download.)

  • merge-multiple: true with path: reports/**/*.xml upload: When merge-multiple: true is used in download, all artifacts matching the pattern are merged into a flat directory. The reports/**/*.xml glob in the upload step then catches everything correctly. Verified this matches the actions/download-artifact docs.

Conclusion

The fix is correct, removes dead code, and adds a regression test. No issues found.


Update (commit 8862766): The new commits add a large batch of changes unrelated to the workflow fix (OVPhysX test marker cleanup, device_split CI support, SoftDTW Numba→Torch migration, lift task consolidation, Newton visualizer particle support, multi-backend physics presets, surface gripper wildcard fix, isaaclab_ppisp optional dependency, skrl JAX fix, LEAPP recurrent policy export, etc.). These do not affect the workflow changes originally reviewed. No new issues found in the incremental diff — the changes are internally consistent, properly versioned, and include corresponding tests and changelogs.

@AntoineRichard AntoineRichard requested review from myurasov-nv and removed request for jtigue-bdai and ooctipus June 10, 2026 13:20
Remove the stale local combine-results action call from the daily compatibility workflow and let the test reporter consume the downloaded JUnit XML files directly.

Add a static workflow test so missing local action metadata is caught before GitHub Actions reaches runtime.
@AntoineRichard AntoineRichard force-pushed the antoiner/fix-daily-compat-results branch from 1e4634d to 8862766 Compare June 10, 2026 13:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

asset New asset feature or request bug Something isn't working documentation Improvements or additions to documentation infrastructure isaac-mimic Related to Isaac Mimic team isaac-sim Related to Isaac Sim team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant