Skip to content

OMPE-96088: Fix cartpole camera anomalies for newton + rtx variants#6084

Draft
jmart-nv wants to merge 1 commit into
isaac-sim:developfrom
jmart-nv:jmart/cartpole-rtx
Draft

OMPE-96088: Fix cartpole camera anomalies for newton + rtx variants#6084
jmart-nv wants to merge 1 commit into
isaac-sim:developfrom
jmart-nv:jmart/cartpole-rtx

Conversation

@jmart-nv

@jmart-nv jmart-nv commented Jun 9, 2026

Copy link
Copy Markdown

Description

Recent benchmarks have shown a large convergence gap for the cartpole camera env on combos that use newton physics and non-rgb RTX variants. This is because newton physics does not provide implicit damping, and thus requires temporal data - either implicit (TAA artifacts in RTX rgb) or explicit (frame stacking in newton+warp.)

Since the non-rgb variants of RTX do not provide implicit temporal data via TAA, it is necessary to enable frame stacking so the CNN can infer velocity frame-to-frame. This allows the problematic combos to converge to the same high success_rate as their analogous baselines: newton+RTX depth/albedo now out-perform the same physx combos. Newton+RTX simple shading converges comparably to physx - still relatively low, but this will require a deeper investigation into simple shading if we wish to fix it.

Performance shows a marginal FPS overhead and twice as much memory usage - within expectations for this change.

Newton+RTX SR (fs1) SR (fs2) FPS (fs1) FPS (fs2) VRAM (fs1) VRAM (fs2)
depth 0.01 0.96 27.9k 26.9k 9.4 GB 14.3 GB
albedo 0.06 0.95 24.8k 21.8k 18.7 GB 33.6 GB
simple_shading 0.01 0.51 24.5k 21.8k 18.7 GB 34.1 GB

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Screenshots

convergence_framestack_fix

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation (N/A - frame stacking already documented)
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

…ariants that don't provide implicit temporal info.
@github-actions github-actions Bot added the isaac-lab Related to Isaac Lab team label Jun 9, 2026

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 IsaacLab Review Bot — PR #6084

OMPE-96088: Fix cartpole camera anomalies for newton + rtx variants

Summary

Well-motivated fix that correctly extends the frame-stacking heuristic to cover Newton+RTX AOVs lacking DLSS temporal accumulation. The three-tier resolver logic (non-Newton → 1, Newton+Warp → 2, Newton+RTX per-AOV) is clean and the code is well-documented. Tests comprehensively cover the new truth table.


Findings

🟡 Warning — semantic_segmentation under Newton+RTX silently defaults to no stacking

semantic_segmentation produces discrete class labels and almost certainly bypasses DLSS temporal accumulation (it is not a photorealistic signal that benefits from temporal AA). Under Newton+RTX, the current logic returns frame_stack=1 since "semantic_segmentation".startswith(("depth", "albedo", "simple_shading")) is False.

If this is intentional (e.g., segmentation observations are not used for velocity inference or the env is only used for non-RL purposes), please add a brief code comment noting the exclusion. If unintentional, it would cause the same convergence failure this PR fixes for the other AOVs.

File: source/isaaclab_tasks/isaaclab_tasks/core/cartpole/cartpole_direct_camera_env.py, line 61


🟡 Warning — Unused SIMPLE_SHADING_TYPES constant creates maintenance confusion

The module defines SIMPLE_SHADING_TYPES (line 25) as a set of exact strings, but _resolve_frame_stack_default uses startswith("simple_shading") instead. This is a mismatch that may confuse future contributors: they might update the set thinking it controls stacking, while the actual logic is the prefix match. Consider either:

  • Using the constant in the resolver: any(data_type.startswith(p) for p in _DLSS_BYPASS_PREFIXES), or
  • Adding a comment near SIMPLE_SHADING_TYPES clarifying it is not used for frame-stack resolution.

File: source/isaaclab_tasks/isaaclab_tasks/core/cartpole/cartpole_direct_camera_env.py, lines 25–28 vs 61


🔵 Suggestion — Future-proof with a constant for DLSS-bypass prefixes

The startswith(("depth", "albedo", "simple_shading")) pattern is compact but adding new non-DLSS AOVs requires editing logic inside the method. Consider extracting to a module constant:

_NO_TEMPORAL_CUE_PREFIXES = ("depth", "albedo", "simple_shading")

This gives future contributors a single, documented place to register new AOV prefixes that bypass DLSS, reducing the risk of silent convergence failures when new AOVs are added.


🔵 Suggestion — Make test_newton_with_default_renderer precondition explicit

The existing test test_newton_with_default_renderer now implicitly depends on the default preset's data type being "rgb" to reach the correct code path. Adding an explicit assertion (matching the style of the new tests) would prevent false confidence if the default changes:

def test_newton_with_default_renderer(self):
    cfg = _resolve("newton_mjwarp")
    assert isinstance(cfg.sim.physics, NewtonCfg)
    assert isinstance(cfg.tiled_camera.renderer_cfg, IsaacRtxRendererCfg)
    assert cfg.tiled_camera.data_types == ["rgb"]  # <-- make precondition explicit
    assert _policy_default(cfg) == 1

File: source/isaaclab_tasks/test/core/test_cartpole_camera_presets_frame_stack.py, line ~79


🟢 Note — No end-to-end test for Newton+RTX+non-rgb combo

The TestEnvConstructionEndToEnd parametrized cases cover Newton+Warp and PhysX but don't exercise the new Newton+RTX+depth (or similar) path through actual env construction. The cfg-level tests in TestFrameStackTruthTable give good coverage of the resolver logic, but if wiring between the resolver and the env class regresses specifically for RTX, it wouldn't be caught. Low priority since the resolver is already tested in isolation.


Verdict

Approve with minor suggestions. The fix is correct, well-scoped, and thoroughly tested at the config level. The primary actionable item is confirming whether semantic_segmentation under Newton+RTX intentionally stays at frame_stack=1. The other suggestions are non-blocking maintainability improvements.


🤖 Generated by IsaacLab Review Bot | Commit: db5ecb5

@greptile-apps

greptile-apps Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

Fixes a convergence failure in the cartpole camera environment when running Newton physics with RTX depth, albedo, or simple_shading AOVs by extending _resolve_frame_stack_default to return frame_stack=2 for these combinations, matching the existing Newton + Warp behaviour.

  • _resolve_frame_stack_default now checks the data_types[0] prefix after ruling out Warp, returning 2 for Newton + RTX depth/albedo/simple_shading (which bypass DLSS temporal accumulation) and 1 for rgb (which benefits from DLSS).
  • Five new cfg-level truth-table tests validate all cells in the extended physics × renderer × AOV matrix; end-to-end tests remain limited to the PhysX and Newton + Warp baselines.

Confidence Score: 4/5

Safe to merge; the core resolver logic is correct for all documented AOVs and is backed by new truth-table tests.

The fix is well-scoped and the new tests verify every explicitly named AOV. The two open questions — whether semantic_segmentation also bypasses DLSS and the inconsistency between the startswith predicate and the existing SIMPLE_SHADING_TYPES set — don't affect current behaviour but could silently bite a future contributor adding a new data type.

The resolver logic in cartpole_direct_camera_env.py (lines 59–61) deserves a second look regarding the semantic_segmentation case and the relationship to SIMPLE_SHADING_TYPES.

Important Files Changed

Filename Overview
source/isaaclab_tasks/isaaclab_tasks/core/cartpole/cartpole_direct_camera_env.py Extends _resolve_frame_stack_default to return 2 for Newton + RTX depth/albedo/simple_shading AOVs in addition to Newton + Warp; logic uses startswith rather than the already-defined SIMPLE_SHADING_TYPES constant, and semantic_segmentation is not explicitly addressed.
source/isaaclab_tasks/test/core/test_cartpole_camera_presets_frame_stack.py Adds five new cfg-level truth-table tests covering Newton+RTX depth, albedo, simple_shading (expect 2), rgb (expect 1), and PhysX+RTX depth (expect 1); the pre-existing test_newton_with_default_renderer docstring is now stale and potentially misleading.
source/isaaclab_tasks/isaaclab_tasks/core/cartpole/cartpole_direct_camera_env_cfg.py Updates the frame_stack docstring to reflect the expanded auto-resolution logic; purely documentation.
source/isaaclab_tasks/changelog.d/jmart-cartpole-rtx.rst New changelog entry accurately describing the fix; no code changes.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[_resolve_frame_stack_default] --> B{Newton physics?}
    B -- No --> R1[return 1]
    B -- Yes --> C{Warp renderer?}
    C -- Yes --> R2[return 2]
    C -- No --> D{AOV is depth, albedo,\nor simple_shading?}
    D -- Yes --> R3[return 2]
    D -- No --> R4[return 1]

    style R2 fill:#f96,stroke:#333
    style R3 fill:#f96,stroke:#333
    style R1 fill:#6f6,stroke:#333
    style R4 fill:#6f6,stroke:#333
Loading

Comments Outside Diff (1)

  1. source/isaaclab_tasks/test/core/test_cartpole_camera_presets_frame_stack.py, line 73-78 (link)

    P2 Stale docstring contradicts new per-AOV logic

    The docstring says "Newton physics + RTX renderer — RTX provides temporal information", which was true before this PR but is no longer the full picture. After this change only Newton + RTX + rgb returns 1; depth/albedo/simple_shading return 2. The test assertion remains correct (the default data type is rgb) but the justification in the docstring now misleads anyone maintaining this truth-table. It should clarify that the 1 result holds here specifically because the default AOV is rgb.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Reviews (1): Last reviewed commit: "Fix cartpole camera convergence on newto..." | Re-trigger Greptile

return 2
data_types = getattr(camera_cfg, "data_types", None) or []
data_type = data_types[0] if data_types else ""
return 2 if data_type.startswith(("depth", "albedo", "simple_shading")) else 1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 SIMPLE_SHADING_TYPES constant unused in resolver

The module already defines SIMPLE_SHADING_TYPES = {"simple_shading_constant_diffuse", "simple_shading_diffuse_mdl", "simple_shading_full_mdl"} at the top of the file (line 24–28), but _resolve_frame_stack_default uses the broader startswith("simple_shading") predicate instead of data_type in SIMPLE_SHADING_TYPES. While functionally equivalent today, the two definitions can silently drift: a new variant added to SIMPLE_SHADING_TYPES that doesn't start with "simple_shading" would be missed by the resolver, or a type starting with "simple_shading" that isn't in the set would be incorrectly matched. Using the constant in both places keeps the two in sync.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +59 to +61
data_types = getattr(camera_cfg, "data_types", None) or []
data_type = data_types[0] if data_types else ""
return 2 if data_type.startswith(("depth", "albedo", "simple_shading")) else 1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 semantic_segmentation AOV not covered by the new stacking logic

The startswith(("depth", "albedo", "simple_shading")) predicate returns 1 for semantic_segmentation under Newton + RTX. If semantic_segmentation also bypasses DLSS temporal accumulation (like depth/albedo/simple_shading), Newton + RTX + semantic_segmentation would silently receive a single-frame observation and likely fail to converge — the same symptom this PR fixes for the other AOVs. The TestObsSpaceBumpArithmetic table lists semantic_segmentation with 4 channels, confirming it is a supported data type. A clarifying comment or an explicit case for semantic_segmentation would prevent ambiguity.

Newton has no implicit damping, so the policy needs a temporal cue to infer velocity.
Returns ``2`` when the render carries none, else ``1``:

- Non-Newton (PhysX / OV-PhysX): damping suffices -> ``1``.

@pbarejko pbarejko Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have some problem in the design:

return 2 if data_type.startswith(("depth", "albedo", "simple_shading")) else 1

Those are kind of details of the output of the renderer, or its configuration that we re-define it in yet another place. At some point this could become a maintenance nightmare.

@jmart-nv jmart-nv marked this pull request as draft June 10, 2026 19:49
@jmart-nv

Copy link
Copy Markdown
Author

Moved back to draft: still investigating the simple shading issues (found the correct lighting settings, running another round of tests with those) and refactoring some of the framestack logic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants