OMPE-96088: Fix cartpole camera anomalies for newton + rtx variants#6084
OMPE-96088: Fix cartpole camera anomalies for newton + rtx variants#6084jmart-nv wants to merge 1 commit into
Conversation
…ariants that don't provide implicit temporal info.
There was a problem hiding this comment.
🤖 IsaacLab Review Bot — PR #6084
OMPE-96088: Fix cartpole camera anomalies for newton + rtx variants
Summary
Well-motivated fix that correctly extends the frame-stacking heuristic to cover Newton+RTX AOVs lacking DLSS temporal accumulation. The three-tier resolver logic (non-Newton → 1, Newton+Warp → 2, Newton+RTX per-AOV) is clean and the code is well-documented. Tests comprehensively cover the new truth table.
Findings
🟡 Warning — semantic_segmentation under Newton+RTX silently defaults to no stacking
semantic_segmentation produces discrete class labels and almost certainly bypasses DLSS temporal accumulation (it is not a photorealistic signal that benefits from temporal AA). Under Newton+RTX, the current logic returns frame_stack=1 since "semantic_segmentation".startswith(("depth", "albedo", "simple_shading")) is False.
If this is intentional (e.g., segmentation observations are not used for velocity inference or the env is only used for non-RL purposes), please add a brief code comment noting the exclusion. If unintentional, it would cause the same convergence failure this PR fixes for the other AOVs.
File: source/isaaclab_tasks/isaaclab_tasks/core/cartpole/cartpole_direct_camera_env.py, line 61
🟡 Warning — Unused SIMPLE_SHADING_TYPES constant creates maintenance confusion
The module defines SIMPLE_SHADING_TYPES (line 25) as a set of exact strings, but _resolve_frame_stack_default uses startswith("simple_shading") instead. This is a mismatch that may confuse future contributors: they might update the set thinking it controls stacking, while the actual logic is the prefix match. Consider either:
- Using the constant in the resolver:
any(data_type.startswith(p) for p in _DLSS_BYPASS_PREFIXES), or - Adding a comment near
SIMPLE_SHADING_TYPESclarifying it is not used for frame-stack resolution.
File: source/isaaclab_tasks/isaaclab_tasks/core/cartpole/cartpole_direct_camera_env.py, lines 25–28 vs 61
🔵 Suggestion — Future-proof with a constant for DLSS-bypass prefixes
The startswith(("depth", "albedo", "simple_shading")) pattern is compact but adding new non-DLSS AOVs requires editing logic inside the method. Consider extracting to a module constant:
_NO_TEMPORAL_CUE_PREFIXES = ("depth", "albedo", "simple_shading")This gives future contributors a single, documented place to register new AOV prefixes that bypass DLSS, reducing the risk of silent convergence failures when new AOVs are added.
🔵 Suggestion — Make test_newton_with_default_renderer precondition explicit
The existing test test_newton_with_default_renderer now implicitly depends on the default preset's data type being "rgb" to reach the correct code path. Adding an explicit assertion (matching the style of the new tests) would prevent false confidence if the default changes:
def test_newton_with_default_renderer(self):
cfg = _resolve("newton_mjwarp")
assert isinstance(cfg.sim.physics, NewtonCfg)
assert isinstance(cfg.tiled_camera.renderer_cfg, IsaacRtxRendererCfg)
assert cfg.tiled_camera.data_types == ["rgb"] # <-- make precondition explicit
assert _policy_default(cfg) == 1File: source/isaaclab_tasks/test/core/test_cartpole_camera_presets_frame_stack.py, line ~79
🟢 Note — No end-to-end test for Newton+RTX+non-rgb combo
The TestEnvConstructionEndToEnd parametrized cases cover Newton+Warp and PhysX but don't exercise the new Newton+RTX+depth (or similar) path through actual env construction. The cfg-level tests in TestFrameStackTruthTable give good coverage of the resolver logic, but if wiring between the resolver and the env class regresses specifically for RTX, it wouldn't be caught. Low priority since the resolver is already tested in isolation.
Verdict
✅ Approve with minor suggestions. The fix is correct, well-scoped, and thoroughly tested at the config level. The primary actionable item is confirming whether semantic_segmentation under Newton+RTX intentionally stays at frame_stack=1. The other suggestions are non-blocking maintainability improvements.
🤖 Generated by IsaacLab Review Bot | Commit: db5ecb5
Greptile SummaryFixes a convergence failure in the cartpole camera environment when running Newton physics with RTX depth, albedo, or simple_shading AOVs by extending
Confidence Score: 4/5Safe to merge; the core resolver logic is correct for all documented AOVs and is backed by new truth-table tests. The fix is well-scoped and the new tests verify every explicitly named AOV. The two open questions — whether The resolver logic in Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[_resolve_frame_stack_default] --> B{Newton physics?}
B -- No --> R1[return 1]
B -- Yes --> C{Warp renderer?}
C -- Yes --> R2[return 2]
C -- No --> D{AOV is depth, albedo,\nor simple_shading?}
D -- Yes --> R3[return 2]
D -- No --> R4[return 1]
style R2 fill:#f96,stroke:#333
style R3 fill:#f96,stroke:#333
style R1 fill:#6f6,stroke:#333
style R4 fill:#6f6,stroke:#333
|
| return 2 | ||
| data_types = getattr(camera_cfg, "data_types", None) or [] | ||
| data_type = data_types[0] if data_types else "" | ||
| return 2 if data_type.startswith(("depth", "albedo", "simple_shading")) else 1 |
There was a problem hiding this comment.
SIMPLE_SHADING_TYPES constant unused in resolver
The module already defines SIMPLE_SHADING_TYPES = {"simple_shading_constant_diffuse", "simple_shading_diffuse_mdl", "simple_shading_full_mdl"} at the top of the file (line 24–28), but _resolve_frame_stack_default uses the broader startswith("simple_shading") predicate instead of data_type in SIMPLE_SHADING_TYPES. While functionally equivalent today, the two definitions can silently drift: a new variant added to SIMPLE_SHADING_TYPES that doesn't start with "simple_shading" would be missed by the resolver, or a type starting with "simple_shading" that isn't in the set would be incorrectly matched. Using the constant in both places keeps the two in sync.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| data_types = getattr(camera_cfg, "data_types", None) or [] | ||
| data_type = data_types[0] if data_types else "" | ||
| return 2 if data_type.startswith(("depth", "albedo", "simple_shading")) else 1 |
There was a problem hiding this comment.
semantic_segmentation AOV not covered by the new stacking logic
The startswith(("depth", "albedo", "simple_shading")) predicate returns 1 for semantic_segmentation under Newton + RTX. If semantic_segmentation also bypasses DLSS temporal accumulation (like depth/albedo/simple_shading), Newton + RTX + semantic_segmentation would silently receive a single-frame observation and likely fail to converge — the same symptom this PR fixes for the other AOVs. The TestObsSpaceBumpArithmetic table lists semantic_segmentation with 4 channels, confirming it is a supported data type. A clarifying comment or an explicit case for semantic_segmentation would prevent ambiguity.
| Newton has no implicit damping, so the policy needs a temporal cue to infer velocity. | ||
| Returns ``2`` when the render carries none, else ``1``: | ||
|
|
||
| - Non-Newton (PhysX / OV-PhysX): damping suffices -> ``1``. |
There was a problem hiding this comment.
I think we have some problem in the design:
return 2 if data_type.startswith(("depth", "albedo", "simple_shading")) else 1
Those are kind of details of the output of the renderer, or its configuration that we re-define it in yet another place. At some point this could become a maintenance nightmare.
|
Moved back to draft: still investigating the simple shading issues (found the correct lighting settings, running another round of tests with those) and refactoring some of the framestack logic |
Description
Recent benchmarks have shown a large convergence gap for the cartpole camera env on combos that use newton physics and non-rgb RTX variants. This is because newton physics does not provide implicit damping, and thus requires temporal data - either implicit (TAA artifacts in RTX rgb) or explicit (frame stacking in newton+warp.)
Since the non-rgb variants of RTX do not provide implicit temporal data via TAA, it is necessary to enable frame stacking so the CNN can infer velocity frame-to-frame. This allows the problematic combos to converge to the same high success_rate as their analogous baselines: newton+RTX depth/albedo now out-perform the same physx combos. Newton+RTX simple shading converges comparably to physx - still relatively low, but this will require a deeper investigation into simple shading if we wish to fix it.
Performance shows a marginal FPS overhead and twice as much memory usage - within expectations for this change.
Type of change
Screenshots
Checklist
pre-commitchecks with./isaaclab.sh --formatconfig/extension.tomlfileCONTRIBUTORS.mdor my name already exists there