Skip to content

Feature/nvblox next datagen#761

Open
viiik-inside wants to merge 6 commits into
mainfrom
feature/nvblox_next_datagen
Open

Feature/nvblox next datagen#761
viiik-inside wants to merge 6 commits into
mainfrom
feature/nvblox_next_datagen

Conversation

@viiik-inside

Copy link
Copy Markdown
Collaborator

Summary

Short description of the change (max 50 chars)

Detailed description

  • What was the reason for the change?
  • What has been changed?
  • What is the impact of this change?

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot

Summary

This PR adds a new isaaclab_arena_datagen package (~8,000 lines) that generates SyntheticScene-format HDF5 data (RGB, depth, normals, semantics, optical/scene flow, dynamic-object poses, mesh samples) from Isaac Lab Arena simulations. It supports both standalone scene generation and opt-in collection during policy rollouts via a duck-typed collector hook. The implementation is well-structured and self-contained (no nvblox_next/pytorch3d dependency), though there are several findings ranging from a potential correctness bug to minor improvements.

Design Assessment

Architecture: Strong. Clean separation of concerns across well-defined modules:

  • pipeline.py centralizes per-step recording logic shared by both entry points
  • Duck-typed collector parameter avoids coupling core policy_runner to datagen
  • geometry/ provides a self-contained torch-only SE(3) layer (no pytorch3d)
  • io/hdf5_writer.py is a standalone writer compatible with nvblox_next loaders
  • scene_metadata.py provides a single source of truth for naming/paths

Package layout: Good. The 84 environment classes are well-organized by category (single_object, no_collision, collision, miscellaneous) with metadata-driven registry names.

Integration approach: Non-invasive. The core policy_runner.py changes are minimal (add optional collector param + 4 lines of hook calls), keeping it backward-compatible.

Findings

🔴 Potential Correctness Bug (Medium-High Severity)

scene_flow.py_classify_pixels_by_tracking() point cloud reshape

points_C_n3 = unproject_depth(depth_hw, intrinsics_33)
...
points_W_hw3 = points_W_hw3.reshape(W, H, 3).permute(1, 0, 2).contiguous()

Isaac Lab's unproject_depth takes an (H, W) depth map and returns points in raster order (H*W, 3). Reshaping as (W, H, 3) then permuting (1, 0, 2) effectively transposes the spatial axes. This should likely be reshape(H, W, 3) directly. If unproject_depth internally flattens column-major, the current code is correct — but this needs explicit verification against Isaac Lab's implementation, as a silent transpose would corrupt all scene flow and tracking data.

🟡 Medium Severity

  1. policy_runner.py:100collector.on_step called before episode-reset check
    The collector records the step before the terminated/truncated check. If the episode resets, the recorded obs/state may correspond to a post-reset frame while the action was pre-reset. For most use cases (SyntheticScene captures the sim state, not obs), this is likely fine, but worth documenting the semantics.

  2. dynamic_object_tracker.pytrimesh is a hard import at module top level

    import trimesh
    import trimesh.sample

    The README says trimesh is optional (in datagen-viz extras), but dynamic_object_tracker.py imports it unconditionally at the top level. If mesh sampling is always needed for the core pipeline, trimesh should be in RUNTIME_DEPS. If not, it should be lazily imported.

  3. camera_handler.py:~line 470camera._initialize_callback(None) uses private API

    camera._initialize_callback(None)  # pylint: disable=protected-access

    This is fragile and may break on Isaac Lab version upgrades. Consider adding a comment with the Isaac Lab version this was tested against, or wrapping in try/except with a clear error.

🟢 Low Severity / Suggestions

  1. collector.py — No validation that num_envs == 1
    The docstring states "Single environment only (num_envs == 1)", but from_env() doesn't assert this. An early assertion like assert env.unwrapped.num_envs == 1 would prevent silent data corruption with multi-env setups.

  2. pipeline.pyrun_simulation_loop does an extra initial env.step + handler.update before the recording loop
    This "warmup" step is not recorded. If this is intentional (first step has uninitialized render buffers), it should have a comment explaining why.

  3. visualizer.py — 1822 lines in a single file
    This is quite large. Consider splitting into submodules (e.g., visualizer/colormaps.py, visualizer/grid.py, visualizer/plotly_viz.py) for maintainability. Not blocking.

  4. scene_flow.py_classify_pixels_by_tracking iterates unique instance IDs on GPU
    Calling .unique() then iterating in Python over each ID can be slow for scenes with many instances. Acceptable for datagen (not real-time), but worth noting for future optimization.

  5. setup.pytrimesh is in DATAGEN_VIZ_DEPS but used by core dynamic_object_tracker.py
    As noted in finding #2, trimesh should be moved to RUNTIME_DEPS or the import should be deferred.

  6. PR description is template boilerplate — The "Detailed description" section still contains the default "What was the reason for the change?" prompts without actual content. This makes it harder for reviewers to understand intent and scope.

Test Coverage

The PR includes three test files:

  • test_geometry.py — Good coverage of rotation/quaternion conversions and SE(3) operations
  • test_hdf5_writer.py — Thorough schema round-trip test verifying shapes, dtypes, and metadata
  • test_scene_metadata.py — Validates naming conventions and catalogue completeness (84 scenes)

Gaps: No tests for:

  • SceneFlowComputer logic (the most complex algorithmic code)
  • DynamicObjectTracker motion detection and mesh sampling
  • DatagenCollector integration (on_step/finalize lifecycle)
  • The _classify_pixels_by_tracking reshape correctness (finding #1)

These are understandable given Isaac Sim dependencies, but at minimum the reshape behavior in scene_flow should have a unit test with synthetic data.

Verdict

Minor fixes needed

The architecture and implementation quality are high overall — this is a well-designed, self-contained datagen package. The main concern is the potential W/H transpose in scene flow (finding #1), which should be verified. Finding #2 (trimesh as hard dependency) should be resolved to match the documented install requirements. The remaining items are suggestions for robustness and maintainability.


# Deterministic name-to-key mappings (avoids hash non-determinism)
_name_to_rigid_key: dict[str, int] = {}
_name_to_articulation_key: dict[str, int] = {}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Potential transpose bug: unproject_depth(depth_hw, intrinsics_33) returns (H*W, 3) points in raster order (row-major). Reshaping as (W, H, 3) then permuting (1, 0, 2) gives the same result as reshape(H, W, 3) only if unproject_depth internally iterates columns-first. If it iterates rows-first (the standard depth.reshape(-1) order), the current code transposes the point cloud spatially, corrupting all downstream scene flow and tracking.

Please verify against Isaac Lab's unproject_depth implementation. If row-major, this should simply be:

points_W_hw3 = points_W_hw3.reshape(H, W, 3)

from __future__ import annotations

import numpy as np
import torch

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 trimesh is imported unconditionally at module top level, but setup.py places it in the optional datagen-viz extras rather than RUNTIME_DEPS. Since DynamicObjectTracker is used by the core pipeline (both standalone and collector modes), either:

  1. Move trimesh to RUNTIME_DEPS, or
  2. Defer the import to sample_dynamic_object_meshes() and _collect_mesh_from_prim() so the tracker can be instantiated without trimesh (mesh sampling would fail gracefully if the extra isn't installed).

@greptile-apps

greptile-apps Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR introduces the isaaclab_arena_datagen package \u2014 a self-contained synthetic data generation pipeline (no nvblox_next / pytorch3d dependency) that writes SyntheticScene-format HDF5 files from Isaac Lab Arena simulations. It supports two collection modes: a standalone scene generator (run_datagen.py) and an opt-in hook in the existing policy-rollout path (DatagenCollector). The eval-runner gains a top-level datagen JSON block to collect per-episode data across job sweeps.

  • New isaaclab_arena_datagen package: pipeline.py / camera_handler.py / scene_flow.py share per-step recording logic between both modes; io/hdf5_writer.py pre-allocates resizable HDF5 datasets with Zstd compression; geometry/ provides a torch-only SE(3) implementation.
  • Policy-runner integration (policy_runner.py / eval_runner.py): a duck-typed collector argument is threaded through rollout_policy; eval_runner builds a per-job DatagenCollector from a JSON datagen block and properly tears it down in a finally chain before the stage is destroyed.
  • 80 new datagen environments with per-scene camera placements registered into the shared EnvironmentRegistry.

Confidence Score: 3/5

The eval_runner integration is correctly guarded with a finally chain, but the standalone policy_runner leaves the collector unclosed and its finalize skipped on any rollout exception.

HDF5 file-handle leak paths from prior review rounds remain unaddressed: collector.finalize in the else block, no collector.close in standalone main(), no try/finally in run_datagen.py, and _end_episode lacks a finally guard. Any rollout exception leaves an open HDF5 file and camera annotators attached when the stage tears down.

isaaclab_arena/evaluation/policy_runner.py and isaaclab_arena_datagen/collection/collector.py and isaaclab_arena_datagen/run_datagen.py

Important Files Changed

Filename Overview
isaaclab_arena/evaluation/eval_runner.py Adds build_datagen_collector factory and wires collector.close(env) into the existing finally chain correctly; env_name is always None so environment's get_default_cameras is bypassed when no camera_position is provided.
isaaclab_arena/evaluation/policy_runner.py Threads collector through rollout_policy correctly; collector.finalize() sits in the else block and collector.close() is never called in main(), both noted in previous reviews.
isaaclab_arena_datagen/collection/collector.py Episode-split logic is correct; _end_episode lacks try/finally so the HDF5 writer leaks on save_dynamic_objects failure (noted in previous reviews).
isaaclab_arena_datagen/pipeline.py Clean shared pipeline; record_camera_step, run_simulation_loop, and save_dynamic_objects are well-factored and shared between both entry points.
isaaclab_arena_datagen/run_datagen.py Standalone entry point; setup.writer.close() has no try/finally so HDF5 file leaks if run_simulation_loop or save_dynamic_objects raises.
isaaclab_arena_datagen/io/hdf5_writer.py Self-contained SyntheticScene HDF5 writer with resizable pre-allocation, Zstd compression, and trim support. Schema matches expected layout.
isaaclab_arena_datagen/camera_handler.py Wraps Isaac Lab Camera with look-at math and USD transform manipulation. Uses private _rep_registry in close() which may break across Isaac Lab versions.
setup.py trimesh placed in optional datagen-viz extra but imported unconditionally by dynamic_object_tracker.py and mesh_utils.py; noted in previous review thread.

Sequence Diagram

sequenceDiagram
    participant ER as eval_runner / policy_runner
    participant RP as rollout_policy
    participant DC as DatagenCollector
    participant HW as DatagenHDF5Writer
    participant PT as pipeline

    ER->>DC: from_env(env, cfg, env_name)
    DC->>PT: build_camera_setups(cameras, w, h, registry)

    loop per env.step()
        RP->>DC: on_step(env, obs, actions, step_idx, done)
        alt "done=True and local>0"
            DC->>HW: trim(local) + close()
            DC->>DC: _start_episode()
        end
        DC->>PT: record_camera_step(...)
        PT->>HW: write_rgb / write_depth / write_normals / ...
    end

    RP->>DC: finalize(env) [else block only]
    DC->>HW: trim(local) + close()

    ER->>DC: close(env) [finally block in eval_runner]
    DC->>DC: cam.handler.close()
Loading

Reviews (5): Last reviewed commit: "Fix flow2d tracking" | Re-trigger Greptile

Comment thread setup.py
Comment on lines 27 to +36

# isaaclab_arena_datagen: optional dependencies for per-scene visualizations
# (PNG/MP4/HTML) and mesh-surface sampling. Imported lazily so the core data
# pipeline runs without them.
DATAGEN_VIZ_DEPS = [
"plotly",
"scipy",
"imageio",
"trimesh",
]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 trimesh is a required runtime dependency, not a visualization-only one.

dynamic_object_tracker.py imports trimesh, trimesh.sample, and trimesh.util at the top level (not lazily), and utils/mesh_utils.py does the same. These modules are imported directly by the core pipeline (pipeline.pyDynamicObjectTrackermesh_utils). Any user who runs pip install isaaclab_arena without [datagen-viz] will immediately hit an ImportError the first time the data collection pipeline runs, before any collection begins. trimesh must be promoted to RUNTIME_DEPS.

Comment on lines +113 to +155
from isaaclab_arena.utils.isaaclab_utils.simulation_app import teardown_simulation_app # noqa: E402
from isaaclab_arena_datagen.pipeline import ( # noqa: E402
SimDataCollectionSetup,
run_simulation_loop,
save_dynamic_objects,
)


def main() -> None:
"""Run the full data generation pipeline: setup, simulate, save, visualize."""
args = _ARGS

setup = SimDataCollectionSetup.from_config(
scene_name=args.scene_name,
output_dir=args.output_dir,
num_steps=args.num_steps,
width=args.width,
height=args.height,
)

run_simulation_loop(
setup.env,
setup.camera_setups,
setup.writer,
setup.dynamic_tracker,
args.num_steps,
)

save_dynamic_objects(
setup.env,
setup.writer,
setup.dynamic_tracker,
args.dynamic_translation_eps,
args.dynamic_rotation_eps,
args.mesh_sample_spacing,
)

setup.writer.close()

if args.visualizations:
from isaaclab_arena_datagen.visualizer import DatagenVisualizer

hdf5_path = os.path.join(args.output_dir, "dataset.h5")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 HDF5 file is left open when an exception propagates from run_simulation_loop or save_dynamic_objects.

DatagenHDF5Writer already implements __enter__/__exit__, but main() calls .close() at the end of a linear sequence with no try/finally guard. If any of the preceding calls raise — a common path, e.g. a sensor rendering error or a USD query failure — the HDF5 file handle is leaked and the file may be corrupt. Use the writer as a context manager or add a try/finally so setup.writer.close() is always executed.

Comment on lines +195 to +206
if self._finalized:
return
self._finalized = True
save_dynamic_objects(
env if env is not None else self._last_env,
self._writer,
self._dynamic_tracker,
self._cfg.dynamic_translation_eps,
self._cfg.dynamic_rotation_eps,
self._cfg.mesh_sample_spacing,
)
self._writer.close()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 HDF5 file is left open when save_dynamic_objects raises inside finalize.

self._finalized is set to True before calling save_dynamic_objects, so if that call raises (e.g. a USD query or mesh-sampling failure), the method becomes permanently idempotent and self._writer.close() is never reached on any subsequent call. The writer should be closed in a finally block so the file is always properly flushed and closed regardless of whether the pose/mesh serialization succeeds.

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Feature/nvblox_next_datagen

This is a well-architected addition of synthetic data generation capabilities to IsaacLab Arena. The PR introduces a new isaaclab_arena_datagen module (~7,400 lines across 43 files) that generates SyntheticScene-format HDF5 data for training computer vision models.

🎯 Architecture & Design (Strengths)

  1. Clean Separation of Concerns: The two-mode design (standalone run_datagen.py + policy-rollout collection via DatagenCollector) sharing pipeline.py ensures data consistency across use cases.

  2. No External Dependencies on nvblox_next/pytorch3d: Self-contained SE(3) geometry implementation (geometry/) is a good design choice for portability.

  3. Duck-typed Collector Interface: The collector.on_step()/finalize() pattern allows opt-in data collection without modifying core rollout logic.

  4. Proper Pre-allocation: DatagenHDF5Writer pre-allocates all datasets upfront, which is essential for HDF5 performance with chunked writes.


⚠️ Issues Requiring Attention

1. Single Environment Limitation Not Enforced (High Priority)

The code has multiple assert shape[0] == 1, "Expected single environment" checks but no upfront validation in DatagenCollector.from_env():

# isaaclab_arena_datagen/collection/collector.py, line ~122
@classmethod
def from_env(cls, env, cfg, num_steps, env_name=None):
    assert num_steps is not None and num_steps > 1, ...
    # Missing: assert env.unwrapped.num_envs == 1

Suggestion: Add explicit validation at initialization rather than failing mid-recording with cryptic shape assertion errors.

2. Potential Silent Data Loss in Episode Mode

In --num_episodes mode, the pre-allocation size is num_episodes * max_episode_length, but early-finishing episodes leave trailing frames as zeros:

# isaaclab_arena/evaluation/policy_runner.py, line ~72
datagen_horizon = num_episodes * env.unwrapped.max_episode_length

Concern: The HDF5 file stores valid zeros indistinguishable from invalid trailing frames. Consider adding a valid_frames attribute or sentinel values.

3. Mesh Sampling Memory Concerns (Medium Priority)

The MAX_MESH_SAMPLE_POINTS cap (presumably ~10000) with farthest point sampling has O(n*k) complexity:

# isaaclab_arena_datagen/dynamic_object_tracker.py
num_points = min(num_points_raw, MAX_MESH_SAMPLE_POINTS)
# ... _farthest_point_sampling has O(n*k) complexity

This could still be slow for large meshes. Consider adding a timing warning or using a more efficient algorithm (e.g., voxel grid downsampling).

4. Exception Handling in Visualizer Could Mask Issues

# isaaclab_arena_datagen/visualizer.py
except QhullError:
    continue  # Silently skips this mesh

While reasonable for visualization robustness, logging would help debugging.


📝 Test Coverage Analysis

Good Coverage:

  • test_geometry.py: Round-trip tests for quaternion/matrix conversions
  • test_hdf5_writer.py: Schema validation, shapes, dtypes, attributes

Missing Coverage:

  1. No integration tests for the end-to-end pipeline (run_simulation_loop → HDF5 output)
  2. No tests for DynamicObjectTracker - particularly the motion detection thresholds
  3. No tests for camera repositioning (set_world_pose)
  4. No tests for scene flow computation (SceneFlowComputer)

Suggestion: Add at least mock-based tests for record_camera_step to verify the data flow without Isaac Sim.


🔧 Minor Suggestions

  1. Type Hints Consistency: Some functions use Any extensively (e.g., env: Any) where more specific types would help:

    # Consider defining a protocol or type alias
    from typing import TYPE_CHECKING
    if TYPE_CHECKING:
        from isaaclab.envs import ManagerBasedRLEnv
  2. Documentation Gap: The scene_metadata.py module is referenced but the PR description doesn't explain the scene naming conventions clearly.

  3. setup.py Extra Not Declared: The datagen-viz extra mentioned in README isn't visible in the setup.py diff. Consider adding:

    extras_require={
        "datagen-viz": DATAGEN_VIZ_DEPS,
    }
  4. Camera ID Generation: camera_id_from_index() is imported but not shown. Ensure it generates deterministic IDs for reproducibility.


✅ Summary

Overall, this is a solid implementation with good architectural decisions. The main concerns are:

  1. Must fix: Add explicit num_envs == 1 validation in DatagenCollector.from_env()
  2. Should address: Document or handle trailing invalid frames in episode mode
  3. Consider: Add timing/memory warnings for large mesh sampling operations

The code quality is high with consistent docstrings and clear module organization. Ready for merge after addressing the validation issues.


Incremental Review Update (e7c8ed1):

The new commits (5 files changed) add CLI camera look-at overrides and improve documentation. Summary of changes:

  1. policy_runner.py: Relaxed --datagen-camera-position to work without requiring --datagen-camera-target (defaults to world origin). Clean and backwards-compatible.

  2. camera_trajectory.py: Minor docstring improvements clarifying the look-at semantics. No logic changes.

  3. pipeline.py: Added a clarifying comment to the dynamic-camera re-pose check. No logic changes.

  4. README.md: Expanded camera viewpoint documentation with priority ordering and CLI examples. Good improvement for usability.

  5. tests/test_camera_trajectory.py (new): Pure-logic unit tests for CameraViewTrajectory — validates static/dynamic cameras, trajectory length mismatch, and focal length constraints. Welcome addition.

Verdict: Clean incremental improvement. No new issues found. ✅

Previous findings status:

  • ⚠️ W/H transpose in scene_flow.pynot addressed in this push
  • ⚠️ trimesh hard import vs. optional dependency — not addressed in this push

These remain open from the initial review.

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incremental Review: Commit e7c8ed1 — Add camera focal length

Following up on the previous comprehensive review, this commit introduces minor but useful improvements to camera viewpoint handling:

Changes Reviewed

File Changes
policy_runner.py CLI camera look-at now defaults --datagen-camera-target to world origin when only position is given. Clean backward-compatible change.
camera_trajectory.py Docstring clarifications for look-at semantics. No logic changes.
pipeline.py Added clarifying comment for dynamic camera re-pose check. No logic changes.
README.md Expanded camera viewpoint documentation with CLI examples and priority ordering. Good UX improvement.
test_camera_trajectory.py (new) Pure-logic unit tests for static/dynamic cameras, trajectory validation, and focal length constraints. Welcome addition to test coverage.

Analysis

Positive Changes:

  • ✅ The CLI improvement allows --datagen-camera-position 1.0 0.0 1.5 without requiring --datagen-camera-target — sensible default behavior
  • ✅ New tests (test_camera_trajectory.py) cover edge cases like zero focal length and trajectory length mismatches
  • ✅ Documentation improvements make the camera priority order explicit

No New Issues Found — this is a clean incremental improvement.

Outstanding Items from Initial Review

The following items from the comprehensive review remain unaddressed:

  1. ⚠️ Single environment validation — Still missing explicit num_envs == 1 check in DatagenCollector.from_env()
  2. ⚠️ W/H transpose concern in scene_flow.py — The unprojection reshape uses .reshape(W, H, 3).permute(1, 0, 2) which correctly handles the row-major ordering, but deserves a clarifying comment
  3. ⚠️ trimesh hard importmesh_utils.py imports trimesh at module level but it's only listed in optional datagen-viz extras

These can be addressed in follow-up commits.


Status: Ready for merge after addressing the single-env validation concern (or documenting that it's the caller's responsibility).


Incremental Review: Commit bae8255 — Per-episode HDF5 splitting & eval-runner datagen integration

This is a major architecture change that refactors the datagen collector from a single-file-per-rollout model to one-HDF5-file-per-episode, and integrates datagen collection into eval_runner.

Changes Reviewed (10 files)

File Summary
collection/collector.py Complete rewrite: removed fixed-horizon pre-allocation; collector now manages per-episode lifecycle (_start_episode / _end_episode), writer/tracker are per-episode, on_step accepts done flag to trigger splits, new close() method for camera cleanup
eval_runner.py New build_datagen_collector() factory; integrates collector into the job loop with proper lifecycle (close before stage teardown)
job_manager.py Job gains optional datagen dict field
policy_runner.py Removes old datagen_horizon calculation; on_step now passes done flag; from_env no longer requires num_steps
camera_handler.py New reset_scene_flow() and close() methods for proper cleanup between episodes/jobs
scene_flow.py New reset() method clearing _prev frame cache
io/hdf5_writer.py Datasets now resizable (maxshape); new trim() method; configurable filename param
dynamic_object_tracker.py New trim() method to shrink pose buffers
utils/constants.py DEFAULT_TRANSLATION_EPS_M bumped from 1e-42e-4
tests/test_hdf5_writer.py New test for trim() functionality
New eval job configs (×2) Example configs exercising the new datagen top-level block

Analysis

Positive Changes:

  • ✅ Per-episode files are a much better design: no wasted frames from early-terminating episodes, each file is self-contained and correctly trimmed
  • ✅ Scene-flow reset at episode boundaries prevents spurious cross-reset flow vectors — good catch
  • camera_handler.close() properly detaches replicator annotators and removes prims — fixes a real resource leak between eval-runner jobs
  • ✅ Nested finally blocks in eval_runner.py ensure collector closes before env teardown (correct ordering for annotator lifecycle)
  • trim() + resizable datasets is clean; new test validates it
  • ✅ The done-flag semantic ("frame on a done step belongs to the next episode because Isaac Lab resets within step()") is correctly documented and handled

Observations / Minor Items:

  1. 📝 _CAPACITY_MARGIN = 2 — The +2 guard over max_episode_length is fine for safety, but it might be worth documenting why 2 specifically (off-by-one + the done-frame belonging to next episode?).

  2. 📝 DEFAULT_TRANSLATION_EPS_M bump — Doubling the threshold from 0.1mm to 0.2mm will filter out more objects as "static". If this was tuned empirically that's fine, but a brief comment on the motivation (e.g. sim jitter) would help future readers.

  3. 📝 Camera close() uses private APIs (camera._rep_registry, camera._render_product_paths) — This mirrors Isaac Lab's own Camera.__del__ as noted in the docstring, which is the right approach, but it's fragile across Isaac Lab versions. The try/except makes it safe. Acceptable.

Outstanding Items (carried forward)

  1. ⚠️ W/H transpose in scene_flow.py line 99.reshape(W, H, 3).permute(1, 0, 2) is still present without a clarifying comment. Not addressed.
  2. ⚠️ trimesh hard import in dynamic_object_tracker.py — Still a top-level import (import trimesh) despite being an optional dependency. Not addressed.
  3. ⚠️ Single-env validation — No explicit num_envs == 1 assertion in from_env() (the docstring mentions the requirement but doesn't enforce it).

Verdict

Well-structured change. The per-episode approach is architecturally sound and the cleanup lifecycle is correct. No new blocking issues. The carried-forward items are pre-existing and non-critical. 👍

Comment on lines +219 to +224

# The reset for a done step has already happened, so the current frame is
# the first frame of a new episode: flush what we have, then open a new file.
if done and self._local > 0:
self._end_episode(env)
self._start_episode()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Exception from _end_episode in on_step aborts the entire rollout

When done=True, _end_episode is called synchronously inside on_step. If save_dynamic_objects or mesh sampling fails here (e.g., a USD query returns no geometry for a newly-spawned object), the exception propagates to rollout_policy's except block, which re-wraps it as RuntimeError("Error rolling out policy: ...") and kills the entire job — even though the policy itself was operating correctly.

collector.close() intentionally suppresses these same errors with a try/except and a warning print, but the identical code path inside on_step has no such guard. A data-collection failure at any episode boundary therefore masquerades as a policy rollout failure, causing the job to be marked FAILED in eval_runner.py.

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — Incremental Update (380a4fe)

New commit: Fix flow2d tracking by @DavidRecasens


Summary of New Changes

This commit introduces a significant refactor of how segmentation metadata is parsed and used to resolve tracking bindings:

  1. New segmentation_utils.py module (116 lines) — Centralizes helpers for parsing Replicator segmentation metadata:

    • get_label_for_instance_id() handles both int and string keys
    • label_to_tracking_candidates() extracts prim paths and semantic names with path priority over coarse class labels
    • find_body_index_for_prim() with improved matching using normalized tokens and ranked scoring
  2. Improved tracking binding resolution_resolve_tracking_binding() now:

    • Takes a full label object instead of just prim_path
    • Tries multiple candidates from the label metadata in priority order
    • Returns ("UNSUPPORTED", prim_path) for ambiguous articulations that match an asset but can't resolve to a specific body
  3. Optical flow fallback — In pipeline.py, when pixels are marked as unsupported in the scene flow result, they now fall back to the raw annotator optical flow rather than leaving them with computed (incorrect) values.

  4. Good test coverage — New test_segmentation_utils.py covers string/int key handling, path-over-class priority, nested visual mesh resolution, and the case where coarse labels should NOT guess a body index.


🟢 Positive Changes

1. Robust label handling: The new utils module correctly handles the variety of label formats that Replicator can produce (dict with class/primPath/etc., nested lists, string keys). This fixes edge cases where tracking was failing due to format assumptions.

2. Path priority over coarse class: The ordering in label_to_tracking_candidates() — paths first, then class — is the correct design. This ensures /World/envs/env_0/Robot/panda_link4/visuals takes precedence over a coarse "robot" semantic label.

3. Graceful degradation: The UNSUPPORTED binding type + optical flow fallback is a sound design for handling articulation pixels that can't be tracked at per-body granularity. Better to use sensor optical flow than produce garbage scene flow.

4. Normalized body matching: The _normalize_token() + _body_match_rank() approach handles variations like panda_link4 vs Panda_Link4 and suffix matches like right_inner_finger_collision_mesh.


🔵 Minor Observations (Non-blocking)

1. Score tuple comment: The score = (depth, len(normalized_body), match_rank) in find_body_index_for_prim would benefit from a brief comment explaining the priority order (deeper path wins, then longer body name, then match quality).

2. Lazy import pattern: The functions in camera_handler.py now do inline imports like from isaaclab_arena_datagen.segmentation_utils import .... This is fine for avoiding circular imports, but if these functions are hot (called per-pixel-instance), the import cost may accumulate. In practice this is likely negligible since Python caches imports.


Status of Previous Review Items

# Issue Status
1 trimesh imported unconditionally Still open — not addressed in this commit
2 Broad exception handling in close() Still open — no change to logging
3 No bounds check feedback when _local >= _capacity Still open
4 Device mismatch risk in _classify_pixels_by_tracking Still relevant
5 Missing validation of camera_position format Still open
6 collector.finalize(env) placement asymmetry Still open
7 O(num_objects × H × W) complexity in HDF5 writer Still open

Verdict

Good incremental progress. The flow2d tracking fix addresses a real functional issue with proper architecture (centralized utils, fallback logic, tests). The previous findings still apply and should be addressed before merge.

Reviewed at commit 380a4fe

Signed-off-by: DavidRecasens <drecasens@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants