Add script for e2e agentic scene gen#778
Conversation
mkdir(parents=True, exist_ok=True) on the output path's parent so a first-time run on a fresh checkout doesn't crash with FileNotFoundError when the llm_generated/ directory doesn't exist yet. Signed-off-by: Qian Lin <qianl@nvidia.com>
Replace all hard-coded [:10] / [:5] slices in TraceEvent candidate lists with a single named constant to make the limit explicit and easy to change in one place. Signed-off-by: Qian Lin <qianl@nvidia.com>
Replace frozenset with a plain set literal and add an inline comment to each entry explaining when that error stage fires. Signed-off-by: Qian Lin <qianl@nvidia.com>
Replace the Attributes block in TraceEvent with per-field docstrings, and move the numbered resolution strategy steps from resolve_item's docstring into inline comments at each corresponding code branch. Signed-off-by: Qian Lin <qianl@nvidia.com>
Consistent with background and item nodes, set id=query on the
embodiment node so task params that reference the robot by the
agent-emitted name (e.g. "franka") resolve correctly against known_ids,
rather than silently missing when the resolved name differs ("franka_ik").
Signed-off-by: Qian Lin <qianl@nvidia.com>
When two items share the same query and neither has an instance_name they produce the same node ID silently. Mark the spot for a follow-up. Signed-off-by: Qian Lin <qianl@nvidia.com>
Signed-off-by: Qian Lin <qianl@nvidia.com>
Now that resolve_embodiment returns None on miss (no silent fallback), "embodiment.miss" is an error stage. Split the old combined test into two: one verifying tag-pool relaxation is non-fatal, and one asserting an unknown embodiment is correctly reported as a resolution error. Signed-off-by: Qian Lin <qianl@nvidia.com>
Move graph and trace printing into helpers so main() stays readable, and sanitize env_name when writing proposal YAML.
Add _agent_node_id so background, embodiment, and item resolvers share one rule: graph node ids match agent-emitted reference strings.
Align relation resolution trace events with SpatialRelationSpec.reference.
IntentCompiler.compile() produces UnresolvedArenaEnvGraphSpec; AssetMatcher handles catalog binding. Keeps UnresolvedArenaEnvGraphSpec.resolve() unchanged.
Route item, background, and embodiment matching through one resolve_name API with required/preferred tags and consistent _best_match ordering. IntentCompiler calls it directly.
IntentCompiler passes item, background, or embodiment so trace stages stay decoupled from required_tags like object.
There was a problem hiding this comment.
Code Review: Add script for e2e agentic scene gen
Nice work integrating the full agentic environment generation pipeline! The end-to-end runner is well-structured, and the fixes for anchor relations, dome lighting, and reasoning_content handling address real issues. A few observations below:
🔴 High Priority
1. Overly broad exception catch in retry loop (environment_generation_agent.py)
except (json.JSONDecodeError, Exception) as exc:
last_exc = exc(json.JSONDecodeError, Exception) is equivalent to just except Exception since every exception is an Exception. This means non-retryable errors (e.g. pydantic.ValidationError indicating a genuine schema mismatch, TypeError, or even KeyboardInterrupt propagating) will be silently swallowed and retried. Consider narrowing to:
except (json.JSONDecodeError, pydantic.ValidationError) as exc:
last_exc = excThis way, truly unexpected errors still bubble up immediately rather than wasting retry budget.
2. assert statements are not retried — inconsistent error handling
The assert choices and assert route != "empty" statements inside the retry loop raise AssertionError, which is not caught by the except block. This means:
- Empty choices (content filter / guardrail) → immediate crash (no retry)
- Malformed JSON → retried
If the intent is to also retry on empty/filtered responses (which are often transient), replace the asserts with explicit checks that raise a retriable exception:
if not choices:
raise RuntimeError("No choices returned — content filter or rate limit")And catch RuntimeError in the retry handler.
🟡 Medium Priority
3. Pose.identity() only applied for is_anchor — potential ordering concern
In arena_env_graph_conversion_utils.py:
if spatial_constraint.kind == "is_anchor" and subject_asset.get_initial_pose() is None:
subject_asset.set_initial_pose(Pose.identity())This works when spatial constraints are processed in a single pass, but if the same asset appears in multiple constraints and a later constraint sets a pose, it will overwrite the identity pose. The logic is fine today, but a comment noting that identity is intentionally a fallback-only default (and won't clobber an explicit pose) would prevent future confusion.
4. AssetRegistry() instantiation in fallback path
light_cls = AssetRegistry().get_asset_by_name("light")
scene_assets.append(light_cls())If AssetRegistry is not a singleton, this creates a fresh registry every time the fallback fires. If it is a singleton, a brief comment confirming that would help readers. Also: get_asset_by_name("light") could return None if the registry doesn't have a "light" entry — a guard/assertion would prevent a confusing TypeError: 'NoneType' object is not callable.
5. Runner has no graceful cleanup on mid-pipeline failure
If the runner fails between environment construction (Step 5) and env.close() (end of Step 6), the Isaac Sim app may leak GPU resources. A try/finally around the env lifecycle would be more robust:
env = builder.make_registered()
try:
# ... run policy ...
finally:
env.close()🟢 Low Priority / Suggestions
6. _safe_stem could produce collisions for different prompts
The regex re.sub(r"[^\w.-]+", "_", name) collapses distinct env names into the same stem (e.g. "pick & place" and "pick / place" both → "pick_place"). Adding a short hash suffix (e.g. first 6 chars of the prompt's MD5) would prevent silent overwrites in --out-dir.
7. No tests for the new runner or the retry logic
The new environment_generation_runner.py (217 lines) and the retry mechanism in generate_spec don't have corresponding tests. At minimum, a unit test for the retry path (mocking client.chat.completions.create to return malformed JSON then valid JSON) would prevent regressions. The extract_response_text preference change (JSON-prefixed reasoning_content) also warrants a test given it affects all DeepSeek calls.
8. System prompt reformatting — functional change buried in style commit
The system prompt now includes two new behavioral rules (is_anchor requirement, lighting auto-injection). These are meaningful semantic changes but the commit message says "Reformat system prompt for better readability". Consider documenting these behavioral additions in the PR description or a separate commit for traceability.
Summary
The core logic is sound — the retry mechanism, dome light injection, and anchor handling all address real failure modes. The main concern is the exception handling in the retry loop (items 1-2): as written, it catches too broadly while simultaneously not catching the assertion failures it shares scope with. A focused fix there would make the retry behavior predictable and robust.
Summary
Add environment_generation_runner for running the agentic environment generation end to end
Detailed description
What was the reason for the change?
We need an entry point to run the full agentic env gen pipeline for integration testing.
What has been changed?
A new script is added to run the agentic env gen. It does the following steps:
What is the impact of this change?
To run:
python -m isaaclab_arena.agentic_environment_generation.environment_generation_runner --num_envs 16 --env_spacing 1.5 --num-steps 2000 --viz kit --prompt "franka pick up avocado from the maple table and place it into a bowl on the table. there are other veggies on the table as distractor."