Skip to content

feat(actuators): add GRU and GRU-residual network actuator models#6083

Draft
vidurv-nvidia wants to merge 3 commits into
isaac-sim:developfrom
vidurv-nvidia:vidurv/gru-actuators
Draft

feat(actuators): add GRU and GRU-residual network actuator models#6083
vidurv-nvidia wants to merge 3 commits into
isaac-sim:developfrom
vidurv-nvidia:vidurv/gru-actuators

Conversation

@vidurv-nvidia

@vidurv-nvidia vidurv-nvidia commented Jun 9, 2026

Copy link
Copy Markdown

Description

Adds two recurrent (GRU) actuator network models to isaaclab.actuators, alongside the existing MLP and LSTM models:

  • ActuatorNetGRU — explicit actuator; a GRU predicts the total joint effort from the joint position error and velocity. Derives from IdealPDActuator, so the predicted effort is clipped to the actuator's effort limit.
  • ActuatorNetGRUResidual — implicit-PD actuator that adds a GRU-predicted residual feed-forward effort. The physics engine applies the PD term using the configured stiffness/damping; the desired positions/velocities are preserved on return.

Both load a TorchScript network that exposes a .gru submodule and implements forward(x, hidden) -> (output, hidden) with a 2-dim input [position_error, velocity] (mirroring ActuatorNetLSTM). Each input and the output support optional (mean, std) normalization via config fields (None = identity). A non-finite network output is sanitized to zero effort before it reaches the engine.

These run on the standard (torch) actuator path. The Newton native-actuator path (use_newton_actuators=True) does not yet provide a GRU controller, so these configs are not authored onto that path; adding a Newton-side GRU controller would be a separate change.

Fixes # (no associated issue)

Type of change

  • New feature (non-breaking change which adds functionality)

Screenshots

N/A — no visual change.

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have added a changelog fragment under source/<pkg>/changelog.d/ for every touched package (do not edit CHANGELOG.rst or bump extension.toml — CI handles that)
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

Add ActuatorNetGRU, an explicit actuator whose GRU network predicts the
total joint effort, and ActuatorNetGRUResidual, an implicit-PD actuator
that adds a GRU-predicted residual feed-forward effort. Both support a
configurable set of input columns (position, position_error, velocity,
solver_pd) and optional per-column input normalization and output
denormalization carried directly on the config.

ActuatorNetGRU derives from IdealPDActuator: it predicts the total
effort, so the simple symmetric effort-limit clamp applies and the
velocity-dependent DC-motor torque-speed parameters are not required.
ActuatorNetGRUResidual derives from ImplicitActuator and injects the
residual as feed-forward effort while preserving the desired joint
positions and velocities for the engine-side PD term; effort limits are
enforced by the simulation via effort_limit_sim, as for any implicit
actuator.

Add unit tests, an API documentation entry, and a changelog fragment.
@github-actions github-actions Bot added documentation Improvements or additions to documentation isaac-lab Related to Isaac Lab team labels Jun 9, 2026

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot — PR #6083

feat(actuators): add GRU and GRU-residual network actuator models

Summary

Well-structured PR that adds GRU and GRU-residual actuator models following the established LSTM pattern. The shared _GRUActuatorMixin cleanly factors the common GRU logic, and the test suite (530 lines, 74 tests) is comprehensive. CI passes on pre-commit and builds; installation tests pending.

Findings


Suggestion — ActuatorNetGRU.reset() should call super().reset(env_ids) for future-proofing Fixed in 63903b9.


Warning — _resolve_normalization silently floors std but doesn't warn on suspiciously small values Fixed in 63903b9 — proper logger.warning() with field identification added.


🔵 Suggestion — Test coverage: consider adding a multi-step partial-reset interleaving test

The existing test_actuator_net_gru_reset validates that resetting env 0 zeros its hidden state while env 1's state persists. A stronger test would: (1) step both envs for N steps, (2) reset env 0, (3) step both envs again, (4) verify env 0's output matches a fresh-start trajectory while env 1's output matches the continued trajectory. This would catch subtle bugs where the view-based reshape accidentally aliases memory incorrectly across batch dimensions.


🟡 Warning — ActuatorNetGRUResidual.compute()control_action.joint_positions must not be None

In ActuatorNetGRUResidual.compute(), after _predict_gru_effort (which validates joint_positions is not None), the method proceeds to compute error_pos = control_action.joint_positions - joint_pos. This is safe because _predict_gru_effort raises if positions are None. However, if the residual variant is ever refactored to skip calling _predict_gru_effort first (e.g., a conditional path), this would NPE. A defensive comment or assertion documenting this invariant would help maintainability.


🔵 Suggestion — Minor: docstring for output_normalization uses reversed y * std + mean vs standard (y - mean) / std

The config docstring correctly states the output is denormalized as y * std + mean (inverse of z-score normalization). This is intentional and correct — just noting that users familiar with standard normalization (x - mean) / std might initially misread the output direction. The docstring is already clear; no action needed.


Verdict

Looks good overall. The implementation cleanly follows the established LSTM actuator pattern, the mixin factoring is sensible, error handling (non-finite sanitization, input validation) is solid, and test coverage is thorough. The remaining findings are non-blocking suggestions.


🔄 Update (2025-06-09)

Reviewed incremental changes (1f1082563903b9). Two previously flagged items were addressed:

  • super().reset(env_ids) now called in ActuatorNetGRU.reset()
  • _resolve_normalization now emits logger.warning() with field name when std is below floor

Additional changes: CONTRIBUTORS.md updated (author added). No new issues introduced. LGTM.

Operations.
"""

def reset(self, env_ids: Sequence[int]):

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Suggestion: Consider calling super().reset(env_ids) here for consistency with ActuatorNetGRUResidual.reset() and to future-proof against base class changes.

Suggested change
def reset(self, env_ids: Sequence[int]):
def reset(self, env_ids: Sequence[int]):
super().reset(env_ids)
self._reset_gru_state(env_ids)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 63903b9ActuatorNetGRU.reset() now calls super().reset(env_ids) before resetting the hidden state, matching ActuatorNetGRUResidual.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! 👍


def _resolve_normalization(self, stats: tuple[float, float] | None) -> tuple[float, float]:
"""Return the ``(mean, std)`` to apply, defaulting to identity and flooring the std."""
if stats is None:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Warning: If std is passed as 0.0 (e.g., from a misconfigured normalization config), this silently floors to 1e-8, which can make the normalized inputs explode to ~O(1e8). Consider logging a warning when the input std is below the floor.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 63903b9_resolve_normalization now logs a warning when a configured std is below the floor (and identifies which normalization field), since a near-zero std amplifies the normalized values.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the warning with field identification is exactly what I had in mind. LGTM ✅

- Call super().reset() in ActuatorNetGRU.reset for consistency with the
  residual variant and to future-proof against base-class changes.
- Warn when a configured normalization std is floored, since a near-zero
  std can amplify the normalized values.
- Add Vidur Vij to CONTRIBUTORS.
@vidurv-nvidia vidurv-nvidia marked this pull request as ready for review June 9, 2026 19:54
@vidurv-nvidia vidurv-nvidia marked this pull request as draft June 9, 2026 19:55
@greptile-apps

greptile-apps Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds two GRU-based actuator models — ActuatorNetGRU (explicit, total-torque) and ActuatorNetGRUResidual (implicit-PD with a GRU residual) — alongside the existing MLP and LSTM models. Both are implemented via a shared _GRUActuatorMixin that loads a TorchScript network, validates its shape, manages the recurrent hidden state per environment, and supports optional (mean, std) normalization on each of the three inputs (position, position error, velocity) and the output.

  • The mixin faithfully mirrors the ActuatorNetLSTM pattern (input assembly outside inference_mode, hidden-state copy via in-place slice assignment inside the block) and adds normalization, NaN sanitization, and a negative-std guard that the LSTM lacks.
  • ActuatorNetGRUCfg derives from IdealPDActuatorCfg with stiffness/damping nulled out; ActuatorNetGRUResidualCfg derives from ImplicitActuatorCfg and preserves PD gains so the physics engine handles the PD term while the GRU contributes a residual feed-forward effort.
  • The test file provides comprehensive coverage: both variants, multi-layer GRU, effort clipping, NaN sanitization, input/output normalization math, selective env reset, error paths, and a production-sized 2-layer export architecture.

Confidence Score: 5/5

Safe to merge — the new GRU actuator path is self-contained, fully tested, and does not modify any existing code paths.

All three code paths (input assembly, GRU inference, hidden-state reset) are additive and isolated from the existing LSTM and MLP actuators. The mixin design avoids touching any shared state, the normalization guard correctly raises on negative std and floors near-zero values, and the NaN sanitization prevents non-finite network output from reaching the physics engine. Tests cover the full surface area including edge cases.

No files require special attention; the single note in actuator_net.py is a warning-message wording improvement.

Important Files Changed

Filename Overview
source/isaaclab/isaaclab/actuators/actuator_net.py Adds _GRUActuatorMixin, ActuatorNetGRU, and ActuatorNetGRUResidual; mirrors LSTM pattern with a 3-input GRU and optional normalization for each channel; one misleading warning message for near-zero output_normalization std.
source/isaaclab/isaaclab/actuators/actuator_net_cfg.py Adds ActuatorNetGRUCfg and ActuatorNetGRUResidualCfg; normalization fields and network_file are clearly documented.
source/isaaclab/test/actuators/test_actuator_net_gru.py Comprehensive test suite covering both actuator variants, multi-layer GRU, effort clipping, NaN sanitization, input/output normalization, selective env reset, and error validation.
source/isaaclab/isaaclab/actuators/init.pyi Adds GRU classes and configs to all and public imports; no issues.
docs/source/api/lab/isaaclab.actuators.rst Adds GRU actuator entries to the API reference in the same style as the existing LSTM section.

Sequence Diagram

sequenceDiagram
    participant Sim as Simulation Loop
    participant GRU as ActuatorNetGRU / ActuatorNetGRUResidual
    participant Mix as _GRUActuatorMixin
    participant Net as TorchScript GRU Network

    Sim->>GRU: compute(control_action, joint_pos, joint_vel)
    GRU->>Mix: _predict_gru_effort(...)
    Mix->>Mix: assemble sea_input [position, pos_error, velocity]
    Mix->>Mix: apply input normalization (outside inference_mode)
    Mix->>Net: forward(sea_input, sea_hidden_state) [inference_mode]
    Net-->>Mix: (output, new_hidden_state)
    Mix->>Mix: denormalize output and nan_to_num
    Mix-->>GRU: effort [num_envs, num_joints]

    alt ActuatorNetGRU (explicit)
        GRU->>GRU: clip effort via _clip_effort
        GRU->>Sim: "joint_efforts set, joint_positions/velocities = None"
    else ActuatorNetGRUResidual (implicit-PD)
        GRU->>GRU: accumulate residual into joint_efforts
        GRU->>GRU: compute approximate total effort for telemetry
        GRU->>Sim: joint_efforts augmented, joint_positions/velocities preserved
    end

    Sim->>GRU: reset(env_ids)
    GRU->>Mix: _reset_gru_state(env_ids)
    Mix->>Mix: "sea_hidden_state_per_env[:, env_ids] = 0"
Loading

Reviews (2): Last reviewed commit: "Use fixed 3-input GRU contract and harde..." | Re-trigger Greptile

Comment on lines +199 to +221

Args:
control_action: The joint action instance holding the desired joint positions.
joint_pos: The current joint positions. Shape is (num_envs, num_joints).
joint_vel: The current joint velocities. Shape is (num_envs, num_joints).

Returns:
The predicted effort [N·m or N, depending on joint type]. Shape is
(num_envs, num_joints).

Raises:
ValueError: If ``control_action.joint_positions`` is None.
"""
if control_action.joint_positions is None:
raise ValueError("GRU actuator input requires control_action.joint_positions to be set.")
# normalized [position_error, velocity] inputs
pos_error = (control_action.joint_positions - joint_pos).flatten()
self.sea_input[:, 0, 0] = (pos_error - self._pos_error_norm[0]) / self._pos_error_norm[1]
self.sea_input[:, 0, 1] = (joint_vel.flatten() - self._vel_norm[0]) / self._vel_norm[1]

# run inference, then denormalize and guard against a non-finite output
with torch.inference_mode():
output, self.sea_hidden_state[:] = self.network(self.sea_input, self.sea_hidden_state)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Negative std silently corrected to 1e-8 instead of raising an error

max(float(std), self._GRU_STD_FLOOR) will replace any negative std (e.g., -2.0) with 1e-8 — effectively dividing by nearly zero — while the warning message says "flooring it, which can amplify the normalized values". A user who accidentally swaps the (mean, std) tuple order (passing (std, mean)) or passes a negative value gets input amplification of up to 1 / 1e-8 = 1e8 with a warning that does not clearly signal the sign correction. Only positive-but-small std values belong in the "floor" category; a negative std should raise ValueError to force the misconfiguration to be fixed.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e1dea0f_resolve_normalization now raises ValueError for a negative std (only 0 ≤ std < 1e-8 is floored, with the warning). Added a regression test (test_actuator_net_gru_negative_std_raises).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent — raising on negative std while flooring the near-zero case is the right split. Regression test seals it. 👍

Comment on lines +256 to +261
"""

def reset(self, env_ids: Sequence[int]):
super().reset(env_ids)
self._reset_gru_state(env_ids)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Denormalization applied to an inference-mode tensor outside the context block

output returned by the network inside torch.inference_mode() is an inference tensor. The denormalization output * self._output_norm[1] + self._output_norm[0] and the subsequent nan_to_num are executed outside the with block, so they run with gradient tracking enabled (if a grad context is active). In most deployment scenarios this is harmless, but it creates unnecessary ops when called from within a torch.no_grad() or outer inference_mode() scope. Consider moving the denormalization and sanitization inside the with torch.inference_mode(): block for consistency and to avoid any surprise interactions with autograd.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e1dea0f — the denormalization and nan_to_num now run inside the with torch.inference_mode() block.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! Confirmed — having everything under the inference_mode() context is cleaner and correct. 👍

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick fix! Verified that the denormalization and nan_to_num are now properly scoped inside the inference_mode() block. 👍

- Use a fixed [position, position_error, velocity] network input (3 inputs)
  to match the trained actuator-model feature set, instead of the
  2-input [position_error, velocity] contract; add position_normalization.
- Raise on a negative normalization std and reject bidirectional GRU
  networks instead of silently mis-handling them.
- Run output denormalization and the non-finite guard inside inference_mode.
- Strengthen tests: check the explicit actuator against an independent
  reference forward and cover the negative-std error path.
@AntoineRichard

Copy link
Copy Markdown
Collaborator

Hi @vidurv-nvidia could you target newton's actuators? We will be deprecating our internal actuators, I think it would make sense to use the shared actuators they built this way IsaacSim and Newton could use them too. Or at least could these actuators modify the newton actuators rather than our own? @jvonmuralt for viz.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants