feat(actuators): add GRU and GRU-residual network actuator models#6083
feat(actuators): add GRU and GRU-residual network actuator models#6083vidurv-nvidia wants to merge 3 commits into
Conversation
Add ActuatorNetGRU, an explicit actuator whose GRU network predicts the total joint effort, and ActuatorNetGRUResidual, an implicit-PD actuator that adds a GRU-predicted residual feed-forward effort. Both support a configurable set of input columns (position, position_error, velocity, solver_pd) and optional per-column input normalization and output denormalization carried directly on the config. ActuatorNetGRU derives from IdealPDActuator: it predicts the total effort, so the simple symmetric effort-limit clamp applies and the velocity-dependent DC-motor torque-speed parameters are not required. ActuatorNetGRUResidual derives from ImplicitActuator and injects the residual as feed-forward effort while preserving the desired joint positions and velocities for the engine-side PD term; effort limits are enforced by the simulation via effort_limit_sim, as for any implicit actuator. Add unit tests, an API documentation entry, and a changelog fragment.
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot — PR #6083
feat(actuators): add GRU and GRU-residual network actuator models
Summary
Well-structured PR that adds GRU and GRU-residual actuator models following the established LSTM pattern. The shared _GRUActuatorMixin cleanly factors the common GRU logic, and the test suite (530 lines, 74 tests) is comprehensive. CI passes on pre-commit and builds; installation tests pending.
Findings
✅ Suggestion — Fixed in 63903b9.ActuatorNetGRU.reset() should call super().reset(env_ids) for future-proofing
✅ Warning — Fixed in 63903b9 — proper _resolve_normalization silently floors std but doesn't warn on suspiciously small valueslogger.warning() with field identification added.
🔵 Suggestion — Test coverage: consider adding a multi-step partial-reset interleaving test
The existing test_actuator_net_gru_reset validates that resetting env 0 zeros its hidden state while env 1's state persists. A stronger test would: (1) step both envs for N steps, (2) reset env 0, (3) step both envs again, (4) verify env 0's output matches a fresh-start trajectory while env 1's output matches the continued trajectory. This would catch subtle bugs where the view-based reshape accidentally aliases memory incorrectly across batch dimensions.
🟡 Warning — ActuatorNetGRUResidual.compute() — control_action.joint_positions must not be None
In ActuatorNetGRUResidual.compute(), after _predict_gru_effort (which validates joint_positions is not None), the method proceeds to compute error_pos = control_action.joint_positions - joint_pos. This is safe because _predict_gru_effort raises if positions are None. However, if the residual variant is ever refactored to skip calling _predict_gru_effort first (e.g., a conditional path), this would NPE. A defensive comment or assertion documenting this invariant would help maintainability.
🔵 Suggestion — Minor: docstring for output_normalization uses reversed y * std + mean vs standard (y - mean) / std
The config docstring correctly states the output is denormalized as y * std + mean (inverse of z-score normalization). This is intentional and correct — just noting that users familiar with standard normalization (x - mean) / std might initially misread the output direction. The docstring is already clear; no action needed.
Verdict
✅ Looks good overall. The implementation cleanly follows the established LSTM actuator pattern, the mixin factoring is sensible, error handling (non-finite sanitization, input validation) is solid, and test coverage is thorough. The remaining findings are non-blocking suggestions.
🔄 Update (2025-06-09)
Reviewed incremental changes (1f10825 → 63903b9). Two previously flagged items were addressed:
- ✅
super().reset(env_ids)now called inActuatorNetGRU.reset() - ✅
_resolve_normalizationnow emitslogger.warning()with field name when std is below floor
Additional changes: CONTRIBUTORS.md updated (author added). No new issues introduced. LGTM.
| Operations. | ||
| """ | ||
|
|
||
| def reset(self, env_ids: Sequence[int]): |
There was a problem hiding this comment.
🔵 Suggestion: Consider calling super().reset(env_ids) here for consistency with ActuatorNetGRUResidual.reset() and to future-proof against base class changes.
| def reset(self, env_ids: Sequence[int]): | |
| def reset(self, env_ids: Sequence[int]): | |
| super().reset(env_ids) | |
| self._reset_gru_state(env_ids) |
There was a problem hiding this comment.
Done in 63903b9 — ActuatorNetGRU.reset() now calls super().reset(env_ids) before resetting the hidden state, matching ActuatorNetGRUResidual.
|
|
||
| def _resolve_normalization(self, stats: tuple[float, float] | None) -> tuple[float, float]: | ||
| """Return the ``(mean, std)`` to apply, defaulting to identity and flooring the std.""" | ||
| if stats is None: |
There was a problem hiding this comment.
🟡 Warning: If std is passed as 0.0 (e.g., from a misconfigured normalization config), this silently floors to 1e-8, which can make the normalized inputs explode to ~O(1e8). Consider logging a warning when the input std is below the floor.
There was a problem hiding this comment.
Done in 63903b9 — _resolve_normalization now logs a warning when a configured std is below the floor (and identifies which normalization field), since a near-zero std amplifies the normalized values.
There was a problem hiding this comment.
Thanks, the warning with field identification is exactly what I had in mind. LGTM ✅
- Call super().reset() in ActuatorNetGRU.reset for consistency with the residual variant and to future-proof against base-class changes. - Warn when a configured normalization std is floored, since a near-zero std can amplify the normalized values. - Add Vidur Vij to CONTRIBUTORS.
Greptile SummaryThis PR adds two GRU-based actuator models —
Confidence Score: 5/5Safe to merge — the new GRU actuator path is self-contained, fully tested, and does not modify any existing code paths. All three code paths (input assembly, GRU inference, hidden-state reset) are additive and isolated from the existing LSTM and MLP actuators. The mixin design avoids touching any shared state, the normalization guard correctly raises on negative std and floors near-zero values, and the NaN sanitization prevents non-finite network output from reaching the physics engine. Tests cover the full surface area including edge cases. No files require special attention; the single note in actuator_net.py is a warning-message wording improvement. Important Files Changed
Sequence DiagramsequenceDiagram
participant Sim as Simulation Loop
participant GRU as ActuatorNetGRU / ActuatorNetGRUResidual
participant Mix as _GRUActuatorMixin
participant Net as TorchScript GRU Network
Sim->>GRU: compute(control_action, joint_pos, joint_vel)
GRU->>Mix: _predict_gru_effort(...)
Mix->>Mix: assemble sea_input [position, pos_error, velocity]
Mix->>Mix: apply input normalization (outside inference_mode)
Mix->>Net: forward(sea_input, sea_hidden_state) [inference_mode]
Net-->>Mix: (output, new_hidden_state)
Mix->>Mix: denormalize output and nan_to_num
Mix-->>GRU: effort [num_envs, num_joints]
alt ActuatorNetGRU (explicit)
GRU->>GRU: clip effort via _clip_effort
GRU->>Sim: "joint_efforts set, joint_positions/velocities = None"
else ActuatorNetGRUResidual (implicit-PD)
GRU->>GRU: accumulate residual into joint_efforts
GRU->>GRU: compute approximate total effort for telemetry
GRU->>Sim: joint_efforts augmented, joint_positions/velocities preserved
end
Sim->>GRU: reset(env_ids)
GRU->>Mix: _reset_gru_state(env_ids)
Mix->>Mix: "sea_hidden_state_per_env[:, env_ids] = 0"
Reviews (2): Last reviewed commit: "Use fixed 3-input GRU contract and harde..." | Re-trigger Greptile |
|
|
||
| Args: | ||
| control_action: The joint action instance holding the desired joint positions. | ||
| joint_pos: The current joint positions. Shape is (num_envs, num_joints). | ||
| joint_vel: The current joint velocities. Shape is (num_envs, num_joints). | ||
|
|
||
| Returns: | ||
| The predicted effort [N·m or N, depending on joint type]. Shape is | ||
| (num_envs, num_joints). | ||
|
|
||
| Raises: | ||
| ValueError: If ``control_action.joint_positions`` is None. | ||
| """ | ||
| if control_action.joint_positions is None: | ||
| raise ValueError("GRU actuator input requires control_action.joint_positions to be set.") | ||
| # normalized [position_error, velocity] inputs | ||
| pos_error = (control_action.joint_positions - joint_pos).flatten() | ||
| self.sea_input[:, 0, 0] = (pos_error - self._pos_error_norm[0]) / self._pos_error_norm[1] | ||
| self.sea_input[:, 0, 1] = (joint_vel.flatten() - self._vel_norm[0]) / self._vel_norm[1] | ||
|
|
||
| # run inference, then denormalize and guard against a non-finite output | ||
| with torch.inference_mode(): | ||
| output, self.sea_hidden_state[:] = self.network(self.sea_input, self.sea_hidden_state) |
There was a problem hiding this comment.
Negative
std silently corrected to 1e-8 instead of raising an error
max(float(std), self._GRU_STD_FLOOR) will replace any negative std (e.g., -2.0) with 1e-8 — effectively dividing by nearly zero — while the warning message says "flooring it, which can amplify the normalized values". A user who accidentally swaps the (mean, std) tuple order (passing (std, mean)) or passes a negative value gets input amplification of up to 1 / 1e-8 = 1e8 with a warning that does not clearly signal the sign correction. Only positive-but-small std values belong in the "floor" category; a negative std should raise ValueError to force the misconfiguration to be fixed.
There was a problem hiding this comment.
Fixed in e1dea0f — _resolve_normalization now raises ValueError for a negative std (only 0 ≤ std < 1e-8 is floored, with the warning). Added a regression test (test_actuator_net_gru_negative_std_raises).
There was a problem hiding this comment.
Excellent — raising on negative std while flooring the near-zero case is the right split. Regression test seals it. 👍
| """ | ||
|
|
||
| def reset(self, env_ids: Sequence[int]): | ||
| super().reset(env_ids) | ||
| self._reset_gru_state(env_ids) | ||
|
|
There was a problem hiding this comment.
Denormalization applied to an inference-mode tensor outside the context block
output returned by the network inside torch.inference_mode() is an inference tensor. The denormalization output * self._output_norm[1] + self._output_norm[0] and the subsequent nan_to_num are executed outside the with block, so they run with gradient tracking enabled (if a grad context is active). In most deployment scenarios this is harmless, but it creates unnecessary ops when called from within a torch.no_grad() or outer inference_mode() scope. Consider moving the denormalization and sanitization inside the with torch.inference_mode(): block for consistency and to avoid any surprise interactions with autograd.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Fixed in e1dea0f — the denormalization and nan_to_num now run inside the with torch.inference_mode() block.
There was a problem hiding this comment.
Thanks for the fix! Confirmed — having everything under the inference_mode() context is cleaner and correct. 👍
There was a problem hiding this comment.
Thanks for the quick fix! Verified that the denormalization and nan_to_num are now properly scoped inside the inference_mode() block. 👍
- Use a fixed [position, position_error, velocity] network input (3 inputs) to match the trained actuator-model feature set, instead of the 2-input [position_error, velocity] contract; add position_normalization. - Raise on a negative normalization std and reject bidirectional GRU networks instead of silently mis-handling them. - Run output denormalization and the non-finite guard inside inference_mode. - Strengthen tests: check the explicit actuator against an independent reference forward and cover the negative-std error path.
|
Hi @vidurv-nvidia could you target newton's actuators? We will be deprecating our internal actuators, I think it would make sense to use the shared actuators they built this way IsaacSim and Newton could use them too. Or at least could these actuators modify the newton actuators rather than our own? @jvonmuralt for viz. |
Description
Adds two recurrent (GRU) actuator network models to
isaaclab.actuators, alongside the existing MLP and LSTM models:ActuatorNetGRU— explicit actuator; a GRU predicts the total joint effort from the joint position error and velocity. Derives fromIdealPDActuator, so the predicted effort is clipped to the actuator's effort limit.ActuatorNetGRUResidual— implicit-PD actuator that adds a GRU-predicted residual feed-forward effort. The physics engine applies the PD term using the configured stiffness/damping; the desired positions/velocities are preserved on return.Both load a TorchScript network that exposes a
.grusubmodule and implementsforward(x, hidden) -> (output, hidden)with a 2-dim input[position_error, velocity](mirroringActuatorNetLSTM). Each input and the output support optional(mean, std)normalization via config fields (None= identity). A non-finite network output is sanitized to zero effort before it reaches the engine.These run on the standard (torch) actuator path. The Newton native-actuator path (
use_newton_actuators=True) does not yet provide a GRU controller, so these configs are not authored onto that path; adding a Newton-side GRU controller would be a separate change.Fixes # (no associated issue)
Type of change
Screenshots
N/A — no visual change.
Checklist
pre-commitchecks with./isaaclab.sh --formatsource/<pkg>/changelog.d/for every touched package (do not editCHANGELOG.rstor bumpextension.toml— CI handles that)CONTRIBUTORS.mdor my name already exists there