diff --git a/docs/source/policy_deployment/02_gear_assembly/gear_assembly_policy.rst b/docs/source/policy_deployment/02_gear_assembly/gear_assembly_policy.rst index de8497f181b7..b846a4df9510 100644 --- a/docs/source/policy_deployment/02_gear_assembly/gear_assembly_policy.rst +++ b/docs/source/policy_deployment/02_gear_assembly/gear_assembly_policy.rst @@ -8,9 +8,12 @@ This tutorial walks you through how to train a gear insertion assembly reinforce 1. **Simulation Training in Isaac Lab**: Train the policy in a high-fidelity physics simulation with domain randomization 2. **Real Robot Deployment with Isaac ROS**: Deploy the trained policy on real hardware using Isaac ROS and a custom ROS inference node -This walkthrough covers the key principles and best practices for sim-to-real transfer using Isaac Lab, illustrated with a real-world example: +This walkthrough covers the key principles and best practices for sim-to-real transfer using Isaac Lab. -- the Gear Assembly task for the UR10e robot with the Robotiq 2F-140 gripper or 2F-85 gripper +**Supported Robots:** + +- **Universal Robots UR10e**: 6-DOF industrial robot arm with Robotiq 2F-140 or 2F-85 gripper +- **Flexiv Rizon 4s**: 7-DOF collaborative robot arm with Grav parallel gripper **Task Details:** @@ -29,7 +32,7 @@ The gear assembly policy operates as follows: Sim-to-real transfer: Gear assembly policy trained in Isaac Lab (left) successfully deployed on real UR10e robot (right). -This environment has been successfully deployed on real UR10e robots without an IsaacLab dependency. +This environment has been successfully deployed on real UR10e and Flexiv Rizon 4s robots without an IsaacLab dependency. **Scope of This Tutorial:** @@ -62,35 +65,42 @@ Using Real-Robot-Available Observations Your simulation environment should only use observations that are available on the real robot and not use "privileged" information that wouldn't be available in deployment. -Observation Specification: Isaac-Deploy-GearAssembly-UR10e-2F140-v0 -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Observation Specification +^^^^^^^^^^^^^^^^^^^^^^^^^ The Gear Assembly environment uses both proprioceptive and exteroceptive (vision) observations: .. list-table:: Gear Assembly Environment Observations - :widths: 25 25 25 25 + :widths: 20 15 15 25 25 :header-rows: 1 * - Observation - - Dimension + - UR10e Dim + - Rizon 4s Dim - Real-World Source - Noise in Training * - ``joint_pos`` - - 6 (UR10e arm joints) - - UR10e controller + - 6 + - 7 + - Robot controller - None (proprioceptive) * - ``joint_vel`` - - 6 (UR10e arm joints) - - UR10e controller + - 6 + - 7 + - Robot controller - None (proprioceptive) * - ``gear_shaft_pos`` - - 3 (x, y, z position) + - 3 + - 3 - FoundationPose + RealSense depth - - ±0.005 m (5mm, estimated error from FoundationPose + RealSense depth pipeline) + - ±0.005 m (5mm) * - ``gear_shaft_quat`` - - 4 (quaternion orientation) + - 4 + - 4 - FoundationPose + RealSense depth - - ±0.01 per component (~5° angular error, estimated error from FoundationPose + RealSense depth pipeline) + - ±0.01 per component (~5°) + +**Total observation dimension:** 19 (UR10e) or 21 (Rizon 4s) **Implementation:** @@ -271,21 +281,59 @@ For the UR10e deployment, we use an impedance controller interface. Using a simp - Have more predictable behavior that's easier to replicate - Reduce the controller complexity as a source of sim-real gap -**Example: UR10e Actuator Configuration** - -.. code-block:: python - - # Default UR10e actuator configuration - actuators = { - "arm": ImplicitActuatorCfg( - joint_names_expr=["shoulder_pan_joint", "shoulder_lift_joint", - "elbow_joint", "wrist_1_joint", "wrist_2_joint", "wrist_3_joint"], - effort_limit=87.0, # From UR10e specifications - velocity_limit=2.0, # From UR10e specifications - stiffness=800.0, # Calibrated to match real behavior - damping=40.0, # Calibrated to match real behavior - ), - } +**Actuator Configurations:** + +.. tab-set:: + + .. tab-item:: UR10e + + .. code-block:: python + + # Default UR10e actuator configuration + actuators = { + "arm": ImplicitActuatorCfg( + joint_names_expr=["shoulder_pan_joint", "shoulder_lift_joint", + "elbow_joint", "wrist_1_joint", "wrist_2_joint", "wrist_3_joint"], + effort_limit=87.0, # From UR10e specifications + velocity_limit=2.0, # From UR10e specifications + stiffness=800.0, # Calibrated to match real behavior + damping=40.0, # Calibrated to match real behavior + ), + } + + .. tab-item:: Flexiv Rizon 4s + Grav Gripper + + The Rizon 4s uses ``IdealPDActuatorCfg`` with per-joint-group tuning, plus actuators for the Grav parallel gripper: + + .. code-block:: python + + actuators = { + "shoulder": IdealPDActuatorCfg( + joint_names_expr=["joint[1-2]"], + effort_limit=123.0, velocity_limit=2.094, + stiffness=6000.0, damping=108.4, + ), + "elbow": IdealPDActuatorCfg( + joint_names_expr=["joint[3-4]"], + effort_limit=64.0, velocity_limit=2.443, + stiffness=4200.0, damping=90.7, + ), + "wrist": IdealPDActuatorCfg( + joint_names_expr=["joint[5-7]"], + effort_limit=39.0, velocity_limit=4.887, + stiffness=1500.0, damping=54.2, + ), + "gripper_drive": IdealPDActuatorCfg( + joint_names_expr=["finger_joint"], + effort_limit=2.0, velocity_limit=1.0, + stiffness=2e3, damping=1e1, + ), + "gripper_passive": IdealPDActuatorCfg( + joint_names_expr=[".*_knuckle_joint"], + effort_limit=1.0, velocity_limit=1.0, + stiffness=0.0, damping=0.0, + ), + } **Domain Randomization of Actuator Parameters** @@ -330,6 +378,10 @@ To quantify this behavior, we plotted the step response of the impedance control **Why Joint Friction Matters**: Without modeling joint friction in simulation, the policy learns to expect that commanded joint positions are always reached. On the real robot, stiction prevents small movements and causes steady-state errors. By adding friction during training, the policy learns to account for these effects and commands appropriately larger motions to overcome friction. +.. note:: + + **Flexiv Rizon 4s**: Domain randomization for actuator gains and joint friction is currently disabled for the Rizon 4s gear assembly configuration while the base simulation is being stabilized. Once the base simulation is verified, these can be re-enabled in the ``EventCfg`` class in ``config/rizon_4s/joint_pos_env_cfg.py``. + **Compensating for Stiction with Action Scaling:** To help the policy overcome stiction on the real robot, we also increased the output action scaling. The Isaac ROS documentation notes that a higher action scale (0.0325 vs 0.025) is needed to overcome the higher static friction (stiction) compared to the 2F-85 gripper. This increased scaling ensures the policy commands are large enough to overcome the friction forces observed in the step response analysis. @@ -339,20 +391,41 @@ Action Space Design Your action space should match what the real robot controller can execute. For this task we found that **incremental joint position control** is the most reliable approach. -**Example: Gear Assembly Action Configuration** +**Action Configuration:** -.. code-block:: python +.. tab-set:: - # For contact-rich manipulation, smaller action scale for more precise control - self.joint_action_scale = 0.025 # ±2.5 degrees per step + .. tab-item:: UR10e - self.actions.arm_action = mdp.RelativeJointPositionActionCfg( - asset_name="robot", - joint_names=["shoulder_pan_joint", "shoulder_lift_joint", "elbow_joint", - "wrist_1_joint", "wrist_2_joint", "wrist_3_joint"], - scale=self.joint_action_scale, - use_zero_offset=True, - ) + .. code-block:: python + + self.joint_action_scale = 0.025 # ±1.4 degrees per step + + self.actions.arm_action = mdp.RelativeJointPositionActionCfg( + asset_name="robot", + joint_names=["shoulder_pan_joint", "shoulder_lift_joint", "elbow_joint", + "wrist_1_joint", "wrist_2_joint", "wrist_3_joint"], + scale=self.joint_action_scale, + use_zero_offset=True, + ) + + **Action dimension:** 6 + + .. tab-item:: Flexiv Rizon 4s + + .. code-block:: python + + self.joint_action_scale = 0.025 # ±1.4 degrees per step + + self.actions.arm_action = mdp.RelativeJointPositionActionCfg( + asset_name="robot", + joint_names=["joint1", "joint2", "joint3", "joint4", + "joint5", "joint6", "joint7"], + scale=self.joint_action_scale, + use_zero_offset=True, + ) + + **Action dimension:** 7 The action scale is a critical hyperparameter that should be tuned based on: @@ -427,16 +500,34 @@ Step 1: Visualize the Environment First, launch the training with a small number of environments and visualization enabled to verify that the environment is set up correctly: -.. code-block:: bash +.. tab-set:: - # Launch training with visualization - python scripts/reinforcement_learning/rsl_rl/train.py \ - --task Isaac-Deploy-GearAssembly-UR10e-2F140-v0 \ - --num_envs 4 + .. tab-item:: UR10e (2F-140) -.. note:: + .. code-block:: bash - For the Robotiq 2F-85 gripper, use ``--task Isaac-Deploy-GearAssembly-UR10e-2F85-v0`` instead. + python scripts/reinforcement_learning/rsl_rl/train.py \ + --task Isaac-Deploy-GearAssembly-UR10e-2F140-ROS-Inference-v0 \ + --num_envs 4 \ + --visualizer kit + + .. tab-item:: UR10e (2F-85) + + .. code-block:: bash + + python scripts/reinforcement_learning/rsl_rl/train.py \ + --task Isaac-Deploy-GearAssembly-UR10e-2F85-ROS-Inference-v0 \ + --num_envs 4 \ + --visualizer kit + + .. tab-item:: Flexiv Rizon 4s + Grav + + .. code-block:: bash + + python scripts/reinforcement_learning/rsl_rl/train.py \ + --task Isaac-Deploy-GearAssembly-Rizon4s-Grav-ROS-Inference-v0 \ + --num_envs 4 \ + --visualizer kit This will open the Isaac Sim viewer where you can observe the training process in real-time. @@ -456,21 +547,44 @@ Step 2: Full-Scale Training with Video Recording Now launch the full training run with more parallel environments in headless mode for faster training. We'll also enable video recording to monitor progress: -.. code-block:: bash +.. tab-set:: + + .. tab-item:: UR10e (2F-140) + + .. code-block:: bash + + python scripts/reinforcement_learning/rsl_rl/train.py \ + --task Isaac-Deploy-GearAssembly-UR10e-2F140-ROS-Inference-v0 \ + --headless \ + --num_envs 256 \ + --video --video_length 200 --video_interval 76800 + + .. tab-item:: UR10e (2F-85) - # Full training with video recording - python scripts/reinforcement_learning/rsl_rl/train.py \ - --task Isaac-Deploy-GearAssembly-UR10e-2F140-v0 \ - --headless \ - --num_envs 256 \ - --video --video_length 800 --video_interval 5000 + .. code-block:: bash -This command will: + python scripts/reinforcement_learning/rsl_rl/train.py \ + --task Isaac-Deploy-GearAssembly-UR10e-2F85-ROS-Inference-v0 \ + --headless \ + --num_envs 256 \ + --video --video_length 200 --video_interval 76800 -- Run 256 parallel environments for efficient training -- Run in headless mode (no visualization) for maximum performance -- Record videos every 5000 steps to monitor training progress -- Save videos with 800 frames each + .. tab-item:: Flexiv Rizon 4s + Grav + + .. code-block:: bash + + python scripts/reinforcement_learning/rsl_rl/train.py \ + --task Isaac-Deploy-GearAssembly-Rizon4s-Grav-ROS-Inference-v0 \ + --headless \ + --num_envs 256 \ + --video --video_length 200 --video_interval 76800 + +**Command breakdown:** + +- ``--headless``: Disables visualization for maximum training speed +- ``--num_envs 256``: Runs 256 parallel environments for efficient training +- ``--video_length 200``: Each video captures approximately one full episode (``episode_length_s / (sim.dt * decimation)`` = ``6.66 / (1/1000 * 33)`` ≈ 200 steps) +- ``--video_interval 76800``: Records a video every 76,800 environment steps (~every 150 iterations), producing ~10 videos over full training Training typically takes ~12-24 hours for a robust insertion policy. The videos will be saved in the ``logs`` directory and can be reviewed to assess policy performance during training. @@ -483,11 +597,21 @@ Training typically takes ~12-24 hours for a robust insertion policy. The videos You can monitor training metrics in real-time using TensorBoard. Open a new terminal and run: -.. code-block:: bash +.. tab-set:: + + .. tab-item:: UR10e + + .. code-block:: bash + + ./isaaclab.sh -p -m tensorboard.main --logdir logs/rsl_rl/gear_assembly_ur10e + + .. tab-item:: Flexiv Rizon 4s + + .. code-block:: bash - ./isaaclab.sh -p -m tensorboard.main --logdir + ./isaaclab.sh -p -m tensorboard.main --logdir logs/rsl_rl/gear_assembly_rizon4s_grav -Replace ```` with the path to your training logs (e.g., ``logs/rsl_rl/gear_assembly_ur10e/2025-11-19_19-31-01``). TensorBoard will display plots showing rewards, episode lengths, and other metrics. Verify that the rewards are increasing over iterations to ensure the policy is learning successfully. +Replace the log directory path with your actual training log location if different. TensorBoard will display plots showing rewards, episode lengths, and other metrics. Verify that the rewards are increasing over iterations to ensure the policy is learning successfully. Step 3: Deploy on Real Robot diff --git a/scripts/reinforcement_learning/rsl_rl/play_hardcoded.py b/scripts/reinforcement_learning/rsl_rl/play_hardcoded.py new file mode 100644 index 000000000000..717ce3e4d09f --- /dev/null +++ b/scripts/reinforcement_learning/rsl_rl/play_hardcoded.py @@ -0,0 +1,297 @@ +# Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. +# +# SPDX-License-Identifier: BSD-3-Clause + +"""Script to run hardcoded / deterministic inference with an RSL-RL agent. + +This is a variant of ``play.py`` designed for the hardcoded-inference +environment where all randomization is disabled. The IK grasp-pose solver +is still active so the robot properly grasps the gear; with every random +input zeroed out the result is identical on every reset. + +Features on top of the standard play loop: + +1. **Observation overrides** – hardcode the shaft-position and/or shaft- + quaternion portions of the observation tensor so the policy always sees + fixed values regardless of the sim state. Edit the constants in the + *OBSERVATION OVERRIDES* section below. + +2. **Debug prints** – the initial observation is printed broken down by + component so you can verify the values the policy will see. + +3. **Old checkpoint support** – automatically converts pre-v5 rsl-rl + checkpoints (single ``model_state_dict``) to the v5 split format. + +The gear type, gear base pose, and gear height are configured in the +companion env-config file:: + + .../gear_assembly/config/rizon_4s/hardcoded_inference_env_cfg.py + +Usage:: + + python scripts/reinforcement_learning/rsl_rl/play_hardcoded.py \\ + --num_envs 1 \\ + --checkpoint logs/rsl_rl/gear_assembly/2026-03-13_16-28-11/model_500.pt \\ + --visualizer kit +""" + +import argparse +import contextlib +import importlib.metadata as metadata +import os +import sys +import time + +import gymnasium as gym +import torch +from packaging import version +from rsl_rl.runners import DistillationRunner, OnPolicyRunner + +from isaaclab.envs import DirectMARLEnvCfg, DirectRLEnvCfg, ManagerBasedRLEnvCfg +from isaaclab.utils.assets import retrieve_file_path + +from isaaclab_rl.rsl_rl import ( + RslRlBaseRunnerCfg, + RslRlVecEnvWrapper, + handle_deprecated_rsl_rl_cfg, +) + +import isaaclab_tasks # noqa: F401 +from isaaclab_tasks.utils import add_launcher_args, get_checkpoint_path, launch_simulation +from isaaclab_tasks.utils.hydra import hydra_task_config + +# local imports +import cli_args # isort: skip + +# PLACEHOLDER: Extension template (do not remove this comment) +with contextlib.suppress(ImportError): + import isaaclab_tasks_experimental # noqa: F401 + + +# ╔══════════════════════════════════════════════════════════════════════════════╗ +# ║ OBSERVATION OVERRIDES ║ +# ║ Set a value to inject a fixed observation every step, or leave as None ║ +# ║ to use the value computed from the simulation. ║ +# ║ ║ +# ║ Obs layout: [joint_pos(7) | joint_vel(7) | shaft_pos(3) | shaft_quat(4)] ║ +# ╚══════════════════════════════════════════════════════════════════════════════╝ + +OVERRIDE_SHAFT_POS = None # e.g. [0.481, -0.073, -0.005] +OVERRIDE_SHAFT_QUAT = None # e.g. [0.0, 0.0, 0.70711, -0.70711] + +_SHAFT_POS_SLICE = slice(14, 17) +_SHAFT_QUAT_SLICE = slice(17, 21) + + +# -- argparse ---------------------------------------------------------------- +parser = argparse.ArgumentParser(description="Run hardcoded deterministic inference with RSL-RL.") +parser.add_argument( + "--disable_fabric", action="store_true", default=False, help="Disable fabric and use USD I/O operations." +) +parser.add_argument("--num_envs", type=int, default=1, help="Number of environments to simulate.") +parser.add_argument( + "--task", + type=str, + default="Isaac-Deploy-GearAssembly-Rizon4s-Grav-Hardcoded-Inference-v0", + help="Name of the task.", +) +parser.add_argument( + "--agent", type=str, default="rsl_rl_cfg_entry_point", help="Name of the RL agent configuration entry point." +) +parser.add_argument("--seed", type=int, default=None, help="Seed used for the environment") +parser.add_argument("--real-time", action="store_true", default=False, help="Run in real-time, if possible.") +cli_args.add_rsl_rl_args(parser) +add_launcher_args(parser) +args_cli, remaining_args = parser.parse_known_args() + +# clear out sys.argv for Hydra +sys.argv = [sys.argv[0]] + remaining_args + +installed_version = metadata.version("rsl-rl-lib") + + +# -- old checkpoint conversion ----------------------------------------------- +_OLD_TO_NEW_KEY_MAP = { + # Actor MLP + "actor.": "mlp.", + # Critic MLP + "critic.": "mlp.", + # Actor RNN + "memory_a.rnn.": "rnn.rnn.", + # Critic RNN + "memory_c.rnn.": "rnn.rnn.", + # Actor obs normalizer + "actor_obs_normalizer.": "obs_normalizer.", + # Critic obs normalizer + "critic_obs_normalizer.": "obs_normalizer.", +} + +_ACTOR_PREFIXES = ("actor.", "memory_a.", "actor_obs_normalizer.") +_CRITIC_PREFIXES = ("critic.", "memory_c.", "critic_obs_normalizer.") + + +def _convert_old_checkpoint(loaded_dict: dict) -> dict: + """Convert a pre-v5 rsl-rl checkpoint to the v5 format. + + Old format: single ``model_state_dict`` with prefixes like ``actor.*``, + ``memory_a.*``, ``actor_obs_normalizer.*``, etc. + + New format: separate ``actor_state_dict`` and ``critic_state_dict`` with + prefixes ``mlp.*``, ``rnn.rnn.*``, ``obs_normalizer.*``. + """ + old_sd = loaded_dict["model_state_dict"] + actor_sd: dict[str, torch.Tensor] = {} + critic_sd: dict[str, torch.Tensor] = {} + + for old_key, tensor in old_sd.items(): + is_actor = old_key.startswith(_ACTOR_PREFIXES) + is_critic = old_key.startswith(_CRITIC_PREFIXES) + + new_key = old_key + for old_prefix, new_prefix in _OLD_TO_NEW_KEY_MAP.items(): + if old_key.startswith(old_prefix): + new_key = new_prefix + old_key[len(old_prefix):] + break + + if is_actor: + actor_sd[new_key] = tensor + elif is_critic: + critic_sd[new_key] = tensor + + print(f"[INFO] Converted old checkpoint: {len(actor_sd)} actor keys, {len(critic_sd)} critic keys") + + return { + "actor_state_dict": actor_sd, + "critic_state_dict": critic_sd, + "optimizer_state_dict": loaded_dict.get("optimizer_state_dict", {}), + "iter": loaded_dict.get("iter", 0), + "infos": loaded_dict.get("infos"), + } + + +def _load_checkpoint(runner, resume_path: str): + """Load a checkpoint, automatically converting old (pre-v5) formats.""" + loaded_dict = torch.load(resume_path, weights_only=False) + if "model_state_dict" in loaded_dict and "actor_state_dict" not in loaded_dict: + print("[INFO] Detected old (pre-v5) rsl-rl checkpoint format — converting …") + loaded_dict = _convert_old_checkpoint(loaded_dict) + load_cfg = {"actor": True, "critic": True, "optimizer": False, "iteration": True} + runner.alg.load(loaded_dict, load_cfg, strict=True) + if load_cfg.get("iteration") and "iter" in loaded_dict: + runner.current_learning_iteration = loaded_dict["iter"] + + +def _get_policy_tensor(obs): + """Extract the flat policy tensor from obs (TensorDict or plain Tensor).""" + if isinstance(obs, torch.Tensor): + return obs + # TensorDict — the wrapper stores the concatenated obs under the "policy" key + return obs["policy"] + + +def _apply_obs_overrides(obs): + """Optionally replace shaft_pos / shaft_quat in the observation.""" + t = _get_policy_tensor(obs) + if OVERRIDE_SHAFT_POS is not None: + t[:, _SHAFT_POS_SLICE] = torch.tensor(OVERRIDE_SHAFT_POS, device=t.device, dtype=t.dtype) + if OVERRIDE_SHAFT_QUAT is not None: + t[:, _SHAFT_QUAT_SLICE] = torch.tensor(OVERRIDE_SHAFT_QUAT, device=t.device, dtype=t.dtype) + return obs + + +def _print_obs(obs, label: str = "Observation"): + """Print observation components for the first environment.""" + t = _get_policy_tensor(obs) + o = t[0] + print(f"\n[{label}]") + print(f" joint_pos : {o[0:7].tolist()}") + print(f" joint_vel : {o[7:14].tolist()}") + print(f" shaft_pos : {o[14:17].tolist()}") + print(f" shaft_quat : {o[17:21].tolist()}") + + +@hydra_task_config(args_cli.task, args_cli.agent) +def main(env_cfg: ManagerBasedRLEnvCfg | DirectRLEnvCfg | DirectMARLEnvCfg, agent_cfg: RslRlBaseRunnerCfg): + """Run hardcoded inference with RSL-RL agent.""" + with launch_simulation(env_cfg, args_cli): + # -- configure ------------------------------------------------------- + agent_cfg = cli_args.update_rsl_rl_cfg(agent_cfg, args_cli) + env_cfg.scene.num_envs = args_cli.num_envs if args_cli.num_envs is not None else env_cfg.scene.num_envs + agent_cfg = handle_deprecated_rsl_rl_cfg(agent_cfg, installed_version) + + env_cfg.seed = agent_cfg.seed + env_cfg.sim.device = args_cli.device if args_cli.device is not None else env_cfg.sim.device + + # -- resolve checkpoint ---------------------------------------------- + log_root_path = os.path.join("logs", "rsl_rl", agent_cfg.experiment_name) + log_root_path = os.path.abspath(log_root_path) + print(f"[INFO] Loading experiment from directory: {log_root_path}") + + if args_cli.checkpoint: + resume_path = retrieve_file_path(args_cli.checkpoint) + else: + resume_path = get_checkpoint_path(log_root_path, agent_cfg.load_run, agent_cfg.load_checkpoint) + + log_dir = os.path.dirname(resume_path) + env_cfg.log_dir = log_dir + + # -- create environment ---------------------------------------------- + env = gym.make(args_cli.task, cfg=env_cfg) + if isinstance(env.unwrapped.cfg, DirectMARLEnvCfg): + from isaaclab.envs import multi_agent_to_single_agent + + env = multi_agent_to_single_agent(env) + env = RslRlVecEnvWrapper(env, clip_actions=agent_cfg.clip_actions) + + # -- load policy ----------------------------------------------------- + print(f"[INFO]: Loading model checkpoint from: {resume_path}") + if agent_cfg.class_name == "OnPolicyRunner": + runner = OnPolicyRunner(env, agent_cfg.to_dict(), log_dir=None, device=agent_cfg.device) + elif agent_cfg.class_name == "DistillationRunner": + runner = DistillationRunner(env, agent_cfg.to_dict(), log_dir=None, device=agent_cfg.device) + else: + raise ValueError(f"Unsupported runner class: {agent_cfg.class_name}") + _load_checkpoint(runner, resume_path) + + policy = runner.get_inference_policy(device=env.unwrapped.device) + + if version.parse(installed_version) >= version.parse("5.0.0"): + policy_nn = runner.alg.actor + elif version.parse(installed_version) >= version.parse("2.3.0"): + policy_nn = runner.alg.policy + else: + policy_nn = runner.alg.actor_critic + + dt = env.unwrapped.step_dt + + # -- initial observation --------------------------------------------- + obs = env.get_observations() + obs = _apply_obs_overrides(obs) + _print_obs(obs, label="Initial observation (after overrides)") + + # -- inference loop -------------------------------------------------- + try: + while True: + start_time = time.time() + with torch.inference_mode(): + actions = policy(obs) + obs, _, dones, _ = env.step(actions) + obs = _apply_obs_overrides(obs) + + if version.parse(installed_version) >= version.parse("4.0.0"): + policy.reset(dones) + else: + policy_nn.reset(dones) + + sleep_time = dt - (time.time() - start_time) + if args_cli.real_time and sleep_time > 0: + time.sleep(sleep_time) + + env.close() + except KeyboardInterrupt: + pass + + +if __name__ == "__main__": + main() diff --git a/source/isaaclab_assets/isaaclab_assets/__init__.pyi b/source/isaaclab_assets/isaaclab_assets/__init__.pyi index cf607283682a..dc69e0b8c71c 100644 --- a/source/isaaclab_assets/isaaclab_assets/__init__.pyi +++ b/source/isaaclab_assets/isaaclab_assets/__init__.pyi @@ -55,6 +55,7 @@ __all__ = [ "UR10e_ROBOTIQ_GRIPPER_CFG", "UR10e_ROBOTIQ_2F_85_CFG", "FLEXIV_RIZON4S_CFG", + "FLEXIV_RIZON4S_GRAV_GRIPPER_CFG", "GELSIGHT_R15_CFG", "GELSIGHT_MINI_CFG", "VELODYNE_VLP_16_RAYCASTER_CFG", @@ -117,5 +118,6 @@ from .robots import ( UR10e_ROBOTIQ_GRIPPER_CFG, UR10e_ROBOTIQ_2F_85_CFG, FLEXIV_RIZON4S_CFG, + FLEXIV_RIZON4S_GRAV_GRIPPER_CFG, ) from .sensors import GELSIGHT_R15_CFG, GELSIGHT_MINI_CFG, VELODYNE_VLP_16_RAYCASTER_CFG diff --git a/source/isaaclab_assets/isaaclab_assets/robots/__init__.pyi b/source/isaaclab_assets/isaaclab_assets/robots/__init__.pyi index 9875e9244fa3..1a91afa213a7 100644 --- a/source/isaaclab_assets/isaaclab_assets/robots/__init__.pyi +++ b/source/isaaclab_assets/isaaclab_assets/robots/__init__.pyi @@ -55,6 +55,7 @@ __all__ = [ "UR10e_ROBOTIQ_GRIPPER_CFG", "UR10e_ROBOTIQ_2F_85_CFG", "FLEXIV_RIZON4S_CFG", + "FLEXIV_RIZON4S_GRAV_GRIPPER_CFG", ] from .agibot import AGIBOT_A2D_CFG @@ -105,4 +106,4 @@ from .universal_robots import ( UR10e_ROBOTIQ_GRIPPER_CFG, UR10e_ROBOTIQ_2F_85_CFG, ) -from .flexiv import FLEXIV_RIZON4S_CFG +from .flexiv import FLEXIV_RIZON4S_CFG, FLEXIV_RIZON4S_GRAV_GRIPPER_CFG diff --git a/source/isaaclab_assets/isaaclab_assets/robots/flexiv.py b/source/isaaclab_assets/isaaclab_assets/robots/flexiv.py index 2e4b14347da8..18adf4312022 100644 --- a/source/isaaclab_assets/isaaclab_assets/robots/flexiv.py +++ b/source/isaaclab_assets/isaaclab_assets/robots/flexiv.py @@ -9,6 +9,7 @@ The following configurations are available: * :obj:`FLEXIV_RIZON4S_CFG`: The Flexiv Rizon 4s arm without a gripper. +* :obj:`FLEXIV_RIZON4S_GRAV_GRIPPER_CFG`: The Flexiv Rizon 4s arm with Grav gripper. Reference: https://www.flexiv.com/product/rizon """ @@ -79,5 +80,92 @@ ), }, ) - """Configuration of Flexiv Rizon 4s arm using implicit actuator models.""" + + +FLEXIV_RIZON4S_GRAV_GRIPPER_CFG = ArticulationCfg( + spawn=sim_utils.UsdFileCfg( + usd_path=f"{ISAACLAB_NUCLEUS_DIR}/Robots/Flexiv/Rizon4s/rizon4s_with_grav.usd", + rigid_props=sim_utils.RigidBodyPropertiesCfg( + disable_gravity=False, + max_depenetration_velocity=5.0, + ), + articulation_props=sim_utils.ArticulationRootPropertiesCfg( + enabled_self_collisions=False, + solver_position_iteration_count=16, + solver_velocity_iteration_count=1, + ), + activate_contact_sensors=False, + ), + init_state=ArticulationCfg.InitialStateCfg( + joint_pos={ + "joint1": 0.0, + "joint2": -0.698, + "joint3": 0.0, + "joint4": 1.571, + "joint5": 0.0, + "joint6": 0.698, + "joint7": 0.0, + "finger_joint": 0.0, + "left_outer_finger_joint": 0.0, + "right_outer_finger_joint": 0.0, + }, + pos=(0.0, 0.0, 0.0), + rot=(0.0, 0.0, 0.0, 1.0), + ), + actuators={ + "shoulder": ImplicitActuatorCfg( + joint_names_expr=["joint[1-2]"], + effort_limit_sim=123.0, + velocity_limit_sim=2.094, + stiffness=1320.0, + damping=72.0, + friction=0.0, + armature=0.0, + ), + "elbow": ImplicitActuatorCfg( + joint_names_expr=["joint[3-4]"], + effort_limit_sim=64.0, + velocity_limit_sim=2.443, + stiffness=600.0, + damping=35.0, + friction=0.0, + armature=0.0, + ), + "wrist": ImplicitActuatorCfg( + joint_names_expr=["joint[5-7]"], + effort_limit_sim=39.0, + velocity_limit_sim=4.887, + stiffness=216.0, + damping=29.0, + friction=0.0, + armature=0.0, + ), + "gripper_drive": ImplicitActuatorCfg( + joint_names_expr=["finger_joint"], + effort_limit_sim=200.0, + velocity_limit_sim=0.6, + stiffness=2e3, + damping=1e1, + friction=0.0, + armature=0.0, + ), + "gripper_passive": ImplicitActuatorCfg( + joint_names_expr=[".*_knuckle_joint"], + effort_limit_sim=1.0, + velocity_limit_sim=1.0, + stiffness=0.0, + damping=0.0, + friction=0.0, + armature=0.0, + ), + }, +) +"""Configuration of Flexiv Rizon 4s arm with Grav gripper using implicit actuator models. + +The Grav gripper is a parallel gripper with the following joint configuration: +- finger_joint: Main actuation joint (opened: 45 deg, closed: -8.88 deg) +- *_knuckle_joint: Passive/mimic joints (not directly actuated) + +End effector body: right_finger_tip +""" diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/config/rizon_4s/__init__.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/config/rizon_4s/__init__.py new file mode 100644 index 000000000000..3787151df261 --- /dev/null +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/config/rizon_4s/__init__.py @@ -0,0 +1,55 @@ +# Copyright (c) 2025-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. +# +# SPDX-License-Identifier: BSD-3-Clause + +import gymnasium as gym + +from . import agents + +## +# Register Gym environments. +## + + +# Flexiv Rizon 4s +gym.register( + id="Isaac-Deploy-GearAssembly-Rizon4s-Grav-v0", + entry_point="isaaclab.envs:ManagerBasedRLEnv", + disable_env_checker=True, + kwargs={ + "env_cfg_entry_point": f"{__name__}.joint_pos_env_cfg:Rizon4sGearAssemblyEnvCfg", + "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_ppo_cfg:Rizon4sGearAssemblyRNNPPORunnerCfg", + }, +) + +gym.register( + id="Isaac-Deploy-GearAssembly-Rizon4s-Grav-Play-v0", + entry_point="isaaclab.envs:ManagerBasedRLEnv", + disable_env_checker=True, + kwargs={ + "env_cfg_entry_point": f"{__name__}.joint_pos_env_cfg:Rizon4sGearAssemblyEnvCfg_PLAY", + }, +) + +# Flexiv Rizon 4s - ROS Inference +gym.register( + id="Isaac-Deploy-GearAssembly-Rizon4s-Grav-ROS-Inference-v0", + entry_point="isaaclab.envs:ManagerBasedRLEnv", + disable_env_checker=True, + kwargs={ + "env_cfg_entry_point": f"{__name__}.ros_inference_env_cfg:Rizon4sGearAssemblyROSInferenceEnvCfg", + "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_ppo_cfg:Rizon4sGearAssemblyRNNPPORunnerCfg", + }, +) + +# Flexiv Rizon 4s - Hardcoded Inference (no randomization, deterministic) +gym.register( + id="Isaac-Deploy-GearAssembly-Rizon4s-Grav-Hardcoded-Inference-v0", + entry_point="isaaclab.envs:ManagerBasedRLEnv", + disable_env_checker=True, + kwargs={ + "env_cfg_entry_point": f"{__name__}.hardcoded_inference_env_cfg:Rizon4sGearAssemblyHardcodedInferenceEnvCfg", + "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_ppo_cfg:Rizon4sGearAssemblyRNNPPORunnerCfg", + }, +) diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/config/rizon_4s/agents/__init__.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/config/rizon_4s/agents/__init__.py new file mode 100644 index 000000000000..cf59b16a1e2e --- /dev/null +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/config/rizon_4s/agents/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) 2025-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. +# +# SPDX-License-Identifier: BSD-3-Clause diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/config/rizon_4s/agents/rsl_rl_ppo_cfg.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/config/rizon_4s/agents/rsl_rl_ppo_cfg.py new file mode 100644 index 000000000000..06f64e731afc --- /dev/null +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/config/rizon_4s/agents/rsl_rl_ppo_cfg.py @@ -0,0 +1,49 @@ +# Copyright (c) 2025-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. +# +# SPDX-License-Identifier: BSD-3-Clause + +from isaaclab.utils import configclass + +from isaaclab_rl.rsl_rl import RslRlOnPolicyRunnerCfg, RslRlPpoActorCriticRecurrentCfg, RslRlPpoAlgorithmCfg + + +@configclass +class Rizon4sGearAssemblyRNNPPORunnerCfg(RslRlOnPolicyRunnerCfg): + num_steps_per_env = 512 + max_iterations = 1500 + save_interval = 50 + experiment_name = "gear_assembly_rizon4s" + clip_actions = 1.0 + resume = False + obs_groups = { + "policy": ["policy"], + "critic": ["critic"], + } + policy = RslRlPpoActorCriticRecurrentCfg( + state_dependent_std=True, + init_noise_std=1.0, + actor_obs_normalization=True, + critic_obs_normalization=True, + actor_hidden_dims=[256, 128, 64], + critic_hidden_dims=[256, 128, 64], + noise_std_type="log", + activation="elu", + rnn_type="lstm", + rnn_hidden_dim=256, + rnn_num_layers=2, + ) + algorithm = RslRlPpoAlgorithmCfg( + value_loss_coef=1.0, + use_clipped_value_loss=True, + clip_param=0.2, + entropy_coef=0.0, + num_learning_epochs=8, + num_mini_batches=16, + learning_rate=5.0e-4, + schedule="adaptive", + gamma=0.99, + lam=0.95, + desired_kl=0.008, + max_grad_norm=1.0, + ) diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/config/rizon_4s/hardcoded_inference_env_cfg.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/config/rizon_4s/hardcoded_inference_env_cfg.py new file mode 100644 index 000000000000..759d9f34d2cc --- /dev/null +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/config/rizon_4s/hardcoded_inference_env_cfg.py @@ -0,0 +1,95 @@ +# Copyright (c) 2025-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. +# +# SPDX-License-Identifier: BSD-3-Clause + +"""Hardcoded deterministic inference configuration for Flexiv Rizon 4s with Grav gripper. + +Edit the constants below to match your real-world setup. All randomization is +disabled and observation noise is turned off so the simulation is fully +deterministic across resets. + +The IK grasp-pose solver (``set_robot_to_grasp_pose``) is kept so the robot +properly grasps the gear. With all randomization zeroed out the IK solution +is identical on every reset. + +To additionally override the robot's joint positions *after* IK, set +``OVERRIDE_ROBOT_JOINT_POS`` in ``play_hardcoded.py``. +""" + +from isaaclab.assets import RigidObjectCfg +from isaaclab.utils import configclass + +from .ros_inference_env_cfg import Rizon4sGearAssemblyROSInferenceEnvCfg + +# ╔══════════════════════════════════════════════════════════════════════════════╗ +# ║ HARDCODED VALUES — Edit these to match your desired setup ║ +# ╚══════════════════════════════════════════════════════════════════════════════╝ + +# Which gear the robot is grasping: "gear_small", "gear_medium", or "gear_large" +GEAR_TYPE = "gear_large" + +# Gear base position (x, y, z) in meters, relative to env origin +GEAR_BASE_POS = (0.481, -0.073, -0.005) + +# Gear base orientation as quaternion (x, y, z, w) +GEAR_BASE_ROT = (0.0, 0.0, 0.70711, -0.70711) + +# Height of the grasped gear above the gear base (world-Z offset in meters). +# Training midpoint is 0.0675 from range [0.0575, 0.0775]. +GEAR_Z_OFFSET = 0.0675 + +# ╔══════════════════════════════════════════════════════════════════════════════╗ +# ║ Environment configuration ║ +# ╚══════════════════════════════════════════════════════════════════════════════╝ + + +@configclass +class Rizon4sGearAssemblyHardcodedInferenceEnvCfg(Rizon4sGearAssemblyROSInferenceEnvCfg): + """Deterministic inference config — hardcoded poses, no randomization, no obs noise. + + Inherits the full ROS-inference configuration and then: + 1. Fixes the gear type (no random selection). + 2. Zeros out all pose/orientation randomization ranges. + 3. Sets gear-base and individual-gear initial states from the constants above. + 4. Disables observation noise (``enable_corruption = False``). + + The IK grasp-pose solver is kept from the parent so the robot properly + reaches and grasps the gear each reset. Since all inputs to IK are + deterministic the resulting joint positions are the same every time. + """ + + def __post_init__(self): + super().__post_init__() + + # ── 1. Fix gear type ────────────────────────────────────────── + self.events.randomize_gear_type.params["gear_types"] = [GEAR_TYPE] + + # ── 2. Zero out all pose randomization ──────────────────────── + self.events.randomize_gears_and_base_pose.params["pose_range"] = { + "x": [0.0, 0.0], + "y": [0.0, 0.0], + "z": [0.0, 0.0], + "roll": [0.0, 0.0], + "pitch": [0.0, 0.0], + "yaw": [0.0, 0.0], + } + self.events.randomize_gears_and_base_pose.params["gear_pos_range"] = { + "x": [0.0, 0.0], + "y": [0.0, 0.0], + "z": [GEAR_Z_OFFSET, GEAR_Z_OFFSET], + } + + # ── 3. Hardcoded gear base and gear positions ───────────────── + self.scene.factory_gear_base.init_state = RigidObjectCfg.InitialStateCfg( + pos=GEAR_BASE_POS, + rot=GEAR_BASE_ROT, + ) + for attr in ("factory_gear_small", "factory_gear_medium", "factory_gear_large"): + getattr(self.scene, attr).init_state = RigidObjectCfg.InitialStateCfg( + pos=GEAR_BASE_POS, + rot=GEAR_BASE_ROT, + ) + + # ── 4. Disable observation noise ────────────────────────────── + self.observations.policy.enable_corruption = False diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/config/rizon_4s/joint_pos_env_cfg.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/config/rizon_4s/joint_pos_env_cfg.py new file mode 100644 index 000000000000..c93a78ee6da9 --- /dev/null +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/config/rizon_4s/joint_pos_env_cfg.py @@ -0,0 +1,440 @@ +# Copyright (c) 2025-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. +# +# SPDX-License-Identifier: BSD-3-Clause + +import math + +import torch + +import isaaclab.sim as sim_utils +from isaaclab.actuators import ImplicitActuatorCfg +from isaaclab.assets import ArticulationCfg, RigidObjectCfg +from isaaclab.managers import EventTermCfg as EventTerm +from isaaclab.managers import SceneEntityCfg +from isaaclab.utils import configclass + +import isaaclab_tasks.manager_based.manipulation.deploy.mdp as mdp +import isaaclab_tasks.manager_based.manipulation.deploy.mdp.events as gear_assembly_events +from isaaclab_tasks.manager_based.manipulation.deploy.gear_assembly.gear_assembly_env_cfg import GearAssemblyEnvCfg + +## +# Pre-defined configs +## +from isaaclab_assets import FLEXIV_RIZON4S_GRAV_GRIPPER_CFG # isort: skip + + +## +# Gripper-specific helper functions +## + + +def set_finger_joint_pos_grav( + joint_pos: torch.Tensor, + reset_ind_joint_pos: list[int], + finger_joints: list[int], + finger_joint_position: float, +): + """Set finger joint positions for Grav gripper. + + Args: + joint_pos: Joint positions tensor + reset_ind_joint_pos: Row indices into the sliced joint_pos tensor + finger_joints: List of all gripper joint indices (6 joints total) + finger_joint_position: Target position for main finger joint (in radians) + + Note: + Grav gripper joint structure (indices from finger_joints list): + [0] finger_joint - main controllable joint + [1] left_inner_knuckle_joint - mimic with -1 gearing + [2] right_inner_knuckle_joint - mimic with -1 gearing + [3] right_outer_knuckle_joint - mimic with -1 gearing + [4] left_outer_finger_joint - mimic with +1 gearing + [5] right_outer_finger_joint - mimic with +1 gearing + """ + for idx in reset_ind_joint_pos: + if len(finger_joints) < 6: + raise ValueError(f"Grav gripper requires at least 6 finger joints, got {len(finger_joints)}") + + # Main controllable joint + joint_pos[idx, finger_joints[0]] = finger_joint_position + + # Mimic joints with -1 gearing + joint_pos[idx, finger_joints[1]] = finger_joint_position # left_inner_knuckle_joint + joint_pos[idx, finger_joints[2]] = finger_joint_position # right_inner_knuckle_joint + joint_pos[idx, finger_joints[3]] = finger_joint_position # right_outer_knuckle_joint + + # Mimic joints with +1 gearing + joint_pos[idx, finger_joints[4]] = -finger_joint_position # left_outer_finger_joint + joint_pos[idx, finger_joints[5]] = -finger_joint_position # right_outer_finger_joint + + +## +# Environment configuration +## + + +@configclass +class EventCfg: + """Configuration for events.""" + + # NOTE: Domain randomization for actuator gains and joint friction disabled for stability. + # Re-enable once the base simulation is stable and working. + # robot_joint_stiffness_and_damping = EventTerm( + # func=mdp.randomize_actuator_gains, + # mode="reset", + # params={ + # "asset_cfg": SceneEntityCfg( + # "robot", joint_names=["joint[1-2]", "joint[3-4]", "joint[5-7]"] + # ), + # "stiffness_distribution_params": (0.75, 1.5), + # "damping_distribution_params": (0.3, 3.0), + # "operation": "scale", + # "distribution": "log_uniform", + # }, + # ) + + # joint_friction = EventTerm( + # func=mdp.randomize_joint_parameters, + # mode="reset", + # params={ + # "asset_cfg": SceneEntityCfg("robot", joint_names=["joint[1-2]", "joint[3-4]", "joint[5-7]"]), + # "friction_distribution_params": (0.3, 0.7), + # "operation": "add", + # "distribution": "uniform", + # }, + # ) + + small_gear_physics_material = EventTerm( + func=mdp.randomize_rigid_body_material, + mode="startup", + params={ + "asset_cfg": SceneEntityCfg("factory_gear_small", body_names=".*"), + "static_friction_range": (0.75, 0.75), + "dynamic_friction_range": (0.75, 0.75), + "restitution_range": (0.0, 0.0), + "num_buckets": 16, + }, + ) + + medium_gear_physics_material = EventTerm( + func=mdp.randomize_rigid_body_material, + mode="startup", + params={ + "asset_cfg": SceneEntityCfg("factory_gear_medium", body_names=".*"), + "static_friction_range": (0.75, 0.75), + "dynamic_friction_range": (0.75, 0.75), + "restitution_range": (0.0, 0.0), + "num_buckets": 16, + }, + ) + + large_gear_physics_material = EventTerm( + func=mdp.randomize_rigid_body_material, + mode="startup", + params={ + "asset_cfg": SceneEntityCfg("factory_gear_large", body_names=".*"), + "static_friction_range": (0.75, 0.75), + "dynamic_friction_range": (0.75, 0.75), + "restitution_range": (0.0, 0.0), + "num_buckets": 16, + }, + ) + + gear_base_physics_material = EventTerm( + func=mdp.randomize_rigid_body_material, + mode="startup", + params={ + "asset_cfg": SceneEntityCfg("factory_gear_base", body_names=".*"), + "static_friction_range": (0.0, 0.0), + "dynamic_friction_range": (0.0, 0.0), + "restitution_range": (0.0, 0.0), + "num_buckets": 16, + }, + ) + + robot_physics_material = EventTerm( + func=mdp.randomize_rigid_body_material, + mode="startup", + params={ + "asset_cfg": SceneEntityCfg("robot", body_names=".*finger.*"), + "static_friction_range": (3.0, 3.0), + "dynamic_friction_range": (3.0, 3.0), + "restitution_range": (0.0, 0.0), + "num_buckets": 16, + }, + ) + + randomize_gear_type = EventTerm( + func=gear_assembly_events.randomize_gear_type, + mode="reset", + params={"gear_types": ["gear_small", "gear_medium", "gear_large"]}, + # params={"gear_types": ["gear_small", "gear_medium"]}, + ) + + reset_all = EventTerm(func=mdp.reset_scene_to_default, mode="reset") + + randomize_gears_and_base_pose = EventTerm( + func=gear_assembly_events.randomize_gears_and_base_pose, + mode="reset", + params={ + "pose_range": { + "x": [-0.1, 0.1], + "y": [-0.25, 0.25], + "z": [-0.1, 0.1], + "roll": [-math.pi / 90, math.pi / 90], # 2 degree + "pitch": [-math.pi / 90, math.pi / 90], # 2 degree + "yaw": [-math.pi / 6, math.pi / 6], # 30 degree + }, + "gear_pos_range": { + "x": [-0.02, 0.02], + "y": [-0.02, 0.02], + "z": [0.0575, 0.0775], # 0.045 + 0.0225 + # "x": [-0.0, 0.0], + # "y": [-0.0, 0.0], + # "z": [0.0675, 0.0675], # 0.045 + 0.0225 + }, + "velocity_range": {}, + }, + ) + + set_robot_to_grasp_pose = EventTerm( + func=gear_assembly_events.set_robot_to_grasp_pose, + mode="reset", + params={ + "robot_asset_cfg": SceneEntityCfg("robot"), + # "pos_randomization_range": {"x": [-0.0, 0.0], "y": [-0.005, 0.005], "z": [-0.003, 0.003]}, + "pos_randomization_range": {"x": [-0.0, 0.0], "y": [-0.00, 0.00], "z": [-0.00, 0.00]}, + }, + ) + + +@configclass +class Rizon4sGearAssemblyEnvCfg(GearAssemblyEnvCfg): + """Configuration for Flexiv Rizon 4s with Grav Gripper Gear Assembly Environment. + + The Flexiv Rizon 4s is a 7-DOF collaborative robot arm equipped with the + Flexiv Grav parallel gripper for gear manipulation tasks. + """ + + def __post_init__(self): + # post init of parent + super().__post_init__() + + # Robot-specific parameters for Flexiv Rizon 4s with Grav gripper + self.end_effector_body_name = "link7" # End effector body name for IK + self.num_arm_joints = 7 # Number of arm joints (Rizon 4s has 7 DOF) + # Rotation offset for grasp pose (quaternion [x, y, z, w]) + # Computed from IK convergence for downward-facing end effector + self.grasp_rot_offset = [ + -0.707, + 0.707, + 0.0, + 0.0, + ] + self.gripper_joint_setter_func = set_finger_joint_pos_grav # Grav gripper joint setter function + + # Gear orientation termination thresholds (in degrees) + self.gear_orientation_roll_threshold_deg = 15.0 # Maximum allowed roll deviation + self.gear_orientation_pitch_threshold_deg = 15.0 # Maximum allowed pitch deviation + self.gear_orientation_yaw_threshold_deg = 180.0 # Maximum allowed yaw deviation + + # Common observation configuration for Rizon 4s joints (arm only, not gripper) + self.observations.policy.joint_pos.params["asset_cfg"].joint_names = [ + "joint1", + "joint2", + "joint3", + "joint4", + "joint5", + "joint6", + "joint7", + ] + self.observations.policy.joint_vel.params["asset_cfg"].joint_names = [ + "joint1", + "joint2", + "joint3", + "joint4", + "joint5", + "joint6", + "joint7", + ] + + # override events + self.events = EventCfg() + + # Update termination thresholds from config + self.terminations.gear_orientation_exceeded.params["roll_threshold_deg"] = ( + self.gear_orientation_roll_threshold_deg + ) + self.terminations.gear_orientation_exceeded.params["pitch_threshold_deg"] = ( + self.gear_orientation_pitch_threshold_deg + ) + self.terminations.gear_orientation_exceeded.params["yaw_threshold_deg"] = ( + self.gear_orientation_yaw_threshold_deg + ) + + # Action configuration for Rizon 4s arm + # Using smaller action scale for stability + self.joint_action_scale = 0.025 + self.actions.arm_action = mdp.RelativeJointPositionActionCfg( + asset_name="robot", + joint_names=[ + "joint1", + "joint2", + "joint3", + "joint4", + "joint5", + "joint6", + "joint7", + ], + scale=self.joint_action_scale, + use_zero_offset=True, + ) + + # Switch robot to Flexiv Rizon 4s with Grav gripper + self.scene.robot = FLEXIV_RIZON4S_GRAV_GRIPPER_CFG.replace( + prim_path="{ENV_REGEX_NS}/Robot", + spawn=FLEXIV_RIZON4S_GRAV_GRIPPER_CFG.spawn.replace( + rigid_props=sim_utils.RigidBodyPropertiesCfg( + disable_gravity=True, + max_depenetration_velocity=5.0, + linear_damping=0.0, + angular_damping=0.0, + max_linear_velocity=1000.0, + max_angular_velocity=3666.0, + enable_gyroscopic_forces=True, + solver_position_iteration_count=4, + solver_velocity_iteration_count=1, + max_contact_impulse=1e32, + ), + articulation_props=sim_utils.ArticulationRootPropertiesCfg( + enabled_self_collisions=False, solver_position_iteration_count=4, solver_velocity_iteration_count=1 + ), + collision_props=sim_utils.CollisionPropertiesCfg(contact_offset=0.005, rest_offset=0.0), + ), + # Joint positions based on IK from center of distribution for randomized gear positions + init_state=ArticulationCfg.InitialStateCfg( + joint_pos={ + "joint1": 0.0, + "joint2": -0.698, + "joint3": 0.0, + "joint4": 1.571, + "joint5": 0.0, + "joint6": 0.698, + "joint7": 0.0, + }, + pos=(0.0, 0.0, 0.0), + rot=(0.0, 0.0, 0.0, 1.0), + ), + ) + + # Grav gripper actuator configuration for gear manipulation + self.scene.robot.actuators["gripper_drive"] = ImplicitActuatorCfg( + joint_names_expr=["finger_joint"], + effort_limit_sim=2.0, + velocity_limit_sim=1.0, + stiffness=2e3, + damping=1e1, + friction=0.0, + armature=0.0, + ) + + # Passive/mimic joints in the gripper - set to zero stiffness/damping + self.scene.robot.actuators["gripper_passive"] = ImplicitActuatorCfg( + joint_names_expr=[".*_knuckle_joint"], + effort_limit_sim=1.0, + velocity_limit_sim=1.0, + stiffness=0.0, + damping=0.0, + friction=0.0, + armature=0.0, + ) + + # Override gear initial states for Rizon (closer to robot, centered) + self.scene.factory_gear_base.init_state = RigidObjectCfg.InitialStateCfg( + # pos=(-0.6, 0.0, -0.1), + # pos=(0.63, -0.083, -0.1), + pos=(0.481, -0.073, 0.071), + rot=(0.0, 0.0, 0.70711, -0.70711), + ) + self.scene.factory_gear_small.init_state = RigidObjectCfg.InitialStateCfg( + # pos=(-0.6, 0.0, -0.1), + # pos=(0.63, -0.083, -0.1), + pos=(0.481, -0.073, 0.071), + rot=(0.0, 0.0, 0.70711, -0.70711), + ) + self.scene.factory_gear_medium.init_state = RigidObjectCfg.InitialStateCfg( + # pos=(-0.6, 0.0, -0.1), + # pos=(0.63, -0.083, -0.1), + pos=(0.481, -0.073, 0.071), + rot=(0.0, 0.0, 0.70711, -0.70711), + ) + self.scene.factory_gear_large.init_state = RigidObjectCfg.InitialStateCfg( + # pos=(-0.6, 0.0, -0.1), + # pos=(0.63, -0.083, -0.1), + pos=(0.481, -0.073, 0.071), + rot=(0.0, 0.0, 0.70711, -0.70711), + ) + + # Gear offsets and grasp positions for Rizon 4s with Grav gripper + # These offsets are relative to the end effector frame (link7) + # Z offset accounts for the gripper length from link7 to finger tip + self.gear_offsets_grasp = { + "gear_small": [0.0, -self.gear_offsets["gear_small"][0], -0.35], + "gear_medium": [0.0, -self.gear_offsets["gear_medium"][0], -0.35], + "gear_large": [0.0, -self.gear_offsets["gear_large"][0], -0.35], + } + + # Grasp widths for Grav gripper (raw radian values for finger_joint) + self.hand_grasp_width = { + "gear_small": 0.05, + "gear_medium": 0.2, + "gear_large": 0.28, + } + + # Close widths for Grav gripper (raw radian values for finger_joint) + self.hand_close_width = { + "gear_small": 0.0, + "gear_medium": 0.139626, + "gear_large": 0.139626, + } + + # Populate event term parameters + self.events.set_robot_to_grasp_pose.params["gear_offsets_grasp"] = self.gear_offsets_grasp + self.events.set_robot_to_grasp_pose.params["end_effector_body_name"] = self.end_effector_body_name + self.events.set_robot_to_grasp_pose.params["num_arm_joints"] = self.num_arm_joints + self.events.set_robot_to_grasp_pose.params["grasp_rot_offset"] = self.grasp_rot_offset + self.events.set_robot_to_grasp_pose.params["gripper_joint_setter_func"] = self.gripper_joint_setter_func + + # Populate reward term parameters for EE-gear keypoint tracking + self.rewards.end_effector_base_keypoint_tracking.params["end_effector_body_name"] = self.end_effector_body_name + self.rewards.end_effector_base_keypoint_tracking.params["grasp_rot_offset"] = self.grasp_rot_offset + self.rewards.end_effector_base_keypoint_tracking.params["gear_offsets_grasp"] = self.gear_offsets_grasp + + self.rewards.end_effector_base_keypoint_tracking_exp.params["end_effector_body_name"] = ( + self.end_effector_body_name + ) + self.rewards.end_effector_base_keypoint_tracking_exp.params["grasp_rot_offset"] = self.grasp_rot_offset + self.rewards.end_effector_base_keypoint_tracking_exp.params["gear_offsets_grasp"] = self.gear_offsets_grasp + + # Populate termination term parameters + self.terminations.gear_dropped.params["gear_offsets_grasp"] = self.gear_offsets_grasp + self.terminations.gear_dropped.params["end_effector_body_name"] = self.end_effector_body_name + self.terminations.gear_dropped.params["grasp_rot_offset"] = self.grasp_rot_offset + + self.terminations.gear_orientation_exceeded.params["end_effector_body_name"] = self.end_effector_body_name + self.terminations.gear_orientation_exceeded.params["grasp_rot_offset"] = self.grasp_rot_offset + + +@configclass +class Rizon4sGearAssemblyEnvCfg_PLAY(Rizon4sGearAssemblyEnvCfg): + """Play configuration for Flexiv Rizon 4s gear assembly.""" + + def __post_init__(self): + # post init of parent + super().__post_init__() + # make a smaller scene for play + self.scene.num_envs = 50 + self.scene.env_spacing = 2.5 + # disable randomization for play + self.observations.policy.enable_corruption = False diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/config/rizon_4s/ros_inference_env_cfg.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/config/rizon_4s/ros_inference_env_cfg.py new file mode 100644 index 000000000000..ecaab0abd4ff --- /dev/null +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/config/rizon_4s/ros_inference_env_cfg.py @@ -0,0 +1,116 @@ +# Copyright (c) 2025-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. +# +# SPDX-License-Identifier: BSD-3-Clause + +import math + +from isaaclab.assets import RigidObjectCfg +from isaaclab.utils import configclass + +from .joint_pos_env_cfg import Rizon4sGearAssemblyEnvCfg + + +@configclass +class Rizon4sGearAssemblyROSInferenceEnvCfg(Rizon4sGearAssemblyEnvCfg): + """ROS / Isaac Manipulator inference fields plus deployment alignment for NVIDIA Hubble Lab. + + This configuration: + - Exposes variables needed for ROS inference + - Overrides robot and gear initial poses for fixed/deterministic setup + - Aligns robot mounting pose with the Flexiv Rizon 4s installation at NVIDIA Hubble Lab + """ + + # Single source for base + all gear rigid bodies (Rizon: closer to robot, centered) + ros_inference_factory_gears_init_state: RigidObjectCfg.InitialStateCfg = RigidObjectCfg.InitialStateCfg( + pos=(0.938, 0.0826, -0.15), + rot=(0.0, 0.0, 0.70711, -0.70711), + ) + + def __post_init__(self): + # post init of parent + super().__post_init__() + + # Variables used by Isaac Manipulator for on robot inference + # These parameters allow the ROS inference node to validate environment configuration, + # perform checks during inference, and correctly interpret observations and actions. + self.obs_order = ["arm_dof_pos", "arm_dof_vel", "shaft_pos", "shaft_quat"] + self.policy_action_space = "joint" + # Use inherited joint names from parent's observation configuration + self.arm_joint_names = self.observations.policy.joint_pos.params["asset_cfg"].joint_names + # Use inherited num_arm_joints from parent + self.action_space = self.num_arm_joints + # State space and observation space for Rizon 4s with Grav gripper (7 DOF arm + 1 gripper) + # State: 7 joint pos + 7 joint vel + 3 shaft pos + 4 shaft quat + 3 gear pos + 4 gear quat = 28 + # For critic: additional gear observations + self.state_space = 28 + # Observation: 7 joint pos + 7 joint vel + 3 shaft pos + 4 shaft quat = 21 + self.observation_space = 21 + + # Set joint_action_scale from the existing arm_action.scale + self.joint_action_scale = self.actions.arm_action.scale + + # Dynamically generate action_scale_joint_space based on action_space + self.action_scale_joint_space = [self.joint_action_scale] * self.action_space + + # --- NVIDIA Hubble Lab: Flexiv Rizon 4s mount --- + # Remove vertical mount stand since Hubble deployment does not use the sim stand asset + self.scene.stand = None + + # Lab home joint pose (radians); aligns sim defaults / reset with the physical stand + self.scene.robot.init_state.joint_pos = { + "joint1": math.radians(-90.0), + "joint2": math.radians(90.0), + "joint3": 0.0, + "joint4": math.radians(90.0), + "joint5": 0.0, + "joint6": 0.0, + "joint7": 0.0, + } + + # Orientation of robot is based on the Flexiv Rizon 4s mount in the Hubble Lab + self.scene.robot.init_state.pos = (0.0, 0.0, 0.0) + self.scene.robot.init_state.rot = (0.5, 0.5, 0.5, 0.5) + + # Override gear base + all gears from one template (fixed pose for ROS inference) + _g = self.ros_inference_factory_gears_init_state + for _name in ( + "factory_gear_base", + "factory_gear_small", + "factory_gear_medium", + "factory_gear_large", + ): + getattr(self.scene, _name).init_state = RigidObjectCfg.InitialStateCfg( + pos=_g.pos, + rot=_g.rot, + lin_vel=_g.lin_vel, + ang_vel=_g.ang_vel, + ) + + # Fixed asset parameters for ROS inference - derived from configuration + # These parameters are used by the ROS inference node to validate the environment setup + # and apply appropriate noise models for robust real-world deployment. + # Derive position center from gear base init state + self.fixed_asset_init_pos_center = list(self.scene.factory_gear_base.init_state.pos) + # Derive position range from parent's randomize_gears_and_base_pose event pose_range + pose_range = self.events.randomize_gears_and_base_pose.params["pose_range"] + self.fixed_asset_init_pos_range = [ + pose_range["x"][1], # max value + pose_range["y"][1], # max value + pose_range["z"][1], # max value + ] + # Orientation in degrees (quaternion (0.0, 0.0, 0.70711, -0.70711) = -90° around Z) + self.fixed_asset_init_orn_deg = [0.0, 0.0, -90.0] + # Derive orientation range from parent's pose_range (radians to degrees) + self.fixed_asset_init_orn_deg_range = [ + math.degrees(pose_range["roll"][1]), # convert radians to degrees + math.degrees(pose_range["pitch"][1]), + math.degrees(pose_range["yaw"][1]), + ] + # Derive observation noise level from parent's gear_shaft_pos noise configuration + gear_shaft_pos_noise = self.observations.policy.gear_shaft_pos.noise.noise_cfg.n_max + self.fixed_asset_pos_obs_noise_level = [ + gear_shaft_pos_noise, + gear_shaft_pos_noise, + gear_shaft_pos_noise, + ] diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/gear_assembly_env_cfg.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/gear_assembly_env_cfg.py index 6a3fc7f080f4..49c201563773 100644 --- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/gear_assembly_env_cfg.py +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/gear_assembly_env_cfg.py @@ -26,7 +26,10 @@ import isaaclab_tasks.manager_based.manipulation.deploy.mdp as mdp import isaaclab_tasks.manager_based.manipulation.deploy.mdp.terminations as gear_assembly_terminations -from isaaclab_tasks.manager_based.manipulation.deploy.mdp.noise_models import ResetSampledConstantNoiseModelCfg +from isaaclab_tasks.manager_based.manipulation.deploy.mdp.noise_models import ( + ResetSampledConstantNoiseModelCfg, + ResetSampledQuaternionNoiseModelCfg, +) # Get the directory where this configuration file is located CONFIG_DIR = os.path.dirname(os.path.abspath(__file__)) @@ -191,10 +194,17 @@ class PolicyCfg(ObsGroup): func=mdp.gear_shaft_pos_w, params={}, # Will be populated in __post_init__ noise=ResetSampledConstantNoiseModelCfg( - noise_cfg=UniformNoiseCfg(n_min=-0.005, n_max=0.005, operation="add") + noise_cfg=UniformNoiseCfg(n_min=-0.01, n_max=0.01, operation="add") + ), + ) + gear_shaft_quat = ObsTerm( + func=mdp.gear_shaft_quat_w, + noise=ResetSampledQuaternionNoiseModelCfg( + roll_range=(-0.03491, 0.03491), + pitch_range=(-0.03491, 0.03491), + yaw_range=(-0.03491, 0.03491), ), ) - gear_shaft_quat = ObsTerm(func=mdp.gear_shaft_quat_w) def __post_init__(self): self.enable_corruption = True @@ -263,7 +273,34 @@ class RewardsCfg: }, ) + end_effector_base_keypoint_tracking = RewTerm( + func=mdp.keypoint_ee_gear_error, + weight=-0.5, + params={ + "robot_asset_cfg": SceneEntityCfg("robot"), + "keypoint_scale": 0.15, + "ee_gear_threshold": 0.00, + "weight_ramp_start": 0.0, # Set to 0.0 to enable ramp-up + "weight_ramp_steps": 512_000, + }, + ) + + end_effector_base_keypoint_tracking_exp = RewTerm( + func=mdp.keypoint_ee_gear_error_exp, + weight=0.5, + params={ + "robot_asset_cfg": SceneEntityCfg("robot"), + "kp_exp_coeffs": [(50, 0.0001), (300, 0.0001)], + "kp_use_sum_of_exps": False, + "keypoint_scale": 0.15, + "ee_gear_threshold": 0.00, + "weight_ramp_start": 0.0, # Set to 0.0 to enable ramp-up + "weight_ramp_steps": 512_000, + }, + ) + action_rate = RewTerm(func=mdp.action_rate_l2, weight=-5.0e-06) + # action = RewTerm(func=mdp.action_l2, weight=-5.0e-06) @configclass diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/mdp/__init__.pyi b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/mdp/__init__.pyi index 9f27fc7560c6..c7ad54cf44e1 100644 --- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/mdp/__init__.pyi +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/mdp/__init__.pyi @@ -17,6 +17,8 @@ __all__ = [ "keypoint_command_error_exp", "keypoint_entity_error", "keypoint_entity_error_exp", + "keypoint_ee_gear_error", + "keypoint_ee_gear_error_exp", "reset_when_gear_dropped", "reset_when_gear_orientation_exceeds_threshold", ] @@ -29,6 +31,8 @@ from .rewards import ( keypoint_command_error_exp, keypoint_entity_error, keypoint_entity_error_exp, + keypoint_ee_gear_error, + keypoint_ee_gear_error_exp, ) from .terminations import reset_when_gear_dropped, reset_when_gear_orientation_exceeds_threshold from isaaclab.envs.mdp import * diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/mdp/events.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/mdp/events.py index 503fd3af75d8..1daf69021b0d 100644 --- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/mdp/events.py +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/mdp/events.py @@ -216,7 +216,7 @@ def __call__( robot_asset_cfg: SceneEntityCfg = SceneEntityCfg("robot"), pos_threshold: float = 1e-6, rot_threshold: float = 1e-6, - max_iterations: int = 10, + max_iterations: int = 50, pos_randomization_range: dict | None = None, gear_offsets_grasp: dict | None = None, end_effector_body_name: str | None = None, @@ -340,6 +340,22 @@ def __call__( # Update joint positions joint_pos = joint_pos + delta_dof_pos + + # Wrap arm joint positions to fall within robot's actual joint limits + joint_pos_limits = wp.to_torch(self.robot_asset.data.joint_pos_limits)[env_ids, : self.num_arm_joints, :] + joint_min = joint_pos_limits[:, :, 0] + joint_max = joint_pos_limits[:, :, 1] + joint_range = joint_max - joint_min + + # Wrap only the arm joint positions (not gripper joints) + arm_joint_pos = joint_pos[:, : self.num_arm_joints] + arm_joint_pos = torch.where( + joint_range > 0, + joint_min + torch.remainder(arm_joint_pos - joint_min, joint_range), + arm_joint_pos, + ) + joint_pos[:, : self.num_arm_joints] = arm_joint_pos + joint_vel = torch.zeros_like(joint_pos) # Write to sim @@ -348,6 +364,9 @@ def __call__( self.robot_asset.write_joint_position_to_sim_index(position=joint_pos, env_ids=env_ids) self.robot_asset.write_joint_velocity_to_sim_index(velocity=joint_vel, env_ids=env_ids) + # Reset joint velocities to zero after IK convergence + joint_vel = torch.zeros_like(wp.to_torch(self.robot_asset.data.joint_vel)[env_ids]) + # Set gripper to grasp position joint_pos = wp.to_torch(self.robot_asset.data.joint_pos)[env_ids].clone() diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/mdp/noise_models.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/mdp/noise_models.py index 2d5411e96977..1625890be668 100644 --- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/mdp/noise_models.py +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/mdp/noise_models.py @@ -7,7 +7,12 @@ from __future__ import annotations -__all__ = ["ResetSampledConstantNoiseModel", "ResetSampledConstantNoiseModelCfg"] +__all__ = [ + "ResetSampledConstantNoiseModel", + "ResetSampledConstantNoiseModelCfg", + "ResetSampledQuaternionNoiseModel", + "ResetSampledQuaternionNoiseModelCfg", +] from collections.abc import Sequence from dataclasses import MISSING @@ -16,7 +21,8 @@ import torch from isaaclab.utils import configclass -from isaaclab.utils.noise import NoiseModel, NoiseModelCfg +from isaaclab.utils.math import quat_from_euler_xyz, quat_mul +from isaaclab.utils.noise import ConstantNoiseCfg, NoiseModel, NoiseModelCfg if TYPE_CHECKING: from isaaclab.utils.noise import NoiseCfg @@ -107,3 +113,77 @@ class ResetSampledConstantNoiseModelCfg(NoiseModelCfg): Based on this configuration, the noise is sampled at every reset of the noise model. """ + + +class ResetSampledQuaternionNoiseModel(NoiseModel): + """Noise model that applies a rotation perturbation to quaternion observations. + + At each episode reset, small Euler angle perturbations (roll, pitch, yaw) are sampled + uniformly from configurable ranges and converted to a perturbation quaternion. This + perturbation is then applied via quaternion multiplication at every step, producing a + geometrically valid rotated quaternion (unlike additive noise on raw components). + + The perturbation is held constant for the entire episode until the next reset. + """ + + def __init__(self, noise_model_cfg: NoiseModelCfg, num_envs: int, device: str): + super().__init__(noise_model_cfg, num_envs, device) + self._roll_range = noise_model_cfg.roll_range + self._pitch_range = noise_model_cfg.pitch_range + self._yaw_range = noise_model_cfg.yaw_range + # Identity quaternion in (x, y, z, w) format + self._perturbation_quat = torch.zeros((num_envs, 4), device=device) + self._perturbation_quat[:, 3] = 1.0 + + def reset(self, env_ids: Sequence[int] | None = None): + """Sample new rotation perturbations for the specified environments. + + Args: + env_ids: The environment ids to reset. Defaults to None (all environments). + """ + if env_ids is None: + env_ids = slice(None) + + num_resets = env_ids.stop - env_ids.start if isinstance(env_ids, slice) else len(env_ids) + + roll = torch.empty(num_resets, device=self._device).uniform_(*self._roll_range) + pitch = torch.empty(num_resets, device=self._device).uniform_(*self._pitch_range) + yaw = torch.empty(num_resets, device=self._device).uniform_(*self._yaw_range) + + self._perturbation_quat[env_ids] = quat_from_euler_xyz(roll, pitch, yaw) + + def __call__(self, data: torch.Tensor) -> torch.Tensor: + """Apply the pre-sampled rotation perturbation to the quaternion data. + + Args: + data: Quaternion observations in (x, y, z, w) format. Shape is (num_envs, 4). + + Returns: + Perturbed quaternions in (x, y, z, w) format. Shape is (num_envs, 4). + """ + return quat_mul(self._perturbation_quat, data) + + +@configclass +class ResetSampledQuaternionNoiseModelCfg(NoiseModelCfg): + """Configuration for a quaternion noise model that samples rotation perturbations at reset. + + The perturbation is specified as independent uniform ranges for roll, pitch, and yaw + (in radians). At each episode reset, Euler angles are sampled and converted to a + perturbation quaternion that is multiplied with the observed quaternion. + """ + + class_type: type = ResetSampledQuaternionNoiseModel + + noise_cfg: ConstantNoiseCfg = ConstantNoiseCfg(bias=0.0) + """Unused placeholder inherited from NoiseModelCfg. Quaternion perturbation is + controlled by roll_range, pitch_range, and yaw_range instead.""" + + roll_range: tuple[float, float] = (-0.03491, 0.03491) + """Uniform range for roll perturbation in radians. Default is ±2 degrees.""" + + pitch_range: tuple[float, float] = (-0.03491, 0.03491) + """Uniform range for pitch perturbation in radians. Default is ±2 degrees.""" + + yaw_range: tuple[float, float] = (-0.03491, 0.03491) + """Uniform range for yaw perturbation in radians. Default is ±2 degrees.""" diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/mdp/observations.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/mdp/observations.py index e70b77d7d8f9..92f933123f69 100644 --- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/mdp/observations.py +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/mdp/observations.py @@ -186,7 +186,7 @@ def __call__( # Ensure w component is positive (q and -q represent the same rotation) # Pick one canonical form to reduce observation variation seen by the policy - w_negative = base_quat[:, 0] < 0 + w_negative = base_quat[:, 3] < 0 positive_quat = base_quat.clone() positive_quat[w_negative] = -base_quat[w_negative] @@ -335,7 +335,7 @@ def __call__(self, env: ManagerBasedRLEnv) -> torch.Tensor: # Ensure w component is positive (q and -q represent the same rotation) # Pick one canonical form to reduce observation variation seen by the policy - w_negative = gear_quat[:, 0] < 0 + w_negative = gear_quat[:, 3] < 0 gear_positive_quat = gear_quat.clone() gear_positive_quat[w_negative] = -gear_quat[w_negative] diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/mdp/rewards.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/mdp/rewards.py index 2204a0186bc0..560f796c70f3 100644 --- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/mdp/rewards.py +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/mdp/rewards.py @@ -13,9 +13,10 @@ import warp as wp from isaaclab.managers import ManagerTermBase, RewardTermCfg, SceneEntityCfg -from isaaclab.utils.math import combine_frame_transforms +from isaaclab.utils.math import combine_frame_transforms, quat_apply, quat_mul if TYPE_CHECKING: + from isaaclab.assets import Articulation from isaaclab.envs import ManagerBasedRLEnv from isaaclab.sensors.frame_transformer.frame_transformer import FrameTransformer @@ -397,6 +398,318 @@ def __call__( return keypoint_reward_exp +class keypoint_ee_gear_error(ManagerTermBase): + """Compute keypoint distance between the robot end effector and the gear's grasp-corrected pose. + + Transforms the gear's actual world pose into the expected EE position/orientation + using grasp offsets, so that the distance is ~0 when properly holding the gear + and increases when the gripper drifts away. + + The reward is gated on the EE-gear keypoint distance: it only activates when + the mean keypoint error is below ``ee_gear_threshold``, so the penalty only + applies when the gripper is reasonably close to the gear. + + Supports linear weight ramp-up: the returned reward is scaled by a factor that + linearly increases from ``weight_ramp_start`` to 1.0 over ``weight_ramp_steps`` + env steps, allowing the reward to grow in importance as training progresses. + """ + + def __init__(self, cfg: RewardTermCfg, env: ManagerBasedRLEnv): + super().__init__(cfg, env) + + self.robot_asset_cfg: SceneEntityCfg = cfg.params.get("robot_asset_cfg", SceneEntityCfg("robot")) + self.robot_asset: Articulation = env.scene[self.robot_asset_cfg.name] + + self.end_effector_body_name: str = cfg.params["end_effector_body_name"] + grasp_rot_offset = cfg.params["grasp_rot_offset"] + self.grasp_rot_offset_tensor = ( + torch.tensor(grasp_rot_offset, device=env.device, dtype=torch.float32).unsqueeze(0).repeat(env.num_envs, 1) + ) + + gear_offsets_grasp = cfg.params["gear_offsets_grasp"] + self.gear_grasp_offsets_stacked = torch.stack( + [ + torch.tensor(gear_offsets_grasp["gear_small"], device=env.device, dtype=torch.float32), + torch.tensor(gear_offsets_grasp["gear_medium"], device=env.device, dtype=torch.float32), + torch.tensor(gear_offsets_grasp["gear_large"], device=env.device, dtype=torch.float32), + ], + dim=0, + ) + + self.weight_ramp_start: float = cfg.params.get("weight_ramp_start", 0.0) + self.weight_ramp_steps: int = cfg.params.get("weight_ramp_steps", 1) + self.ee_gear_threshold: float = cfg.params.get("ee_gear_threshold", 0.05) + + self.gear_type_indices = torch.zeros(env.num_envs, device=env.device, dtype=torch.long) + self.env_indices = torch.arange(env.num_envs, device=env.device) + self.gear_assets = { + "gear_small": env.scene["factory_gear_small"], + "gear_medium": env.scene["factory_gear_medium"], + "gear_large": env.scene["factory_gear_large"], + } + + eef_indices, _ = self.robot_asset.find_bodies([self.end_effector_body_name]) + self.eef_idx = eef_indices[0] if len(eef_indices) > 0 else None + + self.keypoint_computer = _compute_keypoint_distance(cfg, env) + self._step_count = 0 + + def _get_weight_scale(self, env: ManagerBasedRLEnv) -> float: + progress = min(env.common_step_counter / max(self.weight_ramp_steps, 1), 1.0) + return self.weight_ramp_start + (1.0 - self.weight_ramp_start) * progress + + def __call__( + self, + env: ManagerBasedRLEnv, + robot_asset_cfg: SceneEntityCfg = SceneEntityCfg("robot"), + end_effector_body_name: str = "", + grasp_rot_offset: list | None = None, + gear_offsets_grasp: dict | None = None, + keypoint_scale: float = 1.0, + add_cube_center_kp: bool = True, + weight_ramp_start: float = 0.0, + weight_ramp_steps: int = 1, + ee_gear_threshold: float = 0.05, + ) -> torch.Tensor: + if self.eef_idx is None: + return torch.zeros(env.num_envs, device=env.device) + + gear_type_manager: randomize_gear_type = env._gear_type_manager + self.gear_type_indices = gear_type_manager.get_all_gear_type_indices() + + eef_pos = wp.to_torch(self.robot_asset.data.body_link_pos_w)[:, self.eef_idx] + eef_quat = wp.to_torch(self.robot_asset.data.body_link_quat_w)[:, self.eef_idx] + + all_gear_pos = torch.stack( + [ + wp.to_torch(self.gear_assets["gear_small"].data.root_link_pos_w), + wp.to_torch(self.gear_assets["gear_medium"].data.root_link_pos_w), + wp.to_torch(self.gear_assets["gear_large"].data.root_link_pos_w), + ], + dim=1, + ) + all_gear_quat = torch.stack( + [ + wp.to_torch(self.gear_assets["gear_small"].data.root_link_quat_w), + wp.to_torch(self.gear_assets["gear_medium"].data.root_link_quat_w), + wp.to_torch(self.gear_assets["gear_large"].data.root_link_quat_w), + ], + dim=1, + ) + + gear_pos = all_gear_pos[self.env_indices, self.gear_type_indices] + gear_quat = all_gear_quat[self.env_indices, self.gear_type_indices] + + # -- EE-gear grasp quality reward -- + gear_quat_grasp = quat_mul(gear_quat, self.grasp_rot_offset_tensor) + grasp_offsets = self.gear_grasp_offsets_stacked[self.gear_type_indices] + gear_grasp_pos = gear_pos + quat_apply(gear_quat_grasp, grasp_offsets) + + keypoint_dist_sep = self.keypoint_computer.compute( + current_pos=eef_pos, + current_quat=eef_quat, + target_pos=gear_grasp_pos, + target_quat=gear_quat_grasp, + keypoint_scale=keypoint_scale, + ) + + mean_kp_error = keypoint_dist_sep.mean(-1) + + # Gate on EE-gear distance: only penalize when gripper is close to gear + is_close = (mean_kp_error > self.ee_gear_threshold).float() + + weight_scale = self._get_weight_scale(env) + scaled_reward = mean_kp_error * weight_scale * is_close + + mean_error_scalar = mean_kp_error.mean().item() + pct_close = is_close.mean().item() + + if not hasattr(env, "extras"): + env.extras = {} + if "log" not in env.extras: + env.extras["log"] = {} + env.extras["log"]["ee_gear_kp_error/mean_keypoint_dist"] = mean_error_scalar + env.extras["log"]["ee_gear_kp_error/pct_envs_close"] = pct_close + env.extras["log"]["ee_gear_kp_error/weight_scale"] = weight_scale + + self._step_count += 1 + import carb + + carb.log_info( + f"[ee_gear_kp_error] step={self._step_count}" + f" | mean_kp_error={mean_error_scalar:.5f}" + f" | pct_close={pct_close:.3f}" + f" | weight_scale={weight_scale:.4f}" + ) + + return scaled_reward + + +class keypoint_ee_gear_error_exp(ManagerTermBase): + """Compute exponential keypoint reward between the robot end effector and the gear's grasp-corrected pose. + + Transforms the gear's actual world pose into the expected EE position/orientation + using grasp offsets, so that the reward is high (~1) when properly holding the gear + and drops sharply when the gripper drifts away. + + The reward is gated on the EE-gear keypoint distance: it only activates when + the mean keypoint error is below ``ee_gear_threshold``, so the reward only + applies when the gripper is reasonably close to the gear. + + Supports linear weight ramp-up: the returned reward is scaled by a factor that + linearly increases from ``weight_ramp_start`` to 1.0 over ``weight_ramp_steps`` + env steps, allowing the reward to grow in importance as training progresses. + """ + + def __init__(self, cfg: RewardTermCfg, env: ManagerBasedRLEnv): + super().__init__(cfg, env) + + self.robot_asset_cfg: SceneEntityCfg = cfg.params.get("robot_asset_cfg", SceneEntityCfg("robot")) + self.robot_asset: Articulation = env.scene[self.robot_asset_cfg.name] + + self.end_effector_body_name: str = cfg.params["end_effector_body_name"] + grasp_rot_offset = cfg.params["grasp_rot_offset"] + self.grasp_rot_offset_tensor = ( + torch.tensor(grasp_rot_offset, device=env.device, dtype=torch.float32).unsqueeze(0).repeat(env.num_envs, 1) + ) + + gear_offsets_grasp = cfg.params["gear_offsets_grasp"] + self.gear_grasp_offsets_stacked = torch.stack( + [ + torch.tensor(gear_offsets_grasp["gear_small"], device=env.device, dtype=torch.float32), + torch.tensor(gear_offsets_grasp["gear_medium"], device=env.device, dtype=torch.float32), + torch.tensor(gear_offsets_grasp["gear_large"], device=env.device, dtype=torch.float32), + ], + dim=0, + ) + + self.weight_ramp_start: float = cfg.params.get("weight_ramp_start", 0.0) + self.weight_ramp_steps: int = cfg.params.get("weight_ramp_steps", 1) + self.ee_gear_threshold: float = cfg.params.get("ee_gear_threshold", 0.05) + + self.gear_type_indices = torch.zeros(env.num_envs, device=env.device, dtype=torch.long) + self.env_indices = torch.arange(env.num_envs, device=env.device) + self.gear_assets = { + "gear_small": env.scene["factory_gear_small"], + "gear_medium": env.scene["factory_gear_medium"], + "gear_large": env.scene["factory_gear_large"], + } + + eef_indices, _ = self.robot_asset.find_bodies([self.end_effector_body_name]) + self.eef_idx = eef_indices[0] if len(eef_indices) > 0 else None + + self.keypoint_computer = _compute_keypoint_distance(cfg, env) + self._step_count = 0 + + def _get_weight_scale(self, env: ManagerBasedRLEnv) -> float: + progress = min(env.common_step_counter / max(self.weight_ramp_steps, 1), 1.0) + return self.weight_ramp_start + (1.0 - self.weight_ramp_start) * progress + + def __call__( + self, + env: ManagerBasedRLEnv, + robot_asset_cfg: SceneEntityCfg = SceneEntityCfg("robot"), + end_effector_body_name: str = "", + grasp_rot_offset: list | None = None, + gear_offsets_grasp: dict | None = None, + kp_exp_coeffs: list[tuple[float, float]] = [(1.0, 0.1)], + kp_use_sum_of_exps: bool = True, + keypoint_scale: float = 1.0, + add_cube_center_kp: bool = True, + weight_ramp_start: float = 0.0, + weight_ramp_steps: int = 1, + ee_gear_threshold: float = 0.05, + ) -> torch.Tensor: + if self.eef_idx is None: + return torch.zeros(env.num_envs, device=env.device) + + gear_type_manager: randomize_gear_type = env._gear_type_manager + self.gear_type_indices = gear_type_manager.get_all_gear_type_indices() + + eef_pos = wp.to_torch(self.robot_asset.data.body_link_pos_w)[:, self.eef_idx] + eef_quat = wp.to_torch(self.robot_asset.data.body_link_quat_w)[:, self.eef_idx] + + all_gear_pos = torch.stack( + [ + wp.to_torch(self.gear_assets["gear_small"].data.root_link_pos_w), + wp.to_torch(self.gear_assets["gear_medium"].data.root_link_pos_w), + wp.to_torch(self.gear_assets["gear_large"].data.root_link_pos_w), + ], + dim=1, + ) + all_gear_quat = torch.stack( + [ + wp.to_torch(self.gear_assets["gear_small"].data.root_link_quat_w), + wp.to_torch(self.gear_assets["gear_medium"].data.root_link_quat_w), + wp.to_torch(self.gear_assets["gear_large"].data.root_link_quat_w), + ], + dim=1, + ) + + gear_pos = all_gear_pos[self.env_indices, self.gear_type_indices] + gear_quat = all_gear_quat[self.env_indices, self.gear_type_indices] + + # -- EE-gear grasp quality reward -- + gear_quat_grasp = quat_mul(gear_quat, self.grasp_rot_offset_tensor) + grasp_offsets = self.gear_grasp_offsets_stacked[self.gear_type_indices] + gear_grasp_pos = gear_pos + quat_apply(gear_quat_grasp, grasp_offsets) + + keypoint_dist_sep = self.keypoint_computer.compute( + current_pos=eef_pos, + current_quat=eef_quat, + target_pos=gear_grasp_pos, + target_quat=gear_quat_grasp, + keypoint_scale=keypoint_scale, + ) + + mean_kp_error = keypoint_dist_sep.mean(-1) + + # Gate on EE-gear distance: only reward when gripper is close to gear + is_close = (mean_kp_error > self.ee_gear_threshold).float() + + keypoint_reward_exp = torch.zeros_like(keypoint_dist_sep[:, 0]) + if kp_use_sum_of_exps: + for coeff in kp_exp_coeffs: + a, b = coeff + keypoint_reward_exp += ( + 1.0 / (torch.exp(a * keypoint_dist_sep) + b + torch.exp(-a * keypoint_dist_sep)) + ).mean(-1) + else: + kp_dist_mean = keypoint_dist_sep.mean(-1) + for coeff in kp_exp_coeffs: + a, b = coeff + keypoint_reward_exp += 1.0 / (torch.exp(a * kp_dist_mean) + b + torch.exp(-a * kp_dist_mean)) + + weight_scale = self._get_weight_scale(env) + scaled_reward = keypoint_reward_exp * weight_scale * is_close + + mean_error_scalar = mean_kp_error.mean().item() + mean_reward_scalar = keypoint_reward_exp.mean().item() + pct_close = is_close.mean().item() + + if not hasattr(env, "extras"): + env.extras = {} + if "log" not in env.extras: + env.extras["log"] = {} + env.extras["log"]["ee_gear_kp_error_exp/mean_keypoint_dist"] = mean_error_scalar + env.extras["log"]["ee_gear_kp_error_exp/mean_exp_reward"] = mean_reward_scalar + env.extras["log"]["ee_gear_kp_error_exp/pct_envs_close"] = pct_close + env.extras["log"]["ee_gear_kp_error_exp/weight_scale"] = weight_scale + + self._step_count += 1 + import carb + + carb.log_info( + f"[ee_gear_kp_error_exp] step={self._step_count}" + f" | mean_kp_error={mean_error_scalar:.5f}" + f" | pct_close={pct_close:.3f}" + f" | weight_scale={weight_scale:.4f}" + f" | mean_exp_reward={mean_reward_scalar:.5f}" + ) + + return scaled_reward + + ## # Helper functions and classes ##