yorklyb/SI-Diff

SI-Diff: A Framework for Learning Search and High-Precision Insertion with a Force-Domain Diffusion Policy

RA-L'26 & ICRA'27

[ArXiv], [Paper], [Project Page]

From the Author

Due to IP policies, we cannot release a click-and-run version of SI-Diff. Instead, we provide supplementary details beyond the paper to help you reproduce the work.

We first provide a straightforward introduction to the fundamentals of robot control to help readers avoid confusion. The second-order dynamic model of an n-Degree-of-Freedom torque-controlled robot is as follows:
image
Among these terms, we control the robot through $\boldsymbol{\tau}_m$, the commanded joint torque. If $\boldsymbol{\tau}_m$ is computed with the following control law, the robot is governed by an impedance controller.
image
Based on this, if a feedforward force term is added, the controller becomes a feedforward force-based impedance controller, which is the controller used in this work.
image
Our force diffusion policy learns how to predict the feedforward force.
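
As a reference point, the three relations above are commonly written in the following textbook form. The notation is assumed for illustration (only $\boldsymbol{\tau}_m$ and the error $\boldsymbol{e}$ are named in the text), and the sign conventions and compensation terms may differ from those in the paper. Here $\boldsymbol{M}$, $\boldsymbol{C}$, and $\boldsymbol{g}$ are the inertia matrix, Coriolis matrix, and gravity vector, $\boldsymbol{\tau}_{ext}$ is the external joint torque, $\boldsymbol{J}$ is the EE Jacobian, $\boldsymbol{K}$ and $\boldsymbol{D}$ are the stiffness and damping matrices, $\boldsymbol{e} = \boldsymbol{x}_d - \boldsymbol{x}$ is the pose error, and $\boldsymbol{f}_{ff}$ is the feedforward wrench.

```latex
\begin{aligned}
\boldsymbol{M}(\boldsymbol{q})\ddot{\boldsymbol{q}}
  + \boldsymbol{C}(\boldsymbol{q},\dot{\boldsymbol{q}})\dot{\boldsymbol{q}}
  + \boldsymbol{g}(\boldsymbol{q})
  &= \boldsymbol{\tau}_m + \boldsymbol{\tau}_{ext}
  && \text{(robot dynamics)} \\
\boldsymbol{\tau}_m
  &= \boldsymbol{J}^{\top}(\boldsymbol{q})\left(\boldsymbol{K}\boldsymbol{e} + \boldsymbol{D}\dot{\boldsymbol{e}}\right)
  + \boldsymbol{C}(\boldsymbol{q},\dot{\boldsymbol{q}})\dot{\boldsymbol{q}} + \boldsymbol{g}(\boldsymbol{q})
  && \text{(impedance control)} \\
\boldsymbol{\tau}_m
  &= \boldsymbol{J}^{\top}(\boldsymbol{q})\left(\boldsymbol{K}\boldsymbol{e} + \boldsymbol{D}\dot{\boldsymbol{e}} + \boldsymbol{f}_{ff}\right)
  + \boldsymbol{C}(\boldsymbol{q},\dot{\boldsymbol{q}})\dot{\boldsymbol{q}} + \boldsymbol{g}(\boldsymbol{q})
  && \text{(with feedforward force)}
\end{aligned}
```

The only difference between the last two lines is the added $\boldsymbol{f}_{ff}$ term, which is the quantity the force diffusion policy predicts.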

Note that we rely on the error term e to drive the end effector (EE) to the desired position. In other words, we need to first define a desired position or trajectory. Although the feedforward force can also influence the motion of the EE, we only rely on it to handle misalignment or sticking situations.

Step 1: Impedance Controller

First, you need to build an impedance controller for your robot. If you are using a Franka Robotics robot, you can follow this demo. Once this step is completed, your robot should behave like the one shown in the following video.

595577416-4ef82801-d471-4a69-8b65-04aa87ca3d07.mp4
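
To make the structure of Step 1 concrete, here is a minimal sketch of the task-space part of an impedance law, assuming diagonal stiffness and damping gains and a pose error that has already been computed. The function names and gain values are hypothetical and are not taken from the Franka demo; a real controller also needs a proper orientation error and the Coriolis and gravity compensation provided by the robot's model library.

```python
from typing import List, Sequence

Vector = List[float]


def impedance_wrench(
    pose_error: Sequence[float],
    velocity_error: Sequence[float],
    stiffness: Sequence[float],
    damping: Sequence[float],
) -> Vector:
    """Task-space wrench F = K e + D e_dot, with diagonal K and D (assumed)."""
    return [
        k * e + d * de
        for k, e, d, de in zip(stiffness, pose_error, damping, velocity_error)
    ]


def wrench_to_joint_torques(
    jacobian: Sequence[Sequence[float]], wrench: Sequence[float]
) -> Vector:
    """Map a 6-D wrench to joint torques with tau = J^T F (J has shape 6 x n)."""
    n_joints = len(jacobian[0])
    return [
        sum(jacobian[row][joint] * wrench[row] for row in range(len(wrench)))
        for joint in range(n_joints)
    ]


# Example with hypothetical gains: a 1 cm error along x and a toy 6 x 2 Jacobian.
stiffness = [100.0] * 6
damping = [20.0] * 6
wrench = impedance_wrench([0.01, 0, 0, 0, 0, 0], [0.0] * 6, stiffness, damping)
toy_jacobian = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
torques = wrench_to_joint_torques(toy_jacobian, wrench)
```

With this structure the robot behaves like a spring-damper around the desired pose, which is the compliant behavior Step 1 aims for.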

Step 2: Feedforward-based Impedance Controller

On top of the impedance controller, you need to further add a feedforward force term to the controller. You can start by designing the feedforward force using a simple pattern. For example, you can set fz as a sinusoidal signal and set fx, fy, mx, my, and mz to zero. Then, your robot should behave as shown in the following video.

595577540-e5ae456f-881e-4a4d-90d8-ec9e37ff4f6c.mov
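
The sinusoidal test pattern described in Step 2 can be sketched as follows. This is an illustration only: the amplitude and frequency are assumed values, and the helper names are hypothetical. The feedforward wrench is simply added to the impedance wrench before the result is mapped to joint torques.

```python
import math
from typing import List, Sequence


def sinusoidal_feedforward(
    t: float, amplitude: float = 5.0, frequency_hz: float = 1.0
) -> List[float]:
    """Return [fx, fy, fz, mx, my, mz] with a sinusoidal fz and zeros elsewhere.

    amplitude (N) and frequency_hz are assumed values for illustration.
    """
    fz = amplitude * math.sin(2.0 * math.pi * frequency_hz * t)
    return [0.0, 0.0, fz, 0.0, 0.0, 0.0]


def commanded_wrench(
    impedance_wrench: Sequence[float], feedforward: Sequence[float]
) -> List[float]:
    """Sum the impedance wrench and the feedforward wrench element-wise."""
    return [a + b for a, b in zip(impedance_wrench, feedforward)]


# Example: a quarter period into a 1 Hz signal, fz is at its peak.
ff_start = sinusoidal_feedforward(0.0)
ff_peak = sinusoidal_feedforward(0.25)
total = commanded_wrench([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], ff_peak)
```

Once this pattern works on the robot, the hand-designed signal can be replaced by the wrench predicted by a policy.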

Step 3: Teacher Policy

Follow Algorithm 1 in our paper to design the teacher policy and collect training data. We provide one demonstration (robot_action.pkl & robot_state.pkl) in this repository to show what the training data look like.

Our diffusion policy learns to predict the robot action (output) from the robot state (input). The action is the 6-DoF feedforward force (fx, fy, fz, mx, my, and mz). The robot state is 37-dimensional: the first value is the mode prompt, and the remaining 36 dimensions are identical to the observations in TacDiffusion. You can refer to the discussion here for details on these 36 dimensions.
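
The layout of one state/action pair can be sketched as below. This is a minimal illustration of the dimensions described above, not a loader for the provided pickle files, whose internal structure is not assumed here. The helper names are hypothetical, and treating the mode prompt as an integer is an assumption based on its use with an embedding layer.

```python
from typing import List, Sequence, Tuple

STATE_DIM = 37   # 1 mode prompt + 36 observation values
ACTION_DIM = 6   # fx, fy, fz, mx, my, mz


def split_state(state: Sequence[float]) -> Tuple[int, List[float]]:
    """Split a 37-dim state into (mode prompt, 36-dim observation)."""
    if len(state) != STATE_DIM:
        raise ValueError(f"expected {STATE_DIM} values, got {len(state)}")
    mode = int(state[0])          # assumed to be an integer index
    observation = list(state[1:])
    return mode, observation


def check_action(action: Sequence[float]) -> List[float]:
    """Validate that an action is a 6-DoF feedforward wrench."""
    if len(action) != ACTION_DIM:
        raise ValueError(f"expected {ACTION_DIM} values, got {len(action)}")
    return list(action)


# Example with placeholder values.
example_state = [1.0] + [0.0] * 36
mode, observation = split_state(example_state)
action = check_action([0.0, 0.0, -2.0, 0.0, 0.0, 0.0])
```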

Once the teacher policy is ready, the robot can start searching. In the early stages, we manually created misalignments to collect data with the teacher policy. Later, we developed an automated data collection pipeline. It mirrors the evaluation process of the teacher policy, but records only successful demonstrations that meet our efficiency criterion (completed within 2 seconds). We kept running the pipeline until a sufficient number of expert demonstrations had been collected.

auto_data.mp4
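
The filtering rule of the automated pipeline can be sketched as follows. The `Episode` record and function names are hypothetical. Only the two criteria stated above (success, and completion within 2 seconds) come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List

MAX_DURATION_S = 2.0  # efficiency criterion stated in the text


@dataclass
class Episode:
    """Hypothetical summary of one teacher-policy rollout."""
    success: bool
    duration_s: float


def keep_episode(episode: Episode, max_duration_s: float = MAX_DURATION_S) -> bool:
    """Keep only successful rollouts that finished within the time limit."""
    return episode.success and episode.duration_s <= max_duration_s


def collect_demonstrations(episodes: Iterable[Episode], target_count: int) -> List[Episode]:
    """Consume rollouts until target_count demonstrations have been kept."""
    kept: List[Episode] = []
    for episode in episodes:
        if keep_episode(episode):
            kept.append(episode)
        if len(kept) >= target_count:
            break
    return kept


# Example: one slow success and one failure are discarded.
rollouts = [
    Episode(success=True, duration_s=1.4),
    Episode(success=True, duration_s=3.1),
    Episode(success=False, duration_s=0.9),
    Episode(success=True, duration_s=2.0),
    Episode(success=True, duration_s=1.1),
]
demos = collect_demonstrations(rollouts, target_count=2)
```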

Step 4: Diffusion Policy

Our diffusion policy is built upon Imitating-Human-Behaviour-w-Diffusion and TacDiffusion. We recommend first becoming familiar with these two works, then following the instructions in our paper to add the mode embedding layers.
network

Step 5: Model Training

Since the model needs to learn two modes simultaneously, and the data distribution between the two modes is imbalanced, we recommend using the BBS technique. The following code illustrates the training loop, in which every iteration draws one batch from each mode's dataloader and concatenates them.

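import numpy as np
import torch
from tqdm import tqdm

# Assumed to be defined beforehand: model (DDP-wrapped), optim, writer, device, rank,
# lrate, n_epoch, and the two per-mode dataloaders with distributed samplers.
global_step = 0

for ep in range(n_epoch):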
    dataload_train_0.sampler.set_epoch(ep)
    dataload_train_1.sampler.set_epoch(ep)

    model.train()
    optim.param_groups[0]["lr"] = lrate * ((np.cos((ep / n_epoch) * np.pi) + 1) / 2)

    pbar = zip(dataload_train_0, dataload_train_1)
    if rank == 0:
        pbar = tqdm(pbar, total=min(len(dataload_train_0), len(dataload_train_1)), desc=f"Epoch {ep}")

    for (x0, y0), (x1, y1) in pbar:
        # 1. Move tensors to the configured device asynchronously
        x0 = x0.to(device, non_blocking=True).float()
        y0 = y0.to(device, non_blocking=True).float()
        x1 = x1.to(device, non_blocking=True).float()
        y1 = y1.to(device, non_blocking=True).float()

        # 2. Extract the mode prompt from the first dimension (index 0)
        # Input shape: [B, 37] -> mode shape: [B], feature shape: [B, 36]
        mode0 = x0[:, 0].long()       # Cast to long for the embedding layer
        x0_feature = x0[:, 1:]        # Slice the remaining 36 dimensions for observations

        mode1 = x1[:, 0].long()
        x1_feature = x1[:, 1:]

        # 3. Concatenate the dual-source data into a single balanced batch
        x_batch = torch.cat([x0_feature, x1_feature], dim=0)  # Pure 36-dim observations
        y_batch = torch.cat([y0, y1], dim=0)
        mode_batch = torch.cat([mode0, mode1], dim=0)          # Combined mode prompts

        # 4. Forward pass and loss computation
        loss = model.module.loss_on_batch(x_batch, y_batch, mode_batch)
        
        # 5. Backward pass and optimization step
        optim.zero_grad()
        loss.backward()
        optim.step()

        if rank == 0:
            pbar.set_description(f"train loss: {loss.item():.4f}")
            writer.add_scalar('training_loss', loss.item(), global_step)
            global_step += 1
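
Two details of the loop above are easy to miss, so here is a small framework-free sketch of them. It uses plain Python lists as stand-ins for dataloaders, and the helper names are hypothetical. First, the learning rate follows a cosine decay from `lrate` toward zero over the epochs. Second, `zip` stops at the shorter dataloader, so each epoch runs `min(len(loader_0), len(loader_1))` iterations, and every combined batch holds an equal share of samples from each mode as long as both loaders use the same batch size.

```python
import math
from typing import Iterator, List, Sequence


def cosine_lr(base_lr: float, epoch: int, n_epoch: int) -> float:
    """Cosine-decayed learning rate, same formula as in the loop above."""
    return base_lr * (math.cos((epoch / n_epoch) * math.pi) + 1) / 2


def paired_batches(
    loader_0: Sequence[List[int]], loader_1: Sequence[List[int]]
) -> Iterator[List[int]]:
    """Yield one combined batch per pair; stops when the shorter loader ends."""
    for batch_0, batch_1 in zip(loader_0, loader_1):
        yield batch_0 + batch_1


# Example: mode 0 has three batches, mode 1 has only one.
loader_0 = [[0, 0], [0, 0], [0, 0]]
loader_1 = [[1, 1]]
batches = list(paired_batches(loader_0, loader_1))

lr_start = cosine_lr(1e-3, 0, 10)
lr_mid = cosine_lr(1e-3, 5, 10)
```

Because the shorter loader bounds the epoch length, part of the larger dataset goes unused in each epoch; reshuffling via `set_epoch` changes which part that is from one epoch to the next.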

Acknowledgments

Parts of this project page were adapted from the Nerfies page. We would like to thank the authors of Imitating-Human-Behaviour-w-Diffusion and TacDiffusion for their open-source contributions.
