SI-Diff: A Framework for Learning Search and High-Precision Insertion with a Force-Domain Diffusion Policy
[ArXiv], [Paper], [Project Page]
Due to IP policies, we do not release a click-and-run version of SI-Diff. We will provide more supplementary details to the paper to help you reproduce the work.
We first give a brief introduction to the fundamentals of robot control to help readers avoid confusion. The starting point is the second-order dynamic model of an n-degree-of-freedom (n-DoF) torque-controlled robot. Among the terms of this model, we control the robot by changing the commanded joint torque.
An impedance controller computes this torque from an error term e. If a feedforward force term is added, the controller becomes a feedforward force-based impedance controller, which is the controller used in this work.
Our force diffusion policy learns how to predict the feedforward force.
Note that we rely on the error term e to drive the end effector (EE) to the desired position. In other words, we need to first define a desired position or trajectory. Although the feedforward force can also influence the motion of the EE, we only rely on it to handle misalignment or sticking situations.
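For reference, the controller described above can be written in a common textbook form. The notation below (M, C, g, J, K, D, f_ff, the sign convention e = x_d - x, and the Coriolis and gravity compensation terms) is an assumption made for illustration and is not copied from the paper; please refer to the paper for the exact formulation.

```latex
\begin{align}
M(q)\,\ddot{q} + C(q,\dot{q})\,\dot{q} + g(q) &= \tau_m + \tau_{\mathrm{ext}}, \\
\tau_m &= J(q)^{\top}\bigl(K e + D\,\dot{e} + f_{\mathrm{ff}}\bigr) + C(q,\dot{q})\,\dot{q} + g(q), \\
e &= x_d - x, \qquad
f_{\mathrm{ff}} = \begin{bmatrix} f_x & f_y & f_z & m_x & m_y & m_z \end{bmatrix}^{\top}.
\end{align}
```

In this form, setting f_ff to zero recovers a plain impedance controller, while a nonzero f_ff adds a wrench at the EE on top of the error-driven term.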
First, you need to build an impedance controller for your robot. If you are using a Franka Robotics robot, you can follow this demo. Once this step is completed, your robot should behave like the one shown in the following video.
[Video: 595577416-4ef82801-d471-4a69-8b65-04aa87ca3d07.mp4]
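As a rough illustration of what such a controller computes in each control cycle, here is a minimal sketch in plain Python. The function name `impedance_torque`, the diagonal gains, and the omission of gravity and Coriolis compensation are assumptions made for this example; a real controller runs inside the robot's real-time control loop and should follow the vendor's interface.

```python
def impedance_torque(jacobian, stiffness, damping, pose_error,
                     velocity_error, feedforward=None):
    """Return joint torques tau = J^T (K e + D e_dot + f_ff).

    jacobian: 6 x n list of lists (hypothetical input format).
    stiffness, damping: length-6 diagonal gains (assumed diagonal).
    pose_error, velocity_error: length-6 Cartesian errors.
    feedforward: optional length-6 wrench [fx, fy, fz, mx, my, mz].
    """
    if feedforward is None:
        feedforward = [0.0] * 6  # plain impedance control
    # Cartesian wrench from the spring-damper terms plus the feedforward term
    wrench = [
        k * e + d * de + f
        for k, d, e, de, f in zip(stiffness, damping, pose_error,
                                  velocity_error, feedforward)
    ]
    n_joints = len(jacobian[0])
    # Map the wrench to joint space with the Jacobian transpose
    return [
        sum(jacobian[row][j] * wrench[row] for row in range(6))
        for j in range(n_joints)
    ]
```

Leaving `feedforward` unset corresponds to the plain impedance controller of this step; the next step passes a nonzero wrench.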
On top of the impedance controller, you need to further add a feedforward force term to the controller. You can start by designing the feedforward force using a simple pattern. For example, you can set fz as a sinusoidal signal and set fx, fy, mx, my, and mz to zero. Then, your robot should behave as shown in the following video.
[Video: 595577540-e5ae456f-881e-4a4d-90d8-ec9e37ff4f6c.mov]
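The simple test pattern described above (sinusoidal fz, all other components zero) could be generated as in the sketch below. The function name and the default amplitude and frequency are hypothetical values chosen for illustration; pick values that are safe for your robot and task.

```python
import math


def sinusoidal_feedforward(t, amplitude=5.0, frequency=1.0):
    """Return the feedforward wrench [fx, fy, fz, mx, my, mz] at time t.

    t: time in seconds; amplitude in newtons; frequency in hertz.
    Only fz is nonzero, following the simple pattern described above.
    """
    fz = amplitude * math.sin(2.0 * math.pi * frequency * t)
    return [0.0, 0.0, fz, 0.0, 0.0, 0.0]
```

The returned wrench would be passed to the controller as its feedforward term in every control cycle.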
Follow Algorithm 1 in our paper to design the teacher policy and collect training data. We provide one demonstration (robot_action.pkl & robot_state.pkl) in this repository to show what the training data look like.
Our diffusion policy learns to predict the robot action (output) from the robot state (input). The action is the 6-DoF feedforward force (fx, fy, fz, mx, my, and mz). The robot state is 37-dimensional: the first value is the mode prompt, and the remaining 36 values are identical to the observations in TacDiffusion. You can refer to the discussion here for details on these 36 values.
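When inspecting a single sample, the state layout described above can be checked with a small helper like the following sketch. The names `STATE_DIM`, `ACTION_DIM`, and `split_state` are hypothetical and not part of the released files.

```python
STATE_DIM = 37   # 1 mode prompt + 36 observation values
ACTION_DIM = 6   # fx, fy, fz, mx, my, mz


def split_state(state):
    """Split one 37-dim robot state into (mode prompt, 36-dim observation)."""
    if len(state) != STATE_DIM:
        raise ValueError(f"expected {STATE_DIM} values, got {len(state)}")
    mode = int(state[0])           # the mode prompt is stored as the first value
    observation = list(state[1:])  # remaining 36 values, as in TacDiffusion
    return mode, observation
```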
Once the teacher policy is ready, the robot can start searching. In the early stages, we manually created misalignments to collect data for the teacher policy. Later, we developed an automated data collection pipeline. It mirrors the evaluation process of the teacher policy, but only records successful demonstrations that meet our efficiency criterion (completed within 2 seconds). We kept running this pipeline until a sufficient number of expert demonstrations had been collected.
[Video: auto_data.mp4]
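The recording rule of the automated pipeline can be sketched as a simple filter. The episode dictionary keys (`success`, `duration_s`), the function names, and the choice to treat exactly 2 seconds as acceptable are assumptions made for illustration; only the 2-second threshold comes from the description above.

```python
MAX_DURATION_S = 2.0  # efficiency criterion stated above


def keep_demonstration(success, duration_s, max_duration_s=MAX_DURATION_S):
    """Keep an episode only if it succeeded within the time limit."""
    return bool(success) and duration_s <= max_duration_s


def collect(episodes, target_count):
    """Gather accepted episodes until target_count demonstrations are kept."""
    kept = []
    for episode in episodes:
        if keep_demonstration(episode["success"], episode["duration_s"]):
            kept.append(episode)
        if len(kept) >= target_count:
            break
    return kept
```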
Our diffusion policy is built upon Imitating-Human-Behaviour-w-Diffusion and TacDiffusion. We recommend first becoming familiar with these two works, then following the instructions in our paper to add the mode embedding layers.
Since the model needs to learn two modes simultaneously, and the data distribution between the two modes is imbalanced, we recommend using the BBS technique. The following code briefly illustrates the training loop.
import numpy as np
import torch
from tqdm import tqdm

# model, optim, dataload_train_0, dataload_train_1, writer, device, rank,
# lrate, n_epoch, and global_step are assumed to be defined by the
# surrounding training script.
for ep in range(n_epoch):
    dataload_train_0.sampler.set_epoch(ep)
    dataload_train_1.sampler.set_epoch(ep)
    model.train()

    # Cosine learning-rate decay over the epochs
    optim.param_groups[0]["lr"] = lrate * ((np.cos((ep / n_epoch) * np.pi) + 1) / 2)

    # Draw one batch from each mode's dataloader per step
    pbar = zip(dataload_train_0, dataload_train_1)
    if rank == 0:
        pbar = tqdm(pbar, total=min(len(dataload_train_0), len(dataload_train_1)), desc=f"Epoch {ep}")

    for (x0, y0), (x1, y1) in pbar:
        # 1. Move tensors to the configured device asynchronously
        x0 = x0.to(device, non_blocking=True).float()
        y0 = y0.to(device, non_blocking=True).float()
        x1 = x1.to(device, non_blocking=True).float()
        y1 = y1.to(device, non_blocking=True).float()

        # 2. Extract the mode prompt from the first dimension (index 0)
        # Input shape: [B, 37] -> mode shape: [B], feature shape: [B, 36]
        mode0 = x0[:, 0].long()  # Cast to long for the embedding layer
        x0_feature = x0[:, 1:]   # Slice the remaining 36 dimensions for observations
        mode1 = x1[:, 0].long()
        x1_feature = x1[:, 1:]

        # 3. Concatenate the dual-source data into a single balanced batch
        x_batch = torch.cat([x0_feature, x1_feature], dim=0)  # Pure 36-dim observations
        y_batch = torch.cat([y0, y1], dim=0)
        mode_batch = torch.cat([mode0, mode1], dim=0)  # Combined mode prompts

        # 4. Forward pass and loss computation
        loss = model.module.loss_on_batch(x_batch, y_batch, mode_batch)

        # 5. Backward pass and optimization step
        optim.zero_grad()
        loss.backward()
        optim.step()

        if rank == 0:
            pbar.set_description(f"train loss: {loss.item():.4f}")
            writer.add_scalar('training_loss', loss.item(), global_step)
        global_step += 1

Parts of this project page were adopted from the Nerfies page. We would like to thank the authors of Imitating-Human-Behaviour-w-Diffusion and TacDiffusion for their open-source contributions.