Fix: per-frame timestep allocation for video2world (image2world) mode in CosmosPredict2 by csy2077 · Pull Request #26 · NVlabs/FastGen

csy2077 · 2026-05-16T14:29:46Z

Problem

In CosmosPredict2.forward(), when running in video2world (image2world) mode, the conditioning (clean) frames were not assigned a timestep of 0.0. Instead, the model received the same noisy timestep t for every frame, including the clean conditioning frame(s). The model therefore treated the clean conditioning frame(s) as if they carried the same noise level as the frames being generated, leading to:

Dramatic quality degradation in generated videos
No temporal coherence to the conditioning frame
Effectively broken video2world / image2world distillation

Fix

After replacing the conditioning frames in model_input, expand t to per-frame shape (B, T) and zero out the timestep for conditioning frames (indicated by condition_mask). This tells the transformer that those frames are already clean and require no denoising:

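t_expanded = t.unsqueeze(1).expand(B, T)  # (B,) -> (B, T): one timestep per frame
mask_B_T = condition_mask[:, 0, :, 0, 0]  # (B, T): 1 for conditioning frames, 0 otherwise
t = t_expanded * (1 - mask_B_T)  # conditioning frames get t = 0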

The transformer already accepts timesteps_B_T of shape (B, T), so no other changes are needed.
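The masking step can be checked in isolation with a small, self-contained sketch. The helper name per_frame_timesteps, the toy tensor sizes, and the (B, 1, T, H, W) layout of condition_mask are assumptions made for illustration; only the three tensor operations come from the change itself. The cast to t.dtype is an extra precaution in case the mask is stored as a boolean tensor, which the PR does not state.

```python
import torch


def per_frame_timesteps(t: torch.Tensor, condition_mask: torch.Tensor) -> torch.Tensor:
    """Expand a per-sample timestep to (B, T) and zero it on conditioning frames.

    t:              (B,) noise level per sample
    condition_mask: (B, 1, T, H, W), 1 on conditioning frames (assumed layout)
    """
    B, _, T, _, _ = condition_mask.shape
    # (B,) -> (B, T): every frame starts with its sample's timestep.
    t_expanded = t.unsqueeze(1).expand(B, T)
    # Collapse channel and spatial dims; cast so a bool mask also works.
    mask_B_T = condition_mask[:, 0, :, 0, 0].to(t.dtype)
    # Conditioning frames (mask == 1) get timestep 0, the rest keep t.
    return t_expanded * (1 - mask_B_T)


# Toy example: 2 samples, 4 latent frames, first frame is the conditioning frame.
B, T, H, W = 2, 4, 3, 3
t = torch.tensor([0.8, 0.3])
condition_mask = torch.zeros(B, 1, T, H, W)
condition_mask[:, :, 0] = 1.0

timesteps_B_T = per_frame_timesteps(t, condition_mask)
print(timesteps_B_T)
```

In this toy setup the first column of timesteps_B_T is zero for both samples, while the remaining columns keep each sample's original timestep.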

Impact

Without this fix, CosmosPredict2 video2world distillation produces incoherent videos that ignore the conditioning frame. With this fix, the model correctly preserves the conditioning frame and generates temporally consistent video.

In video2world (image2world) mode, conditioning frames were receiving the same noisy timestep as all other frames. This caused the transformer to treat the clean conditioning frame as a fully-noised input, breaking temporal coherence and causing severe quality degradation.

Fix: after replacing conditioning frames in model_input, expand t to shape (B, T) and zero out timesteps for frames where condition_mask=1, signaling to the model that those frames are already clean (t=0).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

csy2077 wants to merge 1 commit into NVlabs:main from csy2077:fix/cosmos-predict2-video2world-per-frame-timestep
