Deformable Sprites for Unsupervised Video Decomposition

CVPR 2022 (Oral)

  • 1 UC Berkeley
  • 2 Google Research

Given an RGB video and its optical flow, we decompose the video into layers of persistent motion groups without any initial mask or user input. The resulting decomposition captures long-term correspondences of sprites over time, enabling effects such as propagating sprite edits across the entire video.


We describe a method to extract persistent elements of a dynamic scene from an input video. We represent each scene element as a Deformable Sprite consisting of three components: 1) a 2D texture image for the entire video, 2) per-frame masks for the element, and 3) non-rigid deformations that map the texture image into each video frame. The resulting decomposition allows for applications such as consistent video editing. Deformable Sprites are a type of video auto-encoder model that is optimized on individual videos; they require neither training on a large dataset nor pre-trained models. Moreover, our method does not require object masks or other user input, and discovers moving objects of a wider variety than previous work. We evaluate our approach on standard video datasets and show qualitative results on a diverse array of Internet videos.
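The three-component representation described above can be illustrated with a toy sketch. This is not the authors' implementation. The names `Sprite`, `translate`, `sample_nearest`, and `reconstruct_frame` are hypothetical. The sketch also assumes grayscale images stored as nested lists, per-pixel masks that sum to 1 across layers, nearest-neighbor sampling, and warps expressed as functions from frame pixels to texture coordinates; simple translations stand in for the non-rigid deformations.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[float]]                          # grayscale image, rows x cols
Warp = Callable[[int, int], Tuple[float, float]]   # frame (x, y) -> texture (u, v)


@dataclass
class Sprite:
    """Hypothetical container mirroring the three components of a sprite."""
    texture: Image       # 1) one texture image shared by the entire video
    masks: List[Image]   # 2) one mask per frame, values in [0, 1]
    warps: List[Warp]    # 3) one deformation per frame (assumed backward warp)


def translate(dx: float) -> Warp:
    """Toy stand-in for a non-rigid deformation: a horizontal shift."""
    return lambda x, y: (x - dx, y)


def sample_nearest(texture: Image, u: float, v: float) -> float:
    """Nearest-neighbor texture lookup, clamped to the texture bounds."""
    rows, cols = len(texture), len(texture[0])
    col = min(max(int(round(u)), 0), cols - 1)
    row = min(max(int(round(v)), 0), rows - 1)
    return texture[row][col]


def reconstruct_frame(sprites: List[Sprite], t: int, height: int, width: int) -> Image:
    """Composite frame t as the mask-weighted sum of the warped textures."""
    frame = [[0.0] * width for _ in range(height)]
    for sprite in sprites:
        for y in range(height):
            for x in range(width):
                u, v = sprite.warps[t](x, y)
                frame[y][x] += sprite.masks[t][y][x] * sample_nearest(sprite.texture, u, v)
    return frame


# Two-frame, 1x3 toy video: a bright one-pixel object moves right over a static background.
background = Sprite(
    texture=[[0.1, 0.2, 0.3]],
    masks=[[[0.0, 1.0, 1.0]], [[1.0, 0.0, 1.0]]],
    warps=[translate(0), translate(0)],
)
foreground = Sprite(
    texture=[[1.0]],
    masks=[[[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]]],
    warps=[translate(0), translate(1)],
)
layers = [background, foreground]

frame0 = reconstruct_frame(layers, 0, 1, 3)   # [[1.0, 0.2, 0.3]]
frame1 = reconstruct_frame(layers, 1, 1, 3)   # [[0.1, 1.0, 0.3]]

# Editing the shared texture once changes the object in every frame.
foreground.texture = [[0.5]]
edited0 = reconstruct_frame(layers, 0, 1, 3)  # [[0.5, 0.2, 0.3]]
edited1 = reconstruct_frame(layers, 1, 1, 3)  # [[0.1, 0.5, 0.3]]
```

Because each layer keeps a single texture for the whole video, the edit to `foreground.texture` shows up in both reconstructed frames, which is the property that makes consistent video editing possible in this kind of decomposition.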



Video Decompositions

Internet Videos

DAVIS Videos


Consistent Video Editing

Motion Sculptures