Deformable Sprites for Unsupervised Video Decomposition

CVPR 2022 (Oral)

¹ UC Berkeley
² Google Research

Paper
Code

Given an RGB video and its optical flow, we decompose the video into layers of persistent motion groups without any initial mask or user input. The resulting decomposition captures long-term correspondences of sprites over time, enabling effects such as propagating sprite edits across the entire video.

Abstract

We describe a method to extract persistent elements of a dynamic scene from an input video. We represent each scene element as a Deformable Sprite consisting of three components: 1) a 2D texture image for the entire video, 2) per-frame masks for the element, and 3) non-rigid deformations that map the texture image into each video frame. The resulting decomposition allows for applications such as consistent video editing. Deformable Sprites are a type of video auto-encoder model that is optimized on individual videos; the model requires neither training on a large dataset nor pre-trained models. Moreover, our method does not require object masks or other user input, and discovers moving objects of a wider variety than previous work. We evaluate our approach on standard video datasets and show qualitative results on a diverse array of Internet videos.
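To make the three components concrete, the sketch below shows one way a single frame could be reassembled from sprites: each layer's texture is looked up through a per-frame coordinate map, and the warped layers are blended with the per-frame masks. This is an illustration only, not the authors' implementation. The function names (`sample_texture`, `composite_frame`), the array layouts, and the nearest-neighbour lookup are all assumptions made for brevity; the abstract does not specify how sampling or blending is done, and an optimized model would presumably need a differentiable sampler.

```python
import numpy as np

def sample_texture(texture, coords):
    """Nearest-neighbour lookup of texture colours at per-pixel (x, y) coords.

    texture: (Ht, Wt, 3) array; coords: (H, W, 2) array of texture coordinates.
    Hypothetical helper; out-of-range coordinates are clamped to the border.
    """
    h, w = texture.shape[:2]
    x = np.clip(np.rint(coords[..., 0]).astype(int), 0, w - 1)
    y = np.clip(np.rint(coords[..., 1]).astype(int), 0, h - 1)
    return texture[y, x]

def composite_frame(textures, coords, masks):
    """Blend warped sprite textures using per-frame masks.

    textures: list of (Ht, Wt, 3) arrays, one per sprite
    coords:   list of (H, W, 2) arrays mapping frame pixels into each texture
    masks:    list of (H, W) arrays, one per sprite, for this frame
    """
    masks = np.stack(masks).astype(float)
    # Normalise so the layer weights sum to 1 at every pixel (an assumption).
    masks = masks / np.maximum(masks.sum(axis=0, keepdims=True), 1e-8)
    layers = np.stack([sample_texture(t, c) for t, c in zip(textures, coords)])
    return (masks[..., None] * layers).sum(axis=0)

# Toy example: a grey background sprite and a red foreground sprite.
H, W = 4, 4
ys, xs = np.mgrid[0:H, 0:W]
identity = np.stack([xs, ys], axis=-1).astype(float)  # frame pixel -> same texel

background = np.full((H, W, 3), 0.2)
foreground = np.zeros((H, W, 3))
foreground[..., 0] = 1.0

fg_mask = np.zeros((H, W))
fg_mask[:, :2] = 1.0  # foreground occupies the left half of this frame

frame = composite_frame(
    [background, foreground],
    [identity, identity + [1.0, 0.0]],  # foreground map shifted by one texel
    [1.0 - fg_mask, fg_mask],
)
```

Because every frame reads from the same texture image, editing a sprite's texture once and re-running `composite_frame` for each frame would carry the edit through the whole video, which is the intuition behind consistent video editing.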

Video

BibTeX

@inproceedings{ye2022sprites,
	title = {Deformable Sprites for Unsupervised Video Decomposition},
	author = {Ye, Vickie and Li, Zhengqi and Tucker, Richard and Kanazawa, Angjoo and Snavely, Noah},
	booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
	month = {June},
	year = {2022}
}

Video Decompositions

Internet Videos

DAVIS Videos

Applications

Consistent Video Editing

Motion Sculptures