Image Conductor: Precision Control for Interactive Video Synthesis

1Peking University, 2ARC Lab, Tencent PCG, 3Nanyang Technological University, 4Tsinghua University, 5University of Macau, 6Shenzhen Institute of Advanced Technology
Project lead. Corresponding author.

Abstract

Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements, typically involving labor-intensive real-world capture. Despite advances in generative AI for video creation, achieving precise control over motion for interactive video asset generation remains challenging. To this end, we propose Image Conductor, a method for precise control of camera transitions and object movements to generate video assets from a single image. A carefully designed training strategy is proposed to separate camera motion from object motion using distinct camera LoRA weights and object LoRA weights. To further address cinematographic variations from ill-posed trajectories, we introduce a camera-free guidance technique during inference, enhancing object movements while eliminating camera transitions. Additionally, we develop a trajectory-oriented video motion data curation pipeline for training. Extensive experiments demonstrate our method's precision and fine-grained control in generating motion-controllable videos from images, advancing the practical application of interactive video synthesis.

Methods

To address the lack of large-scale annotated video data, we first design a data construction pipeline to create a consistent video dataset with appropriate motion. Then, a carefully designed two-stage training scheme is proposed to derive a motion-controllable video ControlNet [1] [2], which can separate camera transitions and object movements based on distinct LoRA weights.
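The idea of separating two kinds of motion with distinct LoRA weights can be illustrated with a minimal sketch. It shows only the general LoRA mechanism: a frozen base weight plus a switchable low-rank update, with one adapter per motion type. It is not the authors' implementation. The names `LoRAAdapter`, `LoRALinear`, `"camera"`, and `"object"` are hypothetical, and the toy 2x2 layer stands in for layers of the actual video model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


@dataclass
class LoRAAdapter:
    """Low-rank update delta_W = scale * (B @ A), with A: r x d_in and B: d_out x r."""
    A: Matrix
    B: Matrix
    scale: float = 1.0

    def delta(self) -> Matrix:
        return [[self.scale * v for v in row] for row in matmul(self.B, self.A)]


@dataclass
class LoRALinear:
    """Frozen base weight plus named adapters; at most one adapter is active at a time."""
    weight: Matrix  # d_out x d_in, never modified
    adapters: Dict[str, LoRAAdapter] = field(default_factory=dict)
    active: Optional[str] = None  # hypothetical switch: "camera", "object", or None

    def effective_weight(self) -> Matrix:
        if self.active is None:
            return self.weight
        delta = self.adapters[self.active].delta()
        return [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(self.weight, delta)]

    def forward(self, x: List[float]) -> List[float]:
        # y = (W + delta_W) x for the currently active adapter
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]


# Toy usage: the same frozen layer behaves differently depending on the adapter.
layer = LoRALinear(weight=[[1.0, 0.0], [0.0, 1.0]])
layer.adapters["camera"] = LoRAAdapter(A=[[1.0, 0.0]], B=[[0.5], [0.0]])
layer.adapters["object"] = LoRAAdapter(A=[[0.0, 1.0]], B=[[0.0], [0.5]])

x = [2.0, 4.0]
layer.active = "camera"
y_camera = layer.forward(x)
layer.active = "object"
y_object = layer.forward(x)
```

Because the base weight is shared and frozen, each adapter only has to encode the behavior specific to its motion type, and switching adapters at inference selects which behavior is applied.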

Selected Visual Results

Image Conductor enables the controllable animation of static images by precisely directing camera transitions and object movements according to user specifications, resulting in coherent video assets. Additionally, pre-trained weights from the open-source community [3] can be loaded to achieve personalized generation, similar to AnimateDiff [4].

Note: The prompts displayed below are simplified versions.

Object movements
Prompt: "A red rose engulfed in flames."
Prompt: "a sea turtle gracefully swimming over a coral reef."
Prompt: "A honeybee flies to the sunflower."
Prompt: "Glowing jellyfish mushroom."

Camera transitions
Prompt: "Tusuncub with its mouth open."
Prompt: "1girl, short wavy hair, sunflower."
Prompt: "Blooming cherry blossom."
Prompt: "A oil painting."