Image Conductor: Precision Control for Interactive Video Synthesis

¹Peking University, ²ARC Lab, Tencent PCG, ³Nanyang Technological University, ⁴Tsinghua University, ⁵University of Macau, ⁶Shenzhen Institute of Advanced Technology
‡Project lead. ✉Corresponding author.

Abstract

Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements, typically involving labor-intensive real-world capturing. Despite advancements in generative AI for video creation, achieving precise control over motion for interactive video asset generation remains challenging. To this end, we propose Image Conductor, a method for precise control of camera transitions and object movements to generate video assets from a single image. We propose a carefully designed training strategy that separates camera motion from object motion by learning distinct camera LoRA weights and object LoRA weights. To further address cinematographic variations from ill-posed trajectories, we introduce a camera-free guidance technique during inference, enhancing object movements while eliminating camera transitions. Additionally, we develop a trajectory-oriented video motion data curation pipeline for training. Extensive experiments demonstrate our method's precision and fine-grained control in generating motion-controllable videos from images, advancing the practical application of interactive video synthesis.

Methods

To address the lack of large-scale annotated video data, we first design a data construction pipeline to create a consistent video dataset with appropriate motion. We then propose a carefully designed two-stage training scheme to derive a motion-controllable video ControlNet [1] [2], which separates camera transitions from object movements using distinct LoRA weights.
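
To make the idea of separating motion types with distinct LoRA weights concrete, the following is a minimal sketch and not the authors' implementation. It uses the standard LoRA formulation, in which a frozen weight W is augmented with a low-rank update scaled by alpha / r, and shows a single linear layer holding two adapters that can be switched at inference. The class names `LoRAAdapter` and `MotionLinear`, the adapter keys "camera" and "object", and all numeric values are hypothetical and chosen only for illustration. Which layers of the video ControlNet actually carry the LoRA weights is not stated here and is not assumed.

```python
# Illustrative sketch only: one frozen linear layer with two switchable LoRA
# adapters. Names and values are hypothetical, not taken from Image Conductor.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRAAdapter:
    """Low-rank update: delta(x) = (alpha / r) * up @ (down @ x)."""

    def __init__(self, down, up, alpha):
        self.down = down                  # shape (r, in_features)
        self.up = up                      # shape (out_features, r)
        self.scale = alpha / len(down)    # r is the number of rows in `down`

    def delta(self, x):
        return [self.scale * v for v in matvec(self.up, matvec(self.down, x))]


class MotionLinear:
    """Frozen base weight plus at most one active named adapter."""

    def __init__(self, weight):
        self.weight = weight              # frozen base weight, never modified
        self.adapters = {}
        self.active = None

    def add_adapter(self, name, adapter):
        self.adapters[name] = adapter

    def set_active(self, name):
        # None disables all adapters and falls back to the base layer.
        if name is not None and name not in self.adapters:
            raise KeyError(name)
        self.active = name

    def forward(self, x):
        y = matvec(self.weight, x)
        if self.active is not None:
            d = self.adapters[self.active].delta(x)
            y = [a + b for a, b in zip(y, d)]
        return y


# Toy setup: identity base weight and two rank-1 adapters (assumed values).
layer = MotionLinear([[1.0, 0.0], [0.0, 1.0]])
layer.add_adapter("camera", LoRAAdapter(down=[[1.0, 0.0]], up=[[0.5], [0.0]], alpha=1.0))
layer.add_adapter("object", LoRAAdapter(down=[[0.0, 1.0]], up=[[0.0], [0.5]], alpha=1.0))
x = [2.0, 4.0]
```

Calling `layer.set_active("camera")` or `layer.set_active("object")` before `layer.forward(x)` selects which low-rank update is added to the shared base output. Keeping the base weight frozen and storing each motion type in its own adapter is what lets the two behaviors be trained and applied independently in this sketch.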

Selected Visual Results

Image Conductor enables the controllable animation of static images by precisely directing camera transitions and object movements according to user specifications, resulting in coherent video assets. Additionally, pre-trained weights from the open-source community [3] can be loaded to achieve personalized generation, similar to AnimateDiff [4].

Note: The prompt displayed below is a simplified version.