ATI: Any Trajectory Instruction for Controllable Video Generation
Abstract
A unified framework for video motion control integrates camera movement, object translation, and local motion via trajectory-based inputs, improving controllability and visual quality.
We propose a unified framework for motion control in video generation that seamlessly integrates camera movement, object-level translation, and fine-grained local motion using trajectory-based inputs. In contrast to prior methods that address these motion types through separate modules or task-specific designs, our approach offers a cohesive solution by projecting user-defined trajectories into the latent space of pre-trained image-to-video generation models via a lightweight motion injector. Users can specify keypoints and their motion paths to control localized deformations, entire object motion, virtual camera dynamics, or combinations of these. The injected trajectory signals guide the generative process to produce temporally consistent and semantically aligned motion sequences. Our framework demonstrates superior performance across multiple video motion control tasks, including stylized motion effects (e.g., motion brushes), dynamic viewpoint changes, and precise local motion manipulation. Experiments show that our method provides significantly better controllability and visual quality compared to prior approaches and commercial solutions, while remaining broadly compatible with various state-of-the-art video generation backbones. Project page: https://anytraj.github.io/.
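To make the trajectory-injection idea concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not describe the internals of the motion injector, so the function name `inject_trajectories`, the `stride` and `sigma` parameters, and the Gaussian-splatting scheme are all hypothetical. The sketch reads the latent feature under each keypoint in the first frame and spreads it around that keypoint's position in every frame, producing a per-frame guidance map that an image-to-video backbone could, in principle, consume.

```python
import numpy as np

def inject_trajectories(first_latent, tracks, stride=8, sigma=1.0):
    """Hypothetical trajectory-to-latent projection (an assumption, not ATI's code).

    first_latent: (H, W, C) latent feature map of the first frame.
    tracks:       (N, T, 2) pixel-space (x, y) positions of N keypoints over T frames.
    stride:       assumed pixel-to-latent downsampling factor.
    sigma:        assumed Gaussian spread, in latent cells.
    Returns a (T, H, W, C) array of guidance features.
    """
    H, W, C = first_latent.shape
    N, T, _ = tracks.shape
    ys, xs = np.mgrid[0:H, 0:W]  # latent-grid row and column coordinates
    guidance = np.zeros((T, H, W, C), dtype=np.float64)

    for n in range(N):
        # Feature under the keypoint's starting location in the first frame.
        x0, y0 = tracks[n, 0] / stride
        ix = int(np.clip(np.rint(x0), 0, W - 1))
        iy = int(np.clip(np.rint(y0), 0, H - 1))
        feat = first_latent[iy, ix]

        for t in range(T):
            # Splat that feature around the keypoint's position at frame t.
            cx, cy = tracks[n, t] / stride
            w = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
            guidance[t] += w[..., None] * feat

    return guidance

# Usage: one keypoint moving right by 16 pixels across two frames.
latent = np.random.rand(4, 4, 2)
tracks = np.array([[[8.0, 8.0], [24.0, 8.0]]])
print(inject_trajectories(latent, tracks).shape)
```

In this sketch, the same feature follows the keypoint along its path, which is one simple way to tell a generator "this content should end up here". Whether a trajectory describes local deformation, whole-object motion, or camera movement depends only on which points are tracked and how they move.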
Community
ATI is a trajectory-based motion control framework that unifies object, local and camera movements in video generation.
Website: https://anytraj.github.io/
GitHub: https://github.com/bytedance/ATI
Hugging Face: https://huggingface.co/bytedance-research/ATI
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- MotionPro: A Precise Motion Controller for Image-to-Video Generation (2025)
- TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation (2025)
- OmniCam: Unified Multimodal Video Generation via Camera Control (2025)
- ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction (2025)
- CamContextI2V: Context-aware Controllable Video Generation (2025)
- DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment (2025)
- AnimateAnywhere: Rouse the Background in Human Image Animation (2025)