Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals
Abstract
Force prompts enable pretrained video generation models to simulate realistic physical interactions by learning force conditioning from Blender-generated videos.
Recent advances in video generation models have sparked interest in world models capable of simulating realistic environments. While navigation has been well explored, physically meaningful interactions that mimic real-world forces remain largely understudied. In this work, we investigate using physical forces as a control signal for video generation and propose force prompts, which let users interact with images through both localized point forces, such as poking a plant, and global wind force fields, such as wind blowing on fabric. We demonstrate that these force prompts can make videos respond realistically to physical control signals by leveraging the visual and motion priors in the original pretrained model, without using any 3D assets or physics simulators at inference time. The primary challenge of force prompting is obtaining high-quality paired force-video training data: force signals are difficult to measure in the real world, and synthetic data is limited by the visual quality and domain diversity of physics simulators. Our key finding is that video generation models can generalize remarkably well when adapted to follow physical force conditioning from videos synthesized in Blender, even with limited demonstrations of only a few objects. Our method can generate videos that simulate forces across diverse geometries, settings, and materials. To understand the source of this generalization, we perform ablations that reveal two key elements: visual diversity and the use of specific text keywords during training. Our approach is trained on only around 15k examples for a single day on four A100 GPUs, yet outperforms existing methods on force adherence and physics realism, bringing world models closer to real-world physics interactions. We release all datasets, code, weights, and interactive video demos at our project page.
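The abstract describes two kinds of force prompts: localized point forces (a point of application, a direction, and a strength) and global wind force fields (a direction and a strength). The exact conditioning format is not specified here, so the sketch below is only a minimal illustration of one plausible encoding, assuming the force prompt is rasterized into a spatial map that an image-to-video model could consume as extra conditioning channels; the function names, the 3-channel layout, and the ControlNet-style wiring mentioned in the comments are assumptions, not the paper's implementation.

```python
# Hypothetical encoding of force prompts as spatial conditioning maps.
# The channel layout (direction x, direction y, magnitude) is an assumption
# for illustration only; it is not taken from the paper.
import math
import torch


def encode_point_force(height, width, x, y, angle_deg, magnitude, radius=10):
    """Localized 'poke': a (3, H, W) map carrying the force direction as a
    unit vector plus its magnitude, nonzero only inside a small disk around
    the point of application."""
    cond = torch.zeros(3, height, width)
    ys, xs = torch.meshgrid(
        torch.arange(height), torch.arange(width), indexing="ij"
    )
    mask = (xs - x) ** 2 + (ys - y) ** 2 <= radius ** 2
    dx = math.cos(math.radians(angle_deg))
    dy = math.sin(math.radians(angle_deg))
    cond[0][mask] = dx
    cond[1][mask] = dy
    cond[2][mask] = magnitude
    return cond


def encode_wind_force(height, width, angle_deg, magnitude):
    """Global wind field: the same three channels, but spatially uniform."""
    cond = torch.empty(3, height, width)
    cond[0] = math.cos(math.radians(angle_deg))
    cond[1] = math.sin(math.radians(angle_deg))
    cond[2] = magnitude
    return cond


# Example: a rightward poke at pixel (128, 96) with normalized strength 0.7.
force_map = encode_point_force(256, 256, x=128, y=96, angle_deg=0.0, magnitude=0.7)

# In a ControlNet-style setup (again, an assumption), this map would be
# resized to the latent resolution and concatenated with the first-frame
# conditioning of the pretrained image-to-video diffusion model.
```

A spatial map like this is convenient because both the local poke and the global wind become the same kind of input, so a single conditioning pathway can handle either; whether the paper uses this representation or a different one (e.g., coordinates passed as text or embeddings) is not stated in the abstract.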
Community
We release all datasets, code, weights, and interactive video demos at our project page: https://force-prompting.github.io/
The following similar papers were recommended by the Semantic Scholar API (via Librarian Bot):
- MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM (2025)
- WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions (2025)
- VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior (2025)
- ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction (2025)
- Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments (2025)
- DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment (2025)
- TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation (2025)