---
language:
- en
base_model:
- ali-vilab/VACE-Wan2.1-1.3B-Preview
---

<p align="center">

<h1 align="center">VACE: All-in-One Video Creation and Editing</h1>
<p align="center">
<strong>Zeyinzi Jiang<sup>*</sup></strong>
·
<strong>Zhen Han<sup>*</sup></strong>
·
<strong>Chaojie Mao<sup>*†</sup></strong>
·
<strong>Jingfeng Zhang</strong>
·
<strong>Yulin Pan</strong>
·
<strong>Yu Liu</strong>
<br>
<b>Tongyi Lab - <a href="https://github.com/Wan-Video/Wan2.1"><img src='https://ali-vilab.github.io/VACE-Page/assets/logos/wan_logo.png' alt='wan_logo' style='margin-bottom: -4px; height: 20px;'></a> </b>
<br>
<br>
<a href="https://arxiv.org/abs/2503.07598"><img src='https://img.shields.io/badge/arXiv-VACE-red' alt='Paper PDF'></a>
<a href="https://ali-vilab.github.io/VACE-Page/"><img src='https://img.shields.io/badge/Project_Page-VACE-green' alt='Project Page'></a>
<a href="https://huggingface.co/ali-vilab/VACE-Wan2.1-1.3B-Preview"><img src='https://img.shields.io/badge/Model-VACE-yellow'></a>
<a href="https://modelscope.cn/collections/VACE-8fa5fcfd386e43"><img src='https://img.shields.io/badge/VACE-ModelScope-purple'></a>
<br>
</p>


## Introduction
<strong>VACE</strong> is an all-in-one model designed for video creation and editing. It covers a range of tasks, including reference-to-video generation (<strong>R2V</strong>), video-to-video editing (<strong>V2V</strong>), and masked video-to-video editing (<strong>MV2V</strong>), and lets users compose these tasks freely. This unlocks capabilities such as Move-Anything, Swap-Anything, Reference-Anything, Expand-Anything, Animate-Anything, and more, while streamlining video-creation workflows.

<img src='https://raw.githubusercontent.com/ali-vilab/VACE/refs/heads/main/assets/materials/teaser.jpg'>


## 🎉 News
- [x] Mar 31, 2025: 🔥 [VACE-Wan2.1-1.3B-Preview](https://huggingface.co/ali-vilab/VACE-Wan2.1-1.3B-Preview) and [VACE-LTX-Video-0.9](https://huggingface.co/ali-vilab/VACE-LTX-Video-0.9) models are now available on HuggingFace and [ModelScope](https://modelscope.cn/collections/VACE-8fa5fcfd386e43)!
- [x] Mar 31, 2025: 🔥 Released the code for model inference, preprocessing, and Gradio demos.
- [x] Mar 11, 2025: We propose [VACE](https://ali-vilab.github.io/VACE-Page/), an all-in-one model for video creation and editing.


## 🪄 Models
| Models                   | Download Link                                                                                                                                                                         | Video Size (frames x height x width) | License                                                                                       |
|--------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------|-----------------------------------------------------------------------------------------------|
| VACE-Wan2.1-1.3B-Preview | [Huggingface](https://huggingface.co/ali-vilab/VACE-Wan2.1-1.3B-Preview) 🤗 [ModelScope](https://modelscope.cn/models/iic/VACE-Wan2.1-1.3B-Preview) 🤖                                 | ~ 81 x 480 x 832                     | [Apache-2.0](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B/blob/main/LICENSE.txt)             |
| VACE-Wan2.1-1.3B         | [To be released](https://github.com/Wan-Video) <img src='https://ali-vilab.github.io/VACE-Page/assets/logos/wan_logo.png' alt='wan_logo' style='margin-bottom: -4px; height: 15px;'>   | ~ 81 x 480 x 832                     | [Apache-2.0](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B/blob/main/LICENSE.txt)             |
| VACE-Wan2.1-14B          | [To be released](https://github.com/Wan-Video) <img src='https://ali-vilab.github.io/VACE-Page/assets/logos/wan_logo.png' alt='wan_logo' style='margin-bottom: -4px; height: 15px;'>   | ~ 81 x 720 x 1080                    | [Apache-2.0](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/blob/main/LICENSE.txt)              |
| VACE-LTX-Video-0.9       | [Huggingface](https://huggingface.co/ali-vilab/VACE-LTX-Video-0.9) 🤗 [ModelScope](https://modelscope.cn/models/iic/VACE-LTX-Video-0.9) 🤖                                             | ~ 97 x 512 x 768                     | [RAIL-M](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.license.txt) |

- Inputs of any resolution are supported, but for optimal results the video size should stay close to the ranges listed above.
- Each model inherits the license of its original base model.


## ⚙️ Installation
The codebase was tested with Python 3.10.13, CUDA 12.4, and PyTorch >= 2.5.1.

### Setup for Model Inference
You can set up the environment for VACE model inference by running:
```bash
git clone https://github.com/ali-vilab/VACE.git && cd VACE
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu124  # If PyTorch is not installed.
pip install -r requirements.txt
pip install wan@git+https://github.com/Wan-Video/Wan2.1  # If you want to use Wan2.1-based VACE.
pip install ltx-video@git+https://github.com/Lightricks/LTX-Video@ltx-video-0.9.1 sentencepiece --no-deps  # If you want to use LTX-Video-0.9-based VACE. It may conflict with Wan.
```
Please download your preferred base model to `<repo-root>/models/`.
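
For example, a minimal sketch of fetching the preview model into the expected location, assuming you use `huggingface-cli` from the `huggingface_hub` package (any download method works):
```bash
# Hedged example: pull the preview weights into <repo-root>/models/.
# Assumes: pip install "huggingface_hub[cli]"
huggingface-cli download ali-vilab/VACE-Wan2.1-1.3B-Preview \
    --local-dir models/VACE-Wan2.1-1.3B-Preview
```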

### Setup for Preprocess Tools
If you need preprocessing tools, please install:
```bash
pip install -r requirements/annotator.txt
```
Please download [VACE-Annotators](https://huggingface.co/ali-vilab/VACE-Annotators) to `<repo-root>/models/`.
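
As above, one possible way to fetch the annotator weights (assuming `huggingface-cli` is available):
```bash
# Hedged example: pull the preprocessing annotator weights.
huggingface-cli download ali-vilab/VACE-Annotators --local-dir models/VACE-Annotators
```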

### Local Directories Setup
We recommend downloading [VACE-Benchmark](https://huggingface.co/ali-vilab) to `<repo-root>/benchmarks/`, which provides the example inputs used by the `run_vace_xxx.sh` scripts.
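
A sketch of one way to fetch it, assuming VACE-Benchmark is hosted as a Hugging Face dataset repo:
```bash
# Hedged example: pull the benchmark assets.
# --repo-type dataset is an assumption about how VACE-Benchmark is hosted.
huggingface-cli download ali-vilab/VACE-Benchmark --repo-type dataset \
    --local-dir benchmarks/VACE-Benchmark
```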

We recommend organizing local directories as follows:
```
VACE
├── ...
├── benchmarks
│   └── VACE-Benchmark
│       └── assets
│           └── examples
│               ├── animate_anything
│               │   └── ...
│               └── ...
├── models
│   ├── VACE-Annotators
│   │   └── ...
│   ├── VACE-LTX-Video-0.9
│   │   └── ...
│   └── VACE-Wan2.1-1.3B-Preview
│       └── ...
└── ...
```


## 🚀 Usage
In VACE, users can input a **text prompt** and an optional **video**, **mask**, and **image** for video generation or editing.
Detailed instructions for using VACE can be found in the [User Guide](https://github.com/ali-vilab/VACE/blob/main/UserGuide.md).

### Inference CLI
#### 1) End-to-End Running
To run VACE without diving into any implementation details, we suggest the end-to-end pipeline. For example:
```bash
# run V2V depth
python vace/vace_pipeline.py --base wan --task depth --video assets/videos/test.mp4 --prompt 'xxx'

# run MV2V inpainting by providing bbox
python vace/vace_pipeline.py --base wan --task inpainting --mode bbox --bbox 50,50,550,700 --video assets/videos/test.mp4 --prompt 'xxx'
```
This script runs video preprocessing and model inference sequentially, so you need to specify all the required args for preprocessing (`--task`, `--mode`, `--bbox`, `--video`, etc.) and inference (`--prompt`, etc.).
The output video, together with the intermediate videos, masks, and images, will be saved to `./results/` by default.

> 💡**Note**:
> Please refer to [run_vace_pipeline.sh](https://github.com/ali-vilab/VACE/blob/main/run_vace_pipeline.sh) for usage examples of different task pipelines.


#### 2) Preprocessing
For more flexible control over the inputs, user inputs need to be preprocessed into `src_video`, `src_mask`, and `src_ref_images` before VACE model inference.
Each [preprocessor](https://github.com/ali-vilab/VACE/blob/main/vace/configs/__init__.py) is assigned a task name, so simply call [`vace_preproccess.py`](https://github.com/ali-vilab/VACE/blob/main/vace/vace_preproccess.py) and specify the task name and task params. For example:
```bash
# process video depth
python vace/vace_preproccess.py --task depth --video assets/videos/test.mp4

# process video inpainting by providing bbox
python vace/vace_preproccess.py --task inpainting --mode bbox --bbox 50,50,550,700 --video assets/videos/test.mp4
```
The outputs will be saved to `./proccessed/` by default.

> 💡**Note**:
> Please refer to [run_vace_pipeline.sh](https://github.com/ali-vilab/VACE/blob/main/run_vace_pipeline.sh) for the preprocessing methods used by different tasks.
> Moreover, refer to [vace/configs/](https://github.com/ali-vilab/VACE/blob/main/vace/configs/) for all the pre-defined tasks and their required params.
> You can also customize preprocessors by implementing them in [`annotators`](https://github.com/ali-vilab/VACE/blob/main/vace/annotators/__init__.py) and registering them in [`configs`](https://github.com/ali-vilab/VACE/blob/main/vace/configs).


#### 3) Model Inference
Using the input data obtained from **Preprocessing**, run model inference as follows:
```bash
# For Wan2.1 single-GPU inference
python vace/vace_wan_inference.py --ckpt_dir <path-to-model> --src_video <path-to-src-video> --src_mask <path-to-src-mask> --src_ref_images <paths-to-src-ref-images> --prompt "xxx"

# For Wan2.1 multi-GPU accelerated inference
# (ulysses_size x ring_size should equal the number of GPUs, i.e. nproc_per_node)
pip install "xfuser>=0.4.1"
torchrun --nproc_per_node=8 vace/vace_wan_inference.py --dit_fsdp --t5_fsdp --ulysses_size 1 --ring_size 8 --ckpt_dir <path-to-model> --src_video <path-to-src-video> --src_mask <path-to-src-mask> --src_ref_images <paths-to-src-ref-images> --prompt "xxx"

# For LTX inference, run
python vace/vace_ltx_inference.py --ckpt_path <path-to-model> --text_encoder_path <path-to-model> --src_video <path-to-src-video> --src_mask <path-to-src-mask> --src_ref_images <paths-to-src-ref-images> --prompt "xxx"
```
The output video, together with the intermediate videos, masks, and images, will be saved to `./results/` by default.

> 💡**Note**:
> (1) Please refer to [vace/vace_wan_inference.py](https://github.com/ali-vilab/VACE/blob/main/vace/vace_wan_inference.py) and [vace/vace_ltx_inference.py](https://github.com/ali-vilab/VACE/blob/main/vace/vace_ltx_inference.py) for the inference args.
> (2) For LTX-Video users, and for Wan2.1 users prompting in English, prompt extension is needed to unlock the full model performance.
> Please follow the [instructions of Wan2.1](https://github.com/Wan-Video/Wan2.1?tab=readme-ov-file#2-using-prompt-extension) and set `--use_prompt_extend` while running inference.
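
To make the hand-off from **Preprocessing** concrete, here is a minimal sketch chaining the two steps for a depth-guided edit. The preprocessed file path below is an assumption for illustration; the preprocessor prints its actual output locations under `./proccessed/`:
```bash
# Hedged sketch: preprocess, then feed the result into Wan2.1 inference.
# The src_video path is illustrative; check the preprocessor's log output.
python vace/vace_preproccess.py --task depth --video assets/videos/test.mp4
python vace/vace_wan_inference.py --ckpt_dir models/VACE-Wan2.1-1.3B-Preview \
    --src_video proccessed/depth/src_video-test.mp4 \
    --prompt "xxx"
```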


### Inference Gradio
For preprocessors, run:
```bash
python vace/gradios/preprocess_demo.py
```
For model inference, run:
```bash
# For Wan2.1 Gradio inference
python vace/gradios/vace_wan_demo.py

# For LTX Gradio inference
python vace/gradios/vace_ltx_demo.py
```

## Acknowledgement

We are grateful to the following awesome projects: [Scepter](https://github.com/modelscope/scepter), [Wan](https://github.com/Wan-Video/Wan2.1), and [LTX-Video](https://github.com/Lightricks/LTX-Video).


## BibTeX

```bibtex
@article{vace,
    title = {VACE: All-in-One Video Creation and Editing},
    author = {Jiang, Zeyinzi and Han, Zhen and Mao, Chaojie and Zhang, Jingfeng and Pan, Yulin and Liu, Yu},
    journal = {arXiv preprint arXiv:2503.07598},
    year = {2025}
}
```