Zeyue7 and nielsr (HF Staff) committed
Commit 13546f5 · verified · 1 parent: 0f7df35

Add library name, pipeline tag (#1)


- Add library name, pipeline tag (1d03df18eb68b2ed7474009cf20141eaafbabc50)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1): README.md (+95, -93)

README.md (updated):

---
license: cc-by-4.0
library_name: audiocraft
pipeline_tag: video-to-audio
---

# VidMuse

## VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

[TL;DR]: VidMuse is a framework for generating high-fidelity music aligned with video content using Long-Short-Term modeling. The paper has been accepted to CVPR 2025.

### Links
- **[Paper](https://arxiv.org/pdf/2406.04321)**: Explore the research behind VidMuse.
- **[Project](https://vidmuse.github.io/)**: Visit the official project page for more information and updates.
- **[Dataset](https://huggingface.co/datasets/HKUSTAudio/VidMuse-Dataset)**: Download the dataset used in the paper.

## Clone the repository
```bash
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/HKUSTAudio/VidMuse
cd VidMuse
```
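
Note: `GIT_LFS_SKIP_SMUDGE=1` clones the repository without downloading the large LFS files (e.g. the model weights). If you do want those files in your local clone, you can fetch them afterwards with standard git-lfs commands (a generic git-lfs sketch, not a VidMuse-specific step):

```bash
cd VidMuse
# Download the LFS-tracked files (model weights) that were skipped during cloning
git lfs pull
```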

## Usage

1. First, install the [`VidMuse` library](https://github.com/ZeyueT/VidMuse):
```bash
conda create -n VidMuse python=3.9
conda activate VidMuse
pip install git+https://github.com/ZeyueT/VidMuse.git
```

2. Install ffmpeg:
```bash
sudo apt-get install ffmpeg
# Or if you are using Anaconda or Miniconda
conda install "ffmpeg<5" -c conda-forge
```

3. Run the following Python code:

```py
from video_processor import VideoProcessor, merge_video_audio
from audiocraft.models import VidMuse
import scipy.io.wavfile

# Path to the video
video_path = 'sample.mp4'
# Initialize the video processor
processor = VideoProcessor()
# Process the video to obtain tensors and duration
local_video_tensor, global_video_tensor, duration = processor.process(video_path)

progress = True
USE_DIFFUSION = False

# Load the pre-trained VidMuse model
MODEL = VidMuse.get_pretrained('HKUSTAudio/VidMuse')
# Set generation parameters for the model based on video duration
MODEL.set_generation_params(duration=duration)

try:
    # Generate outputs using the model
    outputs = MODEL.generate([local_video_tensor, global_video_tensor], progress=progress, return_tokens=USE_DIFFUSION)
except RuntimeError as e:
    # Re-raise after logging so a failed generation is not silently ignored
    print(e)
    raise

# Detach outputs from the computation graph and convert to CPU float tensor
outputs = outputs.detach().cpu().float()

sampling_rate = 32000
output_wav_path = "vidmuse_sample.wav"
# Write the output audio data to a WAV file
scipy.io.wavfile.write(output_wav_path, rate=sampling_rate, data=outputs[0, 0].numpy())

output_video_path = "vidmuse_sample.mp4"
# Merge the original video with the generated music
merge_video_audio(video_path, output_wav_path, output_video_path)
```
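
Optionally, you can sanity-check the generated audio before merging it into the video. The snippet below is a minimal sketch, assuming the output path and 32 kHz sampling rate used in the example above; it only uses `scipy` and is not part of the original pipeline:

```py
import scipy.io.wavfile

# Read the generated WAV back and report its sample rate and length
rate, data = scipy.io.wavfile.read("vidmuse_sample.wav")
print(f"Sample rate: {rate} Hz, length: {data.shape[0] / rate:.1f} s")
```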

## Citation
If you find our work useful, please consider citing:

```
@article{tian2024vidmuse,
  title={Vidmuse: A simple video-to-music generation framework with long-short-term modeling},
  author={Tian, Zeyue and Liu, Zhaoyang and Yuan, Ruibin and Pan, Jiahao and Liu, Qifeng and Tan, Xu and Chen, Qifeng and Xue, Wei and Guo, Yike},
  journal={arXiv preprint arXiv:2406.04321},
  year={2024}
}
```