Add library name, pipeline tag (#1)

Browse files

- Add library name, pipeline tag (1d03df18eb68b2ed7474009cf20141eaafbabc50)

Co-authored-by: Niels Rogge <[email protected]>

Files changed (1) hide show

README.md +95 -93

README.md CHANGED Viewed

@@ -1,94 +1,96 @@
----
-license: cc-by-4.0
----
-# VidMuse
-## VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
-[TL;DR]: VidMuse is a framework for generating high-fidelity music aligned with video content, utilizing Long-Short-Term modeling, and has been accepted to CVPR 2025.
-### Links
-- **[Paper](https://arxiv.org/pdf/2406.04321)**: Explore the research behind VidMuse.
-- **[Project](https://vidmuse.github.io/)**: Visit the official project page for more information and updates.
-- **[Dataset](https://huggingface.co/datasets/HKUSTAudio/VidMuse-Dataset)**: Download the dataset used in the paper.
-## Clone the repository
-```bash
-GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/HKUSTAudio/VidMuse
-cd VidMuse
-```
-## Usage
-1. First install the [`VidMuse` library](https://github.com/ZeyueT/VidMuse)
-```
-conda create -n VidMuse python=3.9
-conda activate VidMuse
-pip install git+https://github.com/ZeyueT/VidMuse.git
-```
-2. Install ffmpeg:
-Install ffmpeg:
-```bash
-sudo apt-get install ffmpeg
-# Or if you are using Anaconda or Miniconda
-conda install "ffmpeg<5" -c conda-forge
-```
-3. Run the following Python code:
-```py
-from video_processor import VideoProcessor, merge_video_audio
-from audiocraft.models import VidMuse
-import scipy
-# Path to the video
-video_path = 'sample.mp4'
-# Initialize the video processor
-processor = VideoProcessor()
-# Process the video to obtain tensors and duration
-local_video_tensor, global_video_tensor, duration = processor.process(video_path)
-progress = True
-USE_DIFFUSION = False
-# Load the pre-trained VidMuse model
-MODEL = VidMuse.get_pretrained('HKUSTAudio/VidMuse')
-# Set generation parameters for the model based on video duration
-MODEL.set_generation_params(duration=duration)
-try:
-    # Generate outputs using the model
-    outputs = MODEL.generate([local_video_tensor, global_video_tensor], progress=progress, return_tokens=USE_DIFFUSION)
-except RuntimeError as e:
-    print(e)
-# Detach outputs from the computation graph and convert to CPU float tensor
-outputs = outputs.detach().cpu().float()
-sampling_rate = 32000
-output_wav_path = "vidmuse_sample.wav"
-# Write the output audio data to a WAV file
-scipy.io.wavfile.write(output_wav_path, rate=sampling_rate, data=outputs[0, 0].numpy())
-output_video_path = "vidmuse_sample.mp4"
-# Merge the original video with the generated music
-merge_video_audio(video_path, output_wav_path, output_video_path)
-```
-## Citation
-If you find our work useful, please consider citing:
-```
-@article{tian2024vidmuse,
-  title={Vidmuse: A simple video-to-music generation framework with long-short-term modeling},
-  author={Tian, Zeyue and Liu, Zhaoyang and Yuan, Ruibin and Pan, Jiahao and Liu, Qifeng and Tan, Xu and Chen, Qifeng and Xue, Wei and Guo, Yike},
-  journal={arXiv preprint arXiv:2406.04321},
-  year={2024}
-}
 ```

+---
+license: cc-by-4.0
+library_name: audiocraft
+pipeline_tag: video-to-audio
+---
+# VidMuse
+## VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
+[TL;DR]: VidMuse is a framework for generating high-fidelity music aligned with video content, utilizing Long-Short-Term modeling, and has been accepted to CVPR 2025.
+### Links
+- **[Paper](https://arxiv.org/pdf/2406.04321)**: Explore the research behind VidMuse.
+- **[Project](https://vidmuse.github.io/)**: Visit the official project page for more information and updates.
+- **[Dataset](https://huggingface.co/datasets/HKUSTAudio/VidMuse-Dataset)**: Download the dataset used in the paper.
+## Clone the repository
+```bash
+GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/HKUSTAudio/VidMuse
+cd VidMuse
+```
+## Usage
+1. First install the [`VidMuse` library](https://github.com/ZeyueT/VidMuse)
+```
+conda create -n VidMuse python=3.9
+conda activate VidMuse
+pip install git+https://github.com/ZeyueT/VidMuse.git
+```
+2. Install ffmpeg:
+Install ffmpeg:
+```bash
+sudo apt-get install ffmpeg
+# Or if you are using Anaconda or Miniconda
+conda install "ffmpeg<5" -c conda-forge
+```
+3. Run the following Python code:
+```py
+from video_processor import VideoProcessor, merge_video_audio
+from audiocraft.models import VidMuse
+import scipy
+# Path to the video
+video_path = 'sample.mp4'
+# Initialize the video processor
+processor = VideoProcessor()
+# Process the video to obtain tensors and duration
+local_video_tensor, global_video_tensor, duration = processor.process(video_path)
+progress = True
+USE_DIFFUSION = False
+# Load the pre-trained VidMuse model
+MODEL = VidMuse.get_pretrained('HKUSTAudio/VidMuse')
+# Set generation parameters for the model based on video duration
+MODEL.set_generation_params(duration=duration)
+try:
+    # Generate outputs using the model
+    outputs = MODEL.generate([local_video_tensor, global_video_tensor], progress=progress, return_tokens=USE_DIFFUSION)
+except RuntimeError as e:
+    print(e)
+# Detach outputs from the computation graph and convert to CPU float tensor
+outputs = outputs.detach().cpu().float()
+sampling_rate = 32000
+output_wav_path = "vidmuse_sample.wav"
+# Write the output audio data to a WAV file
+scipy.io.wavfile.write(output_wav_path, rate=sampling_rate, data=outputs[0, 0].numpy())
+output_video_path = "vidmuse_sample.mp4"
+# Merge the original video with the generated music
+merge_video_audio(video_path, output_wav_path, output_video_path)
+```
+## Citation
+If you find our work useful, please consider citing:
+```
+@article{tian2024vidmuse,
+  title={Vidmuse: A simple video-to-music generation framework with long-short-term modeling},
+  author={Tian, Zeyue and Liu, Zhaoyang and Yuan, Ruibin and Pan, Jiahao and Liu, Qifeng and Tan, Xu and Chen, Qifeng and Xue, Wei and Guo, Yike},
+  journal={arXiv preprint arXiv:2406.04321},
+  year={2024}
+}
 ```