
Tomoyo_Sakagami Text-to-Video Generation

This repository contains the necessary steps and scripts to generate videos using the Tomoyo_Sakagami text-to-video model. The model leverages LoRA (Low-Rank Adaptation) weights and pre-trained components to create high-quality anime-style videos based on textual prompts.

Prerequisites

Before proceeding, ensure that you have the following installed on your system:

• Ubuntu (or a compatible Linux distribution)
• Python 3.x
• pip (Python package manager)
• Git
• Git LFS (Git Large File Storage)
• FFmpeg

Installation

  1. Update and Install Dependencies

    sudo apt-get update && sudo apt-get install cbm git-lfs ffmpeg
    
  2. Clone the Repository

    git clone https://huggingface.co/svjack/Tomoyo_Sakagami_wan_2_1_1_3_B_text2video_lora
    cd Tomoyo_Sakagami_wan_2_1_1_3_B_text2video_lora
    
  3. Install Python Dependencies

    pip install torch torchvision
    pip install -r requirements.txt
    pip install ascii-magic matplotlib tensorboard huggingface_hub datasets
    pip install moviepy==1.0.3
    pip install sageattention==1.0.6
    
  4. Download Model Weights

    wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/models_t5_umt5-xxl-enc-bf16.pth
    wget https://huggingface.co/DeepBeepMeep/Wan2.1/resolve/main/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth
    wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/Wan2.1_VAE.pth
    wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_t2v_1.3B_bf16.safetensors
    wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_t2v_14B_bf16.safetensors
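Before running inference, it can save time to confirm that the Python packages from step 3 import cleanly and that the weight files from step 4 are actually on disk. The sketch below is a minimal, stdlib-only check; the module list is an assumption based on the pip commands above, and the file names are copied from the wget commands.

```python
import importlib.util
import os

def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

def missing_files(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]

# Modules installed in step 3 (pinned versions are not verified here).
modules = ["torch", "torchvision", "moviepy", "huggingface_hub", "datasets"]
# Weight files fetched in step 4.
weights = [
    "models_t5_umt5-xxl-enc-bf16.pth",
    "models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth",
    "Wan2.1_VAE.pth",
    "wan2.1_t2v_1.3B_bf16.safetensors",
    "wan2.1_t2v_14B_bf16.safetensors",
]

if __name__ == "__main__":
    print("missing modules:", missing_modules(modules))
    print("missing files:", missing_files(weights))
```

Both helpers return empty lists when everything is in place, so the script's output doubles as a checklist of what still needs installing or downloading.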
    

Usage

To generate a video, use the wan_generate_video.py script with the appropriate parameters. Below are examples of how to generate videos using the Tomoyo_Sakagami model.

Tomoyo_Sakagami in a Park: A CLANNAD-Inspired Scene

python wan_generate_video.py --fp8 --task t2v-1.3B --video_size 480 832 --video_length 81 --infer_steps 20 \
--save_path save --output_type both \
--dit wan2.1_t2v_1.3B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight Tomoyo_Sakagami_outputs/Tomoyo_Sakagami_w1_3_lora-000012.safetensors \
--lora_multiplier 1.0 \
--prompt "In the style of CLANNAD SEASON 1, the video features a young female character with long, straight hair and a serious expression. She is dressed in a school uniform consisting of a white shirt with a red tie and a blue skirt, complemented by a black ribbon in her hair. The character is standing on a paved pathway that runs through a park-like setting, surrounded by lush green trees and bushes. The pathway is bordered by metal railings on both sides. The lighting suggests it is daytime with clear skies."
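The examples below differ only in their prompt, so the long invocation can also be assembled programmatically when generating several prompts in a batch. This is a hedged sketch, not part of the repository: the flag values simply mirror the example command above, and the `subprocess.run` call is left commented out.

```python
import subprocess

def build_command(prompt, lora_weight, infer_steps=20,
                  video_size=(480, 832), video_length=81):
    """Assemble the wan_generate_video.py argument list used in the examples."""
    return [
        "python", "wan_generate_video.py",
        "--fp8", "--task", "t2v-1.3B",
        "--video_size", str(video_size[0]), str(video_size[1]),
        "--video_length", str(video_length),
        "--infer_steps", str(infer_steps),
        "--save_path", "save", "--output_type", "both",
        "--dit", "wan2.1_t2v_1.3B_bf16.safetensors",
        "--vae", "Wan2.1_VAE.pth",
        "--t5", "models_t5_umt5-xxl-enc-bf16.pth",
        "--attn_mode", "torch",
        "--lora_weight", lora_weight,
        "--lora_multiplier", "1.0",
        "--prompt", prompt,
    ]

lora = "Tomoyo_Sakagami_outputs/Tomoyo_Sakagami_w1_3_lora-000012.safetensors"
prompts = [
    "In the style of CLANNAD SEASON 1, ...",  # placeholder; paste full prompts here
]
for p in prompts:
    cmd = build_command(p, lora)
    # subprocess.run(cmd, check=True)  # uncomment to actually generate
```

Keeping the prompt as the final list element avoids any shell-quoting issues with the long prompt strings.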

Tomoyo_Sakagami Athletic Joy: A Burger Moment in CLANNAD Style

python wan_generate_video.py --fp8 --task t2v-1.3B --video_size 480 832 --video_length 81 --infer_steps 20 \
--save_path save --output_type both \
--dit wan2.1_t2v_1.3B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight Tomoyo_Sakagami_outputs/Tomoyo_Sakagami_w1_3_lora-000012.safetensors \
--lora_multiplier 1.0 \
--prompt "In the style of CLANNAD SEASON 1, the video features a young female character with long, straight hair and a serious expression. She is wearing a gymnastics outfit, consisting of a simple white short-sleeved athletic shirt paired with dark blue athletic shorts and lightweight sneakers. Her hair is neatly tied back with a black headband, giving her a clean, sporty, joyful look. She is holding a freshly made burger in her hands, its golden bun slightly glistening under the warm glow of the restaurant lights. With a look of satisfaction, she takes a hearty bite, then leans back in her seat, savoring every bite."

Cozy Slumber: A Tranquil Night of Tomoyo_Sakagami in Japanese Style

python wan_generate_video.py --fp8 --task t2v-1.3B --video_size 480 832 --video_length 81 --infer_steps 20 \
--save_path save --output_type both \
--dit wan2.1_t2v_1.3B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight Tomoyo_Sakagami_outputs/Tomoyo_Sakagami_w1_3_lora-000012.safetensors \
--lora_multiplier 1.0 \
--prompt "In the style of CLANNAD SEASON 1, the video features a young female character with long, straight hair and a serious expression. She is wearing a cute set of pajamas, consisting of a soft, pastel-colored top adorned with tiny cartoon characters and matching bottoms with a whimsical pattern. Her hair is loosely tied back with a black headband, giving her a relaxed and cozy look. She lies peacefully in a traditional Japanese-style room, the soft glow of moonlight filtering through the paper shoji screens and casting a tranquil ambiance. A futon is spread out on the tatami mat floor, and she rests on it, her head on a plush pillow, her breathing slow and steady. The room is quiet, save for the faint rustle of leaves outside and the occasional chime of a wind bell. Her gaze is soft, fixed on the ceiling for a moment, before she slowly closes her eyes, letting out a quiet sigh as she settles into a peaceful state, ready to embrace sleep."

Parameters

  • --fp8: Enable FP8 precision (optional).
  • --task: Specify the task (e.g., t2v-1.3B).
  • --video_size: Set the resolution of the generated video as height and width (e.g., 480 832, as in the examples above).
  • --video_length: Define the length of the video in frames.
  • --infer_steps: Number of inference steps.
  • --save_path: Directory to save the generated video.
  • --output_type: Output type (e.g., both for video and frames).
  • --dit: Path to the diffusion model weights.
  • --vae: Path to the VAE model weights.
  • --t5: Path to the T5 model weights.
  • --attn_mode: Attention mode (e.g., torch).
  • --lora_weight: Path to the LoRA weights.
  • --lora_multiplier: Multiplier for LoRA weights.
  • --prompt: Textual prompt for video generation.
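Note that --video_length counts frames, so the wall-clock duration of a clip depends on the output frame rate. The arithmetic below assumes 16 fps, which is a common default for Wan2.1 but is an assumption here, not something stated by this repository.

```python
def video_duration_seconds(video_length_frames, fps=16):
    """Approximate clip duration from --video_length, assuming an output fps."""
    return video_length_frames / fps

# The examples use --video_length 81; at an assumed 16 fps that is about 5 seconds.
print(video_duration_seconds(81))  # 5.0625
```

Working the other way, multiply the desired duration in seconds by the fps to pick a --video_length value.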

Output

The generated video (and the individual frames, when --output_type both is used) will be saved in the directory specified by --save_path.
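After a batch of runs it can be handy to collect the generated files from the save directory. The helper below is a small stdlib sketch; the `.mp4` extension is an assumption about the script's output format, so adjust the pattern if your outputs differ.

```python
from pathlib import Path

def list_outputs(save_path, pattern="*.mp4"):
    """Return generated video files under save_path, newest first."""
    root = Path(save_path)
    if not root.is_dir():
        return []
    return sorted(root.glob(pattern),
                  key=lambda p: p.stat().st_mtime, reverse=True)
```

For the example commands above, `list_outputs("save")` would return the most recently generated clip first.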

Troubleshooting

• Ensure all dependencies are correctly installed.
• Verify that the model weights are downloaded and placed in the correct locations.
• Check for any missing Python packages and install them using pip.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

• Hugging Face for hosting the model weights.
• Wan-AI for providing the pre-trained models.
• DeepBeepMeep for contributing to the model weights.

Contact

For any questions or issues, please open an issue on the repository or contact the maintainer.

