Mirai_Kuriyama_Tiny_Ver Text-to-Video Generation
This repository contains the necessary steps and scripts to generate videos using the Mirai_Kuriyama_Tiny_Ver text-to-video model. The model leverages LoRA (Low-Rank Adaptation) weights and pre-trained components to create high-quality anime-style videos based on textual prompts.
(Tuned on only about 90 samples from https://huggingface.co/datasets/svjack/Mirai_Kuriyama_Videos_Captioned_Tiny_Ver)
Prerequisites
Before proceeding, ensure that you have the following installed on your system:
• Ubuntu (or a compatible Linux distribution)
• Python 3.x
• pip (Python package manager)
• Git
• Git LFS (Git Large File Storage)
• FFmpeg
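Before installing anything, it can help to confirm the command-line prerequisites are already on PATH. This is a small standalone sketch (the `missing_tools` helper is not part of the repo), using only the standard library:

```python
import shutil

def missing_tools(tools):
    """Return the subset of command-line tools not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

if __name__ == "__main__":
    required = ["python3", "pip", "git", "git-lfs", "ffmpeg"]
    missing = missing_tools(required)
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```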
Installation
Update and Install Dependencies
sudo apt-get update && sudo apt-get install cbm git-lfs ffmpeg
Clone the Repository
git clone https://huggingface.co/svjack/Mirai_Kuriyama_Tiny_Ver_wan_2_1_1_3_B_text2video_lora
cd Mirai_Kuriyama_Tiny_Ver_wan_2_1_1_3_B_text2video_lora
Install Python Dependencies
pip install torch torchvision
pip install -r requirements.txt
pip install ascii-magic matplotlib tensorboard huggingface_hub datasets
pip install moviepy==1.0.3
pip install sageattention==1.0.6
Download Model Weights
wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/models_t5_umt5-xxl-enc-bf16.pth
wget https://huggingface.co/DeepBeepMeep/Wan2.1/resolve/main/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth
wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/Wan2.1_VAE.pth
wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_t2v_1.3B_bf16.safetensors
wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_t2v_14B_bf16.safetensors
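The same files can also be fetched with huggingface_hub (installed in the step above). This is a sketch: the repo/filename pairs mirror the wget URLs, and `hf_hub_download` caches downloads by default unless `local_dir` is given.

```python
# (repo_id, filename) pairs matching the wget commands above.
WEIGHTS = [
    ("Wan-AI/Wan2.1-T2V-14B", "models_t5_umt5-xxl-enc-bf16.pth"),
    ("DeepBeepMeep/Wan2.1", "models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth"),
    ("Wan-AI/Wan2.1-T2V-14B", "Wan2.1_VAE.pth"),
    ("Comfy-Org/Wan_2.1_ComfyUI_repackaged",
     "split_files/diffusion_models/wan2.1_t2v_1.3B_bf16.safetensors"),
    ("Comfy-Org/Wan_2.1_ComfyUI_repackaged",
     "split_files/diffusion_models/wan2.1_t2v_14B_bf16.safetensors"),
]

if __name__ == "__main__":
    from huggingface_hub import hf_hub_download

    for repo_id, filename in WEIGHTS:
        path = hf_hub_download(repo_id=repo_id, filename=filename, local_dir=".")
        print("downloaded:", path)
```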
Usage
To generate a video, run the wan_generate_video.py script with the appropriate parameters. Below are examples of generating videos with the Mirai_Kuriyama_Tiny_Ver_wan_2_1_1_3_B_text2video_lora model.
Prefix
In the style of Beyond the Boundary ,The video begins with an intimate close-up of a character with pink hair, wearing glasses and a red jacket. Her glasses catch the sunlight as she tilts her head slightly, her expression soft and content.
Ice Cream
python wan_generate_video.py --fp8 --task t2v-1.3B --video_size 480 832 --video_length 81 --infer_steps 20 \
--save_path save --output_type both \
--dit wan2.1_t2v_1.3B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight Mirai_Kuriyama_Tiny_outputs/Mirai_Kuriyama_Tiny_w1_3_lora.safetensors \
--lora_multiplier 1.0 \
--prompt "In the style of Beyond the Boundary ,The video begins with an intimate close-up of a character with pink hair, wearing glasses and a red jacket. Her glasses catch the sunlight as she tilts her head slightly, her expression soft and content. In her hand, she holds a perfectly round scoop of vanilla ice cream, its surface glistening as it begins to melt under the warm afternoon sun. She takes a small, careful bite, her lips curling into a faint smile as the cool sweetness touches her tongue. A tiny dollop of ice cream clings to the corner of her mouth, and she absently wipes it away with the back of her hand, her movements unhurried and deliberate."
Burger
python wan_generate_video.py --fp8 --task t2v-1.3B --video_size 480 832 --video_length 81 --infer_steps 20 \
--save_path save --output_type both \
--dit wan2.1_t2v_1.3B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight Mirai_Kuriyama_Tiny_outputs/Mirai_Kuriyama_Tiny_w1_3_lora.safetensors \
--lora_multiplier 1.0 \
--prompt "In the style of Beyond the Boundary ,The video begins with an intimate close-up of a character with pink hair, wearing glasses and a red jacket. Her glasses catch the sunlight as she tilts her head slightly, her expression soft and content. In her hand, she holds a perfectly stacked burger, its golden bun glistening with a hint of sesame seeds under the warm afternoon sun. She takes a small, careful bite, her lips curling into a faint smile as the savory flavors burst onto her tongue. A tiny smear of sauce clings to the corner of her mouth, and she absently wipes it away with the back of her hand, her movements unhurried and deliberate. The crunch of fresh lettuce and the juiciness of the patty seem to transport her for a moment, her eyes briefly closing as she savors the simple pleasure."
School
python wan_generate_video.py --fp8 --task t2v-1.3B --video_size 480 832 --video_length 81 --infer_steps 20 \
--save_path save --output_type both \
--dit wan2.1_t2v_1.3B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight Mirai_Kuriyama_Tiny_outputs/Mirai_Kuriyama_Tiny_w1_3_lora-000010.safetensors \
--lora_multiplier 1.0 \
--prompt "In the style of Beyond the Boundary ,The video begins with an intimate close-up of a character with pink hair, wearing glasses and a red jacket. Her glasses catch the sunlight as she tilts her head slightly, her expression soft and content. She strolls leisurely across the campus, her footsteps soft against the sunlit pavement. The warm afternoon breeze gently tousles her hair, carrying with it the faint scent of blooming flowers from the nearby gardens. Her hands are tucked casually into the pockets of her hoodie, her expression calm and contemplative as she takes in the familiar surroundings. Students pass by in clusters, their laughter and chatter blending into the background like a distant melody. Occasionally, she pauses to glance at the trees swaying in the wind, their leaves casting dappled shadows on the ground. A faint smile plays on her lips, as if she’s savoring the quiet moment amidst the bustling campus life. Her pace is unhurried, deliberate, as though she’s savoring every step, every breath, in this fleeting pocket of peace."
Parameters
• --fp8: Enable FP8 precision (optional).
• --task: Specify the task (e.g., t2v-1.3B).
• --video_size: Set the resolution of the generated video (e.g., 1024 1024).
• --video_length: Define the length of the video in frames.
• --infer_steps: Number of inference steps.
• --save_path: Directory to save the generated video.
• --output_type: Output type (e.g., both for video and frames).
• --dit: Path to the diffusion model weights.
• --vae: Path to the VAE model weights.
• --t5: Path to the T5 model weights.
• --attn_mode: Attention mode (e.g., torch).
• --lora_weight: Path to the LoRA weights.
• --lora_multiplier: Multiplier for LoRA weights.
• --prompt: Textual prompt for video generation.
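When generating many clips, the flags above can be assembled programmatically instead of copy-pasting long shell commands. This is a minimal sketch (the build_command helper is not part of the repo); it reproduces the example invocations, varying only the prompt and LoRA weight:

```python
import subprocess

def build_command(prompt, lora_weight, task="t2v-1.3B",
                  video_size=(480, 832), video_length=81, infer_steps=20):
    """Assemble the wan_generate_video.py argument list for one prompt."""
    return [
        "python", "wan_generate_video.py",
        "--fp8",
        "--task", task,
        "--video_size", str(video_size[0]), str(video_size[1]),
        "--video_length", str(video_length),
        "--infer_steps", str(infer_steps),
        "--save_path", "save",
        "--output_type", "both",
        "--dit", "wan2.1_t2v_1.3B_bf16.safetensors",
        "--vae", "Wan2.1_VAE.pth",
        "--t5", "models_t5_umt5-xxl-enc-bf16.pth",
        "--attn_mode", "torch",
        "--lora_weight", lora_weight,
        "--lora_multiplier", "1.0",
        "--prompt", prompt,
    ]

if __name__ == "__main__":
    cmd = build_command(
        "In the style of Beyond the Boundary ,The video begins ...",
        "Mirai_Kuriyama_Tiny_outputs/Mirai_Kuriyama_Tiny_w1_3_lora.safetensors",
    )
    subprocess.run(cmd, check=True)
```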
Output
The generated video and frames will be saved in the directory specified by --save_path.
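With --output_type both, the save directory ends up holding both videos and frame images. A small sketch for collecting them afterwards (list_outputs is a hypothetical helper; the exact filenames produced by wan_generate_video.py are up to the script):

```python
from pathlib import Path

def list_outputs(save_path="save", exts=(".mp4", ".png")):
    """Collect generated video and frame files under save_path, sorted by path."""
    root = Path(save_path)
    return sorted(p for p in root.rglob("*") if p.suffix in exts)

if __name__ == "__main__":
    for p in list_outputs():
        print(p)
```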
Troubleshooting
• Ensure all dependencies are correctly installed.
• Verify that the model weights are downloaded and placed in the correct locations.
• Check for any missing Python packages and install them using pip.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Acknowledgments
• Hugging Face for hosting the model weights.
• Wan-AI for providing the pre-trained models.
• DeepBeepMeep for contributing to the model weights.
Contact
For any questions or issues, please open an issue on the repository or contact the maintainer.