Pixel Image-to-Video Generation
This repository contains the steps and scripts needed to generate videos with the Pixel image-to-video model. The model combines LoRA (Low-Rank Adaptation) weights with pre-trained Wan2.1 components to create high-quality pixel-art-style videos from a source image and a text prompt.
Prerequisites
Before proceeding, ensure that you have the following installed on your system:
• Ubuntu (or a compatible Linux distribution)
• Python 3.x
• pip (Python package manager)
• Git
• Git LFS (Git Large File Storage)
• FFmpeg
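As a quick sanity check before installing anything, a short Python snippet (illustrative only, not part of this repository) can confirm that the command-line tools above are on your PATH:

```python
# Sanity check: confirm the required command-line tools are installed.
# This helper is illustrative and not part of the repository.
import shutil


def check_tools(tools):
    """Return a dict mapping each tool name to its resolved path (or None)."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in check_tools(["git", "git-lfs", "ffmpeg", "python3", "pip"]).items():
        print(f"{tool}: {path or 'NOT FOUND -- install it before continuing'}")
```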
Installation
Update and Install Dependencies
sudo apt-get update && sudo apt-get install cbm git-lfs ffmpeg
Clone the Repository
git clone https://huggingface.co/svjack/Pixel_wan_2_1_14_B_image2video_lora
cd Pixel_wan_2_1_14_B_image2video_lora
Install Python Dependencies
pip install torch torchvision
pip install -r requirements.txt
pip install ascii-magic matplotlib tensorboard huggingface_hub datasets
pip install moviepy==1.0.3
pip install sageattention==1.0.6
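To verify that the dependencies installed correctly, a small helper (a sketch, not part of the repository) can report which import names are still missing:

```python
# Verify that the Python dependencies listed above can be imported.
# The names below are import names; for these packages they match the
# pip package names.
import importlib.util


def missing_packages(module_names):
    """Return the subset of module_names that cannot be found."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    required = ["torch", "torchvision", "matplotlib", "tensorboard",
                "huggingface_hub", "datasets", "moviepy", "sageattention"]
    missing = missing_packages(required)
    print("All dependencies found." if not missing
          else f"Missing: {', '.join(missing)}")
```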
Download Model Weights
wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/models_t5_umt5-xxl-enc-bf16.pth
wget https://huggingface.co/DeepBeepMeep/Wan2.1/resolve/main/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth
wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/Wan2.1_VAE.pth
wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_t2v_1.3B_bf16.safetensors
wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_t2v_14B_bf16.safetensors
wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_i2v_480p_14B_fp8_e4m3fn.safetensors
wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_i2v_480p_14B_bf16.safetensors
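Interrupted downloads are a common source of cryptic load errors later. The snippet below (a rough heuristic, not an official checksum check) flags weight files that are missing or suspiciously small so you can re-download them:

```python
# Check that the downloaded weight files exist and are not truncated.
# The size threshold is a rough heuristic, not an official checksum.
import os

WEIGHT_FILES = [
    "models_t5_umt5-xxl-enc-bf16.pth",
    "models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth",
    "Wan2.1_VAE.pth",
    "wan2.1_i2v_480p_14B_fp8_e4m3fn.safetensors",
]


def incomplete_weights(paths, min_bytes=1_000_000):
    """Return paths that are missing or smaller than min_bytes."""
    return [p for p in paths
            if not os.path.isfile(p) or os.path.getsize(p) < min_bytes]


if __name__ == "__main__":
    bad = incomplete_weights(WEIGHT_FILES)
    print("All weight files look complete." if not bad
          else f"Re-download: {', '.join(bad)}")
```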
Usage
To generate a video, run the wan_generate_video.py script with the appropriate parameters. The examples below generate videos with the Pixel LoRA.
1. "Colorful Girl with Orange Hair and Blue Eyes"
- Source Image
python wan_generate_video.py --fp8 --video_size 832 480 --video_length 45 --infer_steps 20 \
--save_path save --output_type both \
--task i2v-14B --t5 models_t5_umt5-xxl-enc-bf16.pth --clip models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth \
--dit wan2.1_i2v_480p_14B_fp8_e4m3fn.safetensors --vae Wan2.1_VAE.pth \
--attn_mode torch \
--lora_weight pixel_outputs/pixel_w14_lora-000008.safetensors \
--lora_multiplier 1.5 \
--image_path "pixel_im1.png" \
--prompt "The video showcases a young girl with orange hair and blue eyes, sitting on the ground. She's wearing a colorful dress with a brown skirt and a yellow top, along with red shoes. The girl is holding a red cup with a straw and has a green hat with a red band. The background features a pink sky with hearts and a yellow plant."
2. "Dynamic Pixel Art Scene with Genshin Impact Cartoon Characters"
- Source Image
python wan_generate_video.py --fp8 --video_size 832 480 --video_length 45 --infer_steps 20 \
--save_path save --output_type both \
--task i2v-14B --t5 models_t5_umt5-xxl-enc-bf16.pth --clip models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth \
--dit wan2.1_i2v_480p_14B_fp8_e4m3fn.safetensors --vae Wan2.1_VAE.pth \
--attn_mode torch \
--lora_weight pixel_outputs/pixel_w14_lora-000008.safetensors \
--lora_multiplier 1.5 \
--image_path "pixel_im4.jpg" \
--prompt "The video depicts a scene rich in pixel art style, depicting two cartoon characters interacting against a colorful background. In the foreground, a little dragon appears serious and focused, dressed in brown and yellow attire, creating a cute and lively impression. In the background, a white-haired character dressed in blue and white clothing is shown in a dynamic pose, seemingly in motion, adding energy to the scene. The entire setting is vibrant, with elements resembling buildings in the background, evoking a retro yet whimsical atmosphere. The image is filled with intricate details and playfulness, showcasing the unique charm of pixel art."
3. "Whimsical Nighttime Scene with Genshin Impact Animated Characters"
- Source Image
python wan_generate_video.py --fp8 --video_size 832 480 --video_length 45 --infer_steps 20 \
--save_path save --output_type both \
--task i2v-14B --t5 models_t5_umt5-xxl-enc-bf16.pth --clip models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth \
--dit wan2.1_i2v_480p_14B_fp8_e4m3fn.safetensors --vae Wan2.1_VAE.pth \
--attn_mode torch \
--lora_weight pixel_outputs/pixel_w14_lora-000008.safetensors \
--lora_multiplier 1.5 \
--image_path "pixel_im2.jpg" \
--prompt "The video depicts a charming nighttime scene with three animated characters in a whimsical setting. The main elements include a wooden house with a porch, where two characters are sitting. The character on the left is dressed in blue attire, while the character on the right is adorned in green. The background features a starry night sky with a shooting star, adding a magical touch to the scene. The surrounding environment includes lush greenery and a distant view of other houses, creating a serene and enchanting atmosphere. The overall composition is vibrant and colorful, with a focus on the characters and their interaction with the natural setting."
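The three invocations above differ only in --image_path and --prompt. For batch runs, a small helper (a sketch; the example image name and prompt are placeholders) can assemble the full command for each image/prompt pair and hand it to subprocess:

```python
# Build the wan_generate_video.py command line for one image/prompt pair.
# The shared flags mirror the examples above.
import subprocess

SHARED_FLAGS = [
    "--fp8", "--video_size", "832", "480", "--video_length", "45",
    "--infer_steps", "20", "--save_path", "save", "--output_type", "both",
    "--task", "i2v-14B",
    "--t5", "models_t5_umt5-xxl-enc-bf16.pth",
    "--clip", "models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth",
    "--dit", "wan2.1_i2v_480p_14B_fp8_e4m3fn.safetensors",
    "--vae", "Wan2.1_VAE.pth",
    "--attn_mode", "torch",
    "--lora_weight", "pixel_outputs/pixel_w14_lora-000008.safetensors",
    "--lora_multiplier", "1.5",
]


def build_command(image_path, prompt):
    """Return the full argv list for one generation run."""
    return (["python", "wan_generate_video.py"] + SHARED_FLAGS
            + ["--image_path", image_path, "--prompt", prompt])


# Example batch; uncomment subprocess.run to actually generate.
for image, prompt in [("pixel_im1.png", "A young girl with orange hair sitting on the ground.")]:
    cmd = build_command(image, prompt)
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)
```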
Parameters
• --fp8: Enable FP8 precision (optional).
• --task: Task to run (e.g., i2v-14B).
• --video_size: Resolution of the generated video (e.g., 832 480).
• --video_length: Length of the video in frames.
• --infer_steps: Number of inference steps.
• --save_path: Directory in which to save the generated video.
• --output_type: Output type (e.g., both for video and frames).
• --dit: Path to the diffusion (DiT) model weights.
• --vae: Path to the VAE model weights.
• --t5: Path to the T5 text-encoder weights.
• --clip: Path to the CLIP model weights.
• --attn_mode: Attention mode (e.g., torch).
• --lora_weight: Path to the LoRA weights.
• --lora_multiplier: Multiplier for the LoRA weights.
• --image_path: Path to the source image.
• --prompt: Textual prompt for video generation.
Output
The generated video and frames are saved in the directory given by --save_path.
Troubleshooting
• Ensure all dependencies are correctly installed.
• Verify that the model weights are downloaded and placed in the correct locations.
• Check for any missing Python packages and install them with pip.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Acknowledgments
• Hugging Face for hosting the model weights.
• Wan-AI for providing the pre-trained models.
• DeepBeepMeep for contributing to the model weights.
Contact
For any questions or issues, please open an issue on the repository or contact the maintainer.