Pixel Image-to-Video Generation

This repository contains the steps and scripts needed to generate videos with the Pixel image-to-video model. The model combines LoRA (Low-Rank Adaptation) weights with pre-trained Wan 2.1 components to create pixel-art-style videos from a source image and a textual prompt.

Prerequisites

Before proceeding, ensure that you have the following installed on your system:

• Ubuntu (or a compatible Linux distribution)
• Python 3.x
• pip (Python package manager)
• Git
• Git LFS (Git Large File Storage)
• FFmpeg
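
If you want a quick sanity check that these tools are available before continuing, something along these lines works on most Ubuntu systems:

    python3 --version && pip --version
    git --version && git lfs version
    ffmpeg -version | head -n 1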

Installation

  1. Update and Install Dependencies

    sudo apt-get update && sudo apt-get install cbm git-lfs ffmpeg
    
  2. Clone the Repository

    git clone https://huggingface.co/svjack/Pixel_wan_2_1_14_B_image2video_lora
    cd Pixel_wan_2_1_14_B_image2video_lora
    
  3. Install Python Dependencies

    pip install torch torchvision
    pip install -r requirements.txt
    pip install ascii-magic matplotlib tensorboard huggingface_hub datasets
    pip install moviepy==1.0.3
    pip install sageattention==1.0.6
    
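Before downloading the large model weights, it can also be worth confirming that PyTorch was installed with GPU support (a rough check; the exact CUDA setup varies by system):

    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
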
  4. Download Model Weights

    wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/models_t5_umt5-xxl-enc-bf16.pth
    wget https://huggingface.co/DeepBeepMeep/Wan2.1/resolve/main/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth
    wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/Wan2.1_VAE.pth
    wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_t2v_1.3B_bf16.safetensors
    wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_t2v_14B_bf16.safetensors
    wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_i2v_480p_14B_fp8_e4m3fn.safetensors
    wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_i2v_480p_14B_bf16.safetensors
    
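As an alternative to wget, the same files can be fetched with the Hugging Face CLI that ships with huggingface_hub (installed above). This is a sketch for one file only; the wget commands above remain the reference:

    # Example for the T5 encoder; repeat for the other files listed above.
    huggingface-cli download Wan-AI/Wan2.1-T2V-14B models_t5_umt5-xxl-enc-bf16.pth --local-dir .
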

Usage

To generate a video, use the wan_generate_video.py script with the appropriate parameters. Below are examples of how to generate videos using the Pixel model.

1. "Colorful Girl with Orange Hair and Blue Eyes"

  • Source Image: pixel_im1.png

python wan_generate_video.py --fp8 --video_size 832 480 --video_length 45 --infer_steps 20 \
--save_path save --output_type both \
--task i2v-14B --t5 models_t5_umt5-xxl-enc-bf16.pth --clip models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth \
--dit wan2.1_i2v_480p_14B_fp8_e4m3fn.safetensors --vae Wan2.1_VAE.pth \
--attn_mode torch \
--lora_weight pixel_outputs/pixel_w14_lora-000008.safetensors \
--lora_multiplier 1.5 \
--image_path "pixel_im1.png" \
--prompt "The video showcases a young girl with orange hair and blue eyes, sitting on the ground. She's wearing a colorful dress with a brown skirt and a yellow top, along with red shoes. The girl is holding a red cup with a straw and has a green hat with a red band. The background features a pink sky with hearts and a yellow plant."


2. "Dynamic Pixel Art Scene with Genshin Impact Cartoon Characters"

  • Source Image: pixel_im4.jpg

python wan_generate_video.py --fp8 --video_size 832 480 --video_length 45 --infer_steps 20 \
--save_path save --output_type both \
--task i2v-14B --t5 models_t5_umt5-xxl-enc-bf16.pth --clip models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth \
--dit wan2.1_i2v_480p_14B_fp8_e4m3fn.safetensors --vae Wan2.1_VAE.pth \
--attn_mode torch \
--lora_weight pixel_outputs/pixel_w14_lora-000008.safetensors \
--lora_multiplier 1.5 \
--image_path "pixel_im4.jpg" \
--prompt "The video depicts a scene rich in pixel art style, depicting two cartoon characters interacting against a colorful background. In the foreground, a little dragon appears serious and focused, dressed in brown and yellow attire, creating a cute and lively impression. In the background, a white-haired character dressed in blue and white clothing is shown in a dynamic pose, seemingly in motion, adding energy to the scene. The entire setting is vibrant, with elements resembling buildings in the background, evoking a retro yet whimsical atmosphere. The image is filled with intricate details and playfulness, showcasing the unique charm of pixel art."


3. "Whimsical Nighttime Scene with Genshin Impact Animated Characters"

  • Source Image: pixel_im2.jpg

python wan_generate_video.py --fp8 --video_size 832 480 --video_length 45 --infer_steps 20 \
--save_path save --output_type both \
--task i2v-14B --t5 models_t5_umt5-xxl-enc-bf16.pth --clip models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth \
--dit wan2.1_i2v_480p_14B_fp8_e4m3fn.safetensors --vae Wan2.1_VAE.pth \
--attn_mode torch \
--lora_weight pixel_outputs/pixel_w14_lora-000008.safetensors \
--lora_multiplier 1.5 \
--image_path "pixel_im2.jpg" \
--prompt "The video depicts a charming nighttime scene with three animated characters in a whimsical setting. The main elements include a wooden house with a porch, where two characters are sitting. The character on the left is dressed in blue attire, while the character on the right is adorned in green. The background features a starry night sky with a shooting star, adding a magical touch to the scene. The surrounding environment includes lush greenery and a distant view of other houses, creating a serene and enchanting atmosphere. The overall composition is vibrant and colorful, with a focus on the characters and their interaction with the natural setting."

Parameters

  • --fp8: Enable FP8 precision (optional).
  • --task: Specify the task (e.g., i2v-14B for the image-to-video examples above).
  • --video_size: Set the resolution of the generated video (e.g., 832 480).
  • --video_length: Define the length of the video in frames.
  • --infer_steps: Number of inference steps.
  • --save_path: Directory to save the generated video.
  • --output_type: Output type (e.g., both for video and frames).
  • --dit: Path to the diffusion model weights.
  • --vae: Path to the VAE model weights.
  • --t5: Path to the T5 text-encoder weights.
  • --clip: Path to the CLIP model weights (used by the image-to-video task).
  • --attn_mode: Attention mode (e.g., torch).
  • --lora_weight: Path to the LoRA weights.
  • --lora_multiplier: Multiplier for LoRA weights.
  • --image_path: Path to the source image for image-to-video generation.
  • --prompt: Textual prompt for video generation.
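
As a compact reference, these flags combine into a command of the following shape; the LoRA, image, and prompt values are placeholders to be replaced with your own files:

    python wan_generate_video.py --fp8 --task i2v-14B \
      --video_size 832 480 --video_length 45 --infer_steps 20 \
      --dit wan2.1_i2v_480p_14B_fp8_e4m3fn.safetensors --vae Wan2.1_VAE.pth \
      --t5 models_t5_umt5-xxl-enc-bf16.pth --clip models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth \
      --attn_mode torch --lora_weight <your_lora>.safetensors --lora_multiplier 1.5 \
      --image_path <your_image>.png --prompt "<your prompt>" \
      --save_path save --output_type both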

Output

The generated video and frames are saved in the directory given by --save_path (save in the examples above).
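
If you want a quick, shareable preview of a generated clip, ffmpeg can convert the saved MP4 into a GIF. This is a convenience step outside the generation script, and the input file name below is hypothetical:

    # Hypothetical file name; substitute the video actually written to save/.
    ffmpeg -i save/output.mp4 -vf "fps=12,scale=480:-1" save/preview.gif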

Troubleshooting

• Ensure all dependencies are correctly installed.
• Verify that the model weights are downloaded and placed in the correct locations (a quick check is sketched below).
• Check for any missing Python packages and install them using pip.
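
A simple way to confirm that the downloaded weights sit where the example commands expect them is a small shell loop over the file names used above, run from the repository root:

    for f in models_t5_umt5-xxl-enc-bf16.pth \
             models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth \
             Wan2.1_VAE.pth \
             wan2.1_i2v_480p_14B_fp8_e4m3fn.safetensors; do
        [ -f "$f" ] && echo "OK       $f" || echo "MISSING  $f"
    done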

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

• Hugging Face for hosting the model weights.
• Wan-AI for providing the pre-trained models.
• DeepBeepMeep for contributing to the model weights.

Contact

For any questions or issues, please open an issue on the repository or contact the maintainer.
