llava-v1.5-llama-3-8b-pretrain Model Card

This is a pretrained checkpoint containing the MLP connector from LLaVA stage-1 pretraining; you can use it to instruction-tune your own multimodal models. Please follow my reproduced implementation, LLaVA-Llama-3, for more details on fine-tuning a LLaVA model with Llama-3 as the foundation LLM.

Training dataset

  • 558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.

Architecture

  • LLM: llama-3-8b (Frozen)
  • Vision-Language Adapter: MLP
  • Vision Encoder: CLIP-ViT-L (Frozen)
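
The connector listed above can be sketched as a small PyTorch module. This is an illustrative sketch, not the released code: the two-layer MLP with GELU follows the standard LLaVA-1.5 design, and the dimensions (1024 for CLIP-ViT-L features, 4096 for Llama-3-8B hidden states, 576 patches for a 336px input) are assumptions based on the listed components.

```python
import torch
import torch.nn as nn

class MLPConnector(nn.Module):
    """Two-layer MLP projecting frozen CLIP features into the LLM embedding space.

    Dimensions are assumptions: CLIP-ViT-L hidden size 1024, Llama-3-8B hidden size 4096.
    """

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_dim) from the frozen vision encoder
        return self.proj(image_features)

connector = MLPConnector()
# 576 patches corresponds to a 336x336 image with 14x14 patches (assumed CLIP config)
tokens = connector(torch.randn(1, 576, 1024))
print(tokens.shape)  # torch.Size([1, 576, 4096])
```

During stage-1 pretraining, only this connector is updated while both the LLM and vision encoder stay frozen, which is why the checkpoint is small and reusable for instruction tuning.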