Safetensors
llava_next

Pangea-7B Model Card

Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages

🇪🇹 🇸🇦 🇧🇬 🇧🇩 🇨🇿 🇩🇪 🇬🇷 🇬🇧 🇺🇸 🇪🇸 🇮🇷 🇫🇷 🇮🇪 🇮🇳 🇮🇩 🇳🇬 🇮🇹 🇮🇱 🇯🇵 🇮🇩 🇰🇷 🇳🇱 🇲🇳 🇲🇾 🇳🇴 🇵🇱 🇵🇹 🇧🇷 🇷🇴 🇷🇺 🇱🇰 🇮🇩 🇰🇪 🇹🇿 🇱🇰 🇹🇭 🇹🇷 🇺🇦 🇵🇰 🇻🇳 🇨🇳 🇹🇼

🏠 Homepage | 🤖 Pangea-7B | 📊 PangeaIns | 🧪 PangeaBench | 💻 Github | 📄 Arxiv | 📕 PDF | 🖥️ Demo

description

Model details

  • Model: Pangea is a fully open-source Multilingual Multimodal Multicultural LLM.
  • Date: Pangea-7B was trained in 2024.
  • Training Dataset: 6M PangeaIns.
  • Architecture: Pangea-7B follows the architecture of LLaVA-NeXT, with a Qwen2-7B-Instruct backbone.

Uses

The hf version is intended so that you could use Pangea-7B with the huggingface generate function. If you want to use it with the Llava-Next codebase, please refer to our original checkpoint.

# Assuming that you have text_input and image_path
from transformers import LlavaNextForConditionalGeneration, AutoProcessor
import torch
from PIL import Image

image_input = Image.open(image_path)

model = LlavaNextForConditionalGeneration.from_pretrained(
            "neulab/Pangea-7B-hf", 
            torch_dtype=torch.float16
        ).to(0)
processor = AutoProcessor.from_pretrained("neulab/Pangea-7B-hf")
model.resize_token_embeddings(len(processor.tokenizer))

text_input = f"<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<image>\n{text_input}<|im_end|>\n<|im_start|>assistant\n"
model_inputs = processor(images=image_input, text=text_input, return_tensors='pt').to("cuda", torch.float16)
output = model.generate(**model_inputs, max_new_tokens=1024, min_new_tokens=32, temperature=1.0, top_p=0.9, do_sample=True)
output = output[0]
result = processor.decode(output, skip_special_tokens=True, clean_up_tokenization_spaces=False)

print(result)

Citing the Model

BibTeX Citation:

@article{yue2024pangeafullyopenmultilingual,
  title={Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages},
  author={Xiang Yue and Yueqi Song and Akari Asai and Seungone Kim and Jean de Dieu Nyandwi and Simran Khanuja and Anjali Kantharuban and Lintang Sutawika and Sathyanarayanan Ramamoorthy and Graham Neubig},
  year={2024},
  journal={arXiv preprint arXiv:2410.16153},
  url={https://arxiv.org/abs/2410.16153}
}
Downloads last month
869
Safetensors
Model size
7.94B params
Tensor type
FP16
·
Inference Providers NEW
This model is not currently available via any of the supported Inference Providers.
The model cannot be deployed to the HF Inference API: The model has no library tag.

Model tree for neulab/Pangea-7B-hf

Base model

Qwen/Qwen2-7B
Finetuned
(78)
this model

Dataset used to train neulab/Pangea-7B-hf

Collection including neulab/Pangea-7B-hf