LLaVA-Med v1.5 (based on Mistral-7B-Instruct-v0.2)
LLaVA-Med (Large Language and Vision Assistant for bioMedicine) is an open-source large vision-language model adapted for biomedical applications. Built upon LLaVA and enhanced through curriculum learning, LLaVA-Med is fine-tuned specifically for open-ended biomedical question answering tasks.
This release is intended to support reproducibility of the corresponding paper, which demonstrates improved performance on biomedical VQA benchmarks such as PathVQA and VQA-RAD.
Note: For the original model weights, refer to microsoft/llava-med-v1.5-mistral-7b.
Experimental Usage in the Libra Repository
This model checkpoint is intended for experimental use and can be tested directly within the Libra repository.
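To obtain this checkpoint locally for such experiments, the Hugging Face Hub client can be used. The snippet below is a minimal sketch; the local directory name is an arbitrary example, not something the Libra repository requires.

```python
from huggingface_hub import snapshot_download

# Download the full checkpoint (weights, tokenizer, config) from the Hub.
# The repo_id matches this model card; local_dir is an arbitrary example path.
local_path = snapshot_download(
    repo_id="X-iZhang/libra-llava-med-v1.5-mistral-7b",
    local_dir="./libra-llava-med-v1.5-mistral-7b",
)
print(f"Checkpoint downloaded to: {local_path}")
```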
Key Modification
To enable the re-trained vision encoder during inference, ensure the following configuration is applied:
"unfreeze_mm_vision_tower": true
Learn More
For a deeper dive into the methodology, theoretical insights, and performance benchmarks of the Libra framework, please see the following resources:
- Project Website: Libra v1.0
- Paper: arXiv:2411.19378
- Code Repository: X-iZhang/Libra (GitHub)
License
This checkpoint follows the license of the underlying language model, mistralai/Mistral-7B-Instruct-v0.2.