LLaVA-Med v1.5 (based on Mistral-7B-Instruct-v0.2)
LLaVA-Med (Large Language and Vision Assistant for bioMedicine) is an open-source large vision-language model adapted for biomedical applications. Built upon LLaVA and enhanced through curriculum learning, LLaVA-Med is fine-tuned specifically for open-ended biomedical question answering tasks.
This release is intended to support reproducibility of the corresponding paper, which demonstrates improved performance on biomedical VQA benchmarks such as PathVQA and VQA-RAD.
Note: For the original model weights, refer to microsoft/llava-med-v1.5-mistral-7b.
Experimental Usage in the Libra Repository
This model checkpoint is intended for experimental use and can be tested directly within the Libra repository.
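To obtain this checkpoint locally for such experiments, the Hugging Face Hub client can be used. The snippet below is a minimal sketch; the local directory name is an arbitrary example, not something the Libra repository requires.

```python
from huggingface_hub import snapshot_download

# Download the full checkpoint (weights, tokenizer, config) from the Hub.
# The repo_id matches this model card; local_dir is an arbitrary example path.
local_path = snapshot_download(
    repo_id="X-iZhang/libra-llava-med-v1.5-mistral-7b",
    local_dir="./libra-llava-med-v1.5-mistral-7b",
)
print(f"Checkpoint downloaded to: {local_path}")
```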
Key Modification
To enable the re-trained vision encoder during inference, ensure the following configuration is applied:
"unfreeze_mm_vision_tower": true
Learn More
For a deeper dive into the methodology, theoretical insights, and performance benchmarks of the Libra framework, please see the following resources:
- Project Website: Libra v1.0
- Paper: arXiv:2411.19378
- Code Repository: X-iZhang/Libra (GitHub)
License
This checkpoint follows the license of the underlying language model, mistralai/Mistral-7B-Instruct-v0.2.