---
base_model: Qwen/Qwen2-VL-7B-Instruct
language:
- en
library_name: peft
license: mit
tags:
- LLM
- VLM
- Embedding
- Multimodal
pipeline_tag: image-text-to-text
---
## Model Details

Instruction-finetuned adapter for ABC: Achieving Better Control of Multimodal Embeddings using VLMs.

### Model Sources

This adapter is trained on top of `Qwen/Qwen2-VL-7B-Instruct`.

### Paper and Website

For more information, please refer to the [paper](https://arxiv.org/abs/2503.00329) and the [website](https://tiger-ai-lab.github.io/ABC/).

## Citation

```
@misc{schneider2025abcachievingbettercontrol,
      title={ABC: Achieving Better Control of Multimodal Embeddings using VLMs},
      author={Benjamin Schneider and Florian Kerschbaum and Wenhu Chen},
      year={2025},
      eprint={2503.00329},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.00329},
}
```
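
## Usage

Because the metadata lists `peft` as the library and `Qwen/Qwen2-VL-7B-Instruct` as the base model, one plausible way to load the adapter is to attach it to the base model with `PeftModel.from_pretrained`. The snippet below is a minimal sketch under that assumption, not the authors' reference code: `ADAPTER_ID` is a hypothetical placeholder that must be replaced with this repository's id or a local path, and `load_abc_adapter` is an illustrative helper name. How embeddings are extracted from the loaded model is not described in this card; see the project website for that.

```python
# Minimal loading sketch. Assumes recent `transformers` (with Qwen2-VL support),
# `peft`, and `torch` are installed. Nothing is downloaded until the helper is called.

BASE_MODEL_ID = "Qwen/Qwen2-VL-7B-Instruct"  # from the card metadata
ADAPTER_ID = "path/or/repo-id-of-this-adapter"  # hypothetical placeholder, replace it


def load_abc_adapter(adapter_id=ADAPTER_ID, base_model_id=BASE_MODEL_ID):
    """Load the base VLM, attach the PEFT adapter, and return (model, processor)."""
    # Heavy imports are kept local so that importing this file stays cheap.
    import torch
    from peft import PeftModel
    from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

    # Load the base vision-language model in bfloat16 to reduce memory use.
    base = Qwen2VLForConditionalGeneration.from_pretrained(
        base_model_id, torch_dtype=torch.bfloat16
    )
    # Attach the adapter weights on top of the frozen base model.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()

    # The processor (tokenizer plus image preprocessing) comes from the base model.
    processor = AutoProcessor.from_pretrained(base_model_id)
    return model, processor
```

Calling `model, processor = load_abc_adapter("<adapter repo id or local path>")` downloads the base weights on first use, so it needs enough disk space and memory for a 7B-parameter model.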