---
base_model: Qwen/Qwen2-VL-7B-Instruct
language:
  - en
library_name: peft
license: mit
tags:
  - LLM
  - VLM
  - Embedding
  - Multimodal
pipeline_tag: image-text-to-text
---

# TIGER-Lab/ABC-Qwen2VL-Instruct

## Model Details

Instruction-finetuned adapter for ABC: Achieving Better Control of Multimodal Embeddings using VLMs.

### Model Sources

This adapter is trained on top of Qwen2-VL-7B-Instruct (`Qwen/Qwen2-VL-7B-Instruct`).
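
Because the card lists `peft` as the library and `Qwen/Qwen2-VL-7B-Instruct` as the base model, attaching the adapter might look roughly like the sketch below. This is a minimal, unofficial example: the helper name `load_abc_model` is hypothetical, the adapter ID is taken from this repository's name, and `device_map="auto"` assumes the `accelerate` package is installed. It only loads the weights; how embeddings are extracted from the model is defined by the ABC project's own code and is not shown here.

```python
# Minimal sketch (not the authors' official loading code).
BASE_MODEL_ID = "Qwen/Qwen2-VL-7B-Instruct"    # from the card's base_model field
ADAPTER_ID = "TIGER-Lab/ABC-Qwen2VL-Instruct"  # assumed: this repository's ID


def load_abc_model(device_map="auto"):
    """Hypothetical helper: load the base VLM and attach the ABC adapter via PEFT."""
    # Heavy dependencies are imported lazily so that importing this module
    # does not require torch/transformers/peft to be installed.
    import torch
    from peft import PeftModel
    from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

    # Load the frozen base model in bfloat16 to reduce memory use.
    base = Qwen2VLForConditionalGeneration.from_pretrained(
        BASE_MODEL_ID,
        torch_dtype=torch.bfloat16,
        device_map=device_map,
    )
    # Attach the adapter weights from this repository on top of the base model.
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    model.eval()

    # The processor (tokenizer + image preprocessing) comes from the base model.
    processor = AutoProcessor.from_pretrained(BASE_MODEL_ID)
    return model, processor


# Usage (downloads several GB of weights on first call):
# model, processor = load_abc_model()
```

Loading the processor from the base model rather than the adapter repository is a conservative choice here, since the card does not say whether the adapter repository ships its own processor configuration.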

### Paper and Website

For more information, see the [project website](https://tiger-ai-lab.github.io/ABC/) and the [paper](https://arxiv.org/abs/2503.00329).

## Citation

@misc{schneider2025abcachievingbettercontrol,
  title={ABC: Achieving Better Control of Multimodal Embeddings using VLMs},
  author={Benjamin Schneider and Florian Kerschbaum and Wenhu Chen},
  year={2025},
  eprint={2503.00329},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.00329},
}