andito HF Staff committed on
Commit a1b83a3 · verified · 1 Parent(s): 30d4b60

Update README.md

Files changed (1)
  1. README.md +4 -5
README.md CHANGED
@@ -195,17 +195,16 @@ The model is built on top of two pre-trained models: [google/siglip-so400m-patch
  **BibTeX:**
 
  ```bibtex
- @misc{laurençon2024matters,
- title={What matters when building vision-language models?},
- author={Hugo Laurençon and Léo Tronchon and Matthieu Cord and Victor Sanh},
+ @misc{laurençon2024building,
+ title={Building and better understanding vision-language models: insights and future directions.},
+ author={Hugo Laurençon and Andrés Marafioti and Victor Sanh and Léo Tronchon},
  year={2024},
- eprint={2405.02246},
+ eprint={2408.12637},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
  }
  ```
 
- TODO: new paper
 
  # Acknowledgements
 