---
license: other
license_name: qwen
license_link: LICENSE
datasets:
- linxy/LaTeX_OCR
- OleehyO/latex-formulas
metrics:
- cer
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
---

# Qwen2_5-VL-OCR-3B-Instruct

## Summary

This is a finetuned version of [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct), focusing on the img2latex task. The model was finetuned for two epochs on [OleehyO/latex-formulas](https://huggingface.co/datasets/OleehyO/latex-formulas) to enhance its LaTeX OCR capability, and for one epoch on [linxy/LaTeX_OCR](https://huggingface.co/datasets/linxy/LaTeX_OCR) to standardize the model's output. This work is inspired by [prithivMLmods/Qwen2-VL-OCR-2B-Instruct](https://huggingface.co/prithivMLmods/Qwen2-VL-OCR-2B-Instruct).

## Evaluation

| model | metric | value |
|-----------------------------------------------|-------------------|-------|
| prithivMLmods/Qwen2-VL-OCR-2B-Instruct (bf16) | ROUGE-L F1 | 0.88 |
| | CER | 0.24 |
| etherealgemini/Qwen2_5-VL-OCR-3B-Instruct (bf16) | ROUGE-L F1 | 0.91 |
| | CER | 0.21 |

The improvement probably comes from:

1. the base-model upgrade (Qwen2-VL -> Qwen2.5-VL);
2. a larger training dataset: 100K -> 550K samples.

There is an even much larger dataset, [OleehyO/latex-formulas-80M](https://huggingface.co/datasets/OleehyO/latex-formulas-80M), but my computing resources are limited.
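
CER (character error rate) above is the character-level edit distance between the predicted and reference LaTeX strings, divided by the reference length (lower is better). A minimal sketch of how it can be computed — this helper is illustrative and is not the actual evaluation script used for the table:

```python
def cer(pred: str, ref: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    m, n = len(pred), len(ref)
    # Dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == ref[j - 1] else 1
            curr[j] = min(prev[j] + 1,        # deletion
                          curr[j - 1] + 1,    # insertion
                          prev[j - 1] + cost) # substitution
        prev = curr
    return prev[n] / max(n, 1)  # guard against an empty reference

print(cer(r"\frac{a}{b}", r"\frac{a}{c}"))  # one substitution out of 11 chars
```

In practice, libraries such as `jiwer` or `evaluate` provide equivalent CER implementations.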