---
license: other
license_name: qwen
license_link: LICENSE
datasets:
- linxy/LaTeX_OCR
- OleehyO/latex-formulas
metrics:
- cer
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
---
# Model Card for Model ID
## Summary
This model is a fine-tuned version of Qwen2.5-VL-3B-Instruct for image-to-LaTeX conversion (img2latex).
It was fine-tuned for 2 epochs on OleehyO/latex-formulas to strengthen its LaTeX OCR capability, then for 1 epoch on linxy/LaTeX_OCR to regularize the model's output.
This work is inspired by prithivMLmods/Qwen2-VL-OCR-2B-Instruct.
## Evaluation
| model | ROUGE-L F1 (higher is better) | CER (lower is better) |
|---|---|---|
| prithivMLmods/Qwen2-VL-OCR-2B-Instruct (bf16) | 0.88 | 0.24 |
| etherealgemini/Qwen2_5-VL-OCR-3B-Instruct (bf16) | 0.91 | 0.21 |
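The improvement probably comes from:
- the upgraded base model (Qwen2-VL-2B → Qwen2.5-VL-3B), which is both a newer generation and larger, though its exact contribution was not measured
- a larger training set: 100K → 550K samples

An even larger dataset, OleehyO/latex-formulas-80M, is available, but my computing resources were too limited to train on it.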