---
license: other
license_name: qwen
license_link: LICENSE
datasets:
- linxy/LaTeX_OCR
- OleehyO/latex-formulas
metrics:
- cer
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
---

# Qwen2_5-VL-OCR-3B-Instruct

## Summary

This is a finetuned version of [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct), focusing on the img2latex task. The model was finetuned for two epochs on [OleehyO/latex-formulas](https://huggingface.co/datasets/OleehyO/latex-formulas) to enhance its LaTeX OCR capability, and for one epoch on [linxy/LaTeX_OCR](https://huggingface.co/datasets/linxy/LaTeX_OCR) to standardize the model's output. This work is inspired by [prithivMLmods/Qwen2-VL-OCR-2B-Instruct](https://huggingface.co/prithivMLmods/Qwen2-VL-OCR-2B-Instruct).

## Evaluation

| model | metric | value |
|-----------------------------------------------|-------------------|-------|
| prithivMLmods/Qwen2-VL-OCR-2B-Instruct (bf16) | ROUGE-L F1 | 0.88 |
| | CER | 0.24 |
| etherealgemini/Qwen2_5-VL-OCR-3B-Instruct (bf16) | ROUGE-L F1 | 0.91 |
| | CER | 0.21 |

The improvement probably comes from:

1. the base-model upgrade (Qwen2-VL -> Qwen2.5-VL);
2. a larger training dataset: 100K -> 550K samples.

There is an even much larger dataset, [OleehyO/latex-formulas-80M](https://huggingface.co/datasets/OleehyO/latex-formulas-80M), but my computing resources are limited.
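
CER (character error rate) above is the character-level edit distance between the predicted and reference LaTeX strings, divided by the reference length (lower is better). A minimal sketch of how it can be computed — this helper is illustrative and is not the actual evaluation script used for the table:

```python
def cer(pred: str, ref: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    m, n = len(pred), len(ref)
    # Dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == ref[j - 1] else 1
            curr[j] = min(prev[j] + 1,        # deletion
                          curr[j - 1] + 1,    # insertion
                          prev[j - 1] + cost) # substitution
        prev = curr
    return prev[n] / max(n, 1)  # guard against an empty reference

print(cer(r"\frac{a}{b}", r"\frac{a}{c}"))  # one substitution out of 11 chars
```

In practice, libraries such as `jiwer` or `evaluate` provide equivalent CER implementations.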