metadata

license: other
license_name: qwen
license_link: LICENSE
datasets:
  - linxy/LaTeX_OCR
  - OleehyO/latex-formulas
metrics:
  - cer
base_model:
  - Qwen/Qwen2.5-VL-3B-Instruct

Model Card for Model ID

summary

This is a finetuned version of Qwen2.5-VL-3B-Instruct, focusing on the task img2latex.

The model is finetuned on the dataset OleehyO/latex-formulas with 2 epochs to enhance latex ocr capability, and one epoch on linxy/LaTeX-OCR to regulate the model's output.

This work is inspired by prithivMLmods/Qwen2-VL-OCR-2B-Instruct.

evaluation

model	metric	value
prithivMLmods/Qwen2-VL-OCR-2B-Instruct (bf16)	rouge-l: f1-score	0.88
	CER	0.24
etherealgemini/Qwen2_5-VL-OCR-3B-Instruct (bf16)	rouge-l: f1-score	0.91
	CER	0.21

The improvement probably comes from:

model's upgrade, for sure...?
larger dataset: 100K -> 550K

There is an even MUCH larger dataset OleehyO/latex-formulas-80M, but my computing resources are limited.