metadata
license: apache-2.0
datasets:
- naver-clova-ix/synthdog-ko
language:
- ko
- en
base_model:
- Qwen/Qwen2-VL-7B-Instruct
tags:
- OCR
- Korea
- Korean
developer0hye/synthdog-koQwen2-VL-7B-Instruct
- Training Code - developer0hye/synthdog-koQwen2-VL-7B-Instruct
- naver-clova-ix/synthdog-ko dataset was used to teach the model the order in which Korean sentences should be read and how to recognize Korean characters.
- Finetune Qwen2-VL-7B-Instruct model from this weights for Korean OCR with real image datasets such as developer0hye/korocr
Hmm... Honestly, I'm not sure if this model has the potential to be a good Korean OCR model. But let's try fine-tuning it anyway. If you get good results, email me at [email protected] 😄
Quickstart