---
library_name: transformers
tags:
- page
- classification
base_model:
- google/vit-base-patch16-224
pipeline_tag: image-classification
license: mit
---

# Image classification using fine-tuned ViT - for sorting historical :bowtie: documents

### Goal: sort archive page images (for their further content-based processing)

**Scope:** image processing, training and evaluation of the ViT model, processing of an input file or directory, output of the top-N predicted classes 🏷️ (categories), summarizing predictions in a tabular format, and HF 😊 hub support for the model

## Model description 📇

🔲 Fine-tuned model repository: vit-historical-page [^1] 🔗

🔳 Base model repository: Google's vit-base-patch16-224 [^2] 🔗

### Data 📜

Training set of the model: **8950** images

### Categories 🏷️

| Label | Ratio | Description |
|------------:|:-------:|:-----------------------------------------------------------------------------|
| **DRAW** | 11.89% | **📈 - drawings, maps, paintings with text** |
| **DRAW_L** | 8.17% | **📈📏 - drawings ... with a table legend or inside tabular layout / forms** |
| **LINE_HW** | 5.99% | **✏️📏 - handwritten text lines inside tabular layout / forms** |
| **LINE_P** | 6.06% | **📏 - printed text lines inside tabular layout / forms** |
| **LINE_T** | 13.39% | **📏 - machine typed text lines inside tabular layout / forms** |
| **PHOTO** | 10.21% | **🌄 - photos with text** |
| **PHOTO_L** | 7.86% | **🌄📏 - photos inside tabular layout / forms or with a tabular annotation** |
| **TEXT** | 8.58% | **📰 - mixed types of printed and handwritten texts** |
| **TEXT_HW** | 7.36% | **✏️📄 - only handwritten text** |
| **TEXT_P** | 6.95% | **📄 - only printed text** |
| **TEXT_T** | 13.53% | **📄 - only machine typed text** |

Evaluation set (same category proportions): **995** images

#### Data preprocessing

During training, each of the following transforms was applied randomly with a 50% chance:

* `transforms.ColorJitter(brightness=0.5)`
* `transforms.ColorJitter(contrast=0.5)`
* `transforms.ColorJitter(saturation=0.5)`
* `transforms.ColorJitter(hue=0.5)`
* `transforms.Lambda(lambda img: ImageEnhance.Sharpness(img).enhance(random.uniform(0.5, 1.5)))`
* `transforms.Lambda(lambda img: img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0, 2))))`

### Training Hyperparameters

* `eval_strategy="epoch"`
* `save_strategy="epoch"`
* `learning_rate=5e-5`
* `per_device_train_batch_size=8`
* `per_device_eval_batch_size=8`
* `num_train_epochs=3`
* `warmup_ratio=0.1`
* `logging_steps=10`
* `load_best_model_at_end=True`
* `metric_for_best_model="accuracy"`

### Results 📊

Evaluation set accuracy (**Top-3**): **99.6%**

![TOP-3 confusion matrix - trained ViT](https://github.com/K4TEL/ltp-ocr/blob/transformer/result/plots/20250209-1526_conf_mat.png?raw=true)

Evaluation set accuracy (**Top-1**): **97.3%**

![TOP-1 confusion matrix - trained ViT](https://github.com/K4TEL/ltp-ocr/blob/transformer/result/plots/20250218-1523_conf_mat.png?raw=true)

#### Result tables

- Manually ✍ **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/K4TEL/ltp-ocr/blob/transformer/result/tables/20250209-1534_model_1119_3_TOP-3_EVAL.csv) 🔗

- Manually ✍ **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/K4TEL/ltp-ocr/blob/transformer/result/tables/20250218-1519_model_1119_3_TOP-1_EVAL.csv) 🔗

#### Table columns

- **FILE** - name of the file
- **PAGE** - number of the page
- **CLASS-N** - label of the category 🏷️, guess TOP-N
- **SCORE-N** - score of the category 🏷️, guess TOP-N
- **TRUE** - actual label of the category 🏷️

### Contacts 📧

For support write to 📧 lutsai.k@gmail.com 📧

Official repository: UFAL [^3]

### Acknowledgements 🙏

- **Developed by** UFAL [^5] 👥
- **Funded by** ATRIUM [^4] 💰
- **Shared by** ATRIUM [^4] & UFAL [^5]
- **Model type:** fine-tuned ViT [^2] with a 224x224 input resolution

**©️ 2022 UFAL & ATRIUM**

[^1]: https://huggingface.co/ufal/vit-historical-page
[^2]: https://huggingface.co/google/vit-base-patch16-224
[^3]: https://github.com/ufal/atrium-page-classification
[^4]: https://atrium-research.eu/
[^5]: https://ufal.mff.cuni.cz/home-page
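
#### Example: recomputing Top-N accuracy from a result table

The result table columns described above (FILE, PAGE, CLASS-N, SCORE-N, TRUE) are enough to recompute Top-1 and Top-3 accuracy. The sketch below is a minimal illustration, not the project's own evaluation code. It assumes that the header expands N into `CLASS-1`, `SCORE-1`, `CLASS-2`, and so on, and that the file is comma-separated; the helper name `top_n_accuracy` and the sample rows are hypothetical and do not come from the real evaluation set.

```python
import csv
import io


def top_n_accuracy(rows, n):
    """Share of rows whose TRUE label is among the first n CLASS-k guesses."""
    rows = list(rows)
    if not rows:
        raise ValueError("no rows to evaluate")
    hits = 0
    for row in rows:
        # Assumed header layout: CLASS-1, CLASS-2, ... (N expanded to the rank).
        guesses = [row[f"CLASS-{k}"] for k in range(1, n + 1)]
        if row["TRUE"] in guesses:
            hits += 1
    return hits / len(rows)


# Tiny made-up sample in the assumed layout (not real evaluation data).
SAMPLE = """FILE,PAGE,CLASS-1,SCORE-1,CLASS-2,SCORE-2,CLASS-3,SCORE-3,TRUE
doc_a,1,TEXT_T,0.91,TEXT_P,0.05,TEXT,0.02,TEXT_T
doc_a,2,LINE_P,0.55,LINE_T,0.40,TEXT_P,0.03,LINE_T
doc_b,1,PHOTO,0.60,DRAW,0.30,PHOTO_L,0.08,PHOTO_L
doc_b,2,DRAW,0.70,DRAW_L,0.20,PHOTO,0.05,TEXT
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
top1 = top_n_accuracy(rows, 1)
top3 = top_n_accuracy(rows, 3)
print(f"Top-1: {top1:.2%}, Top-3: {top3:.2%}")
```

To run this on a downloaded table, replace `io.StringIO(SAMPLE)` with `open(path, newline="")`, and adjust the column names if the actual header differs from the assumed one.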