tags:
- classification
base_model:
- google/vit-base-patch16-224
- google/vit-base-patch16-384
- google/vit-large-patch16-384
pipeline_tag: image-classification
license: mit
---

## Versions

There are currently five versions of the model available for download; all of them share the same set of categories
but differ in data annotations. The latest approved `v2.1` is considered the default and can be found in the `main` branch
of the HF hub [^1].

| Version | Base                    | Pages | PDFs     | Description                                                                  |
|--------:|:------------------------|:-----:|:--------:|:-----------------------------------------------------------------------------|
| `v2.0`  | `vit-base-patch16-224`  | 10073 | **3896** | annotations with mistakes, more heterogeneous data                           |
| `v2.1`  | `vit-base-patch16-224`  | 11940 | **5002** | `main`: more diverse pages in each category, fewer annotation mistakes       |
| `v2.2`  | `vit-base-patch16-224`  | 15855 | **5730** | same data as `v2.1` + some restored pages from `v2.0`                        |
| `v3.2`  | `vit-base-patch16-384`  | 15855 | **5730** | same data as `v2.2`, but a slightly larger model base with higher resolution |
| `v5.2`  | `vit-large-patch16-384` | 15855 | **5730** | same data as `v2.2`, but the largest model base with higher resolution       |

## Model description

Fine-tuned model repository: vit-historical-page [^1]

Base model repository: Google's **vit-base-patch16-224**, **vit-base-patch16-384**, **vit-large-patch16-384** [^2] [^6] [^7]

### Data

Training set of the model: **8950** images for v2.0

Training set of the model: **10745** images for v2.1

### Categories

**v2.0 version Categories**:

| Label     | Ratio  | Description                  |
|----------:|:------:|:------------------------------|
| …         | …      | …                             |
| `TEXT_P`  | 6.95%  | **only printed text**         |
| `TEXT_T`  | 13.53% | **only machine typed text**   |

**v2.1 version Categories**:

| Label     | Ratio  | Description                  |
|----------:|:------:|:------------------------------|
| …         | …      | …                             |
| `TEXT_P`  | 9.07%  | **only printed text**         |
| `TEXT_T`  | 9.05%  | **only machine typed text**   |

Evaluation set (same proportions): **995** images for v2.0

Evaluation set (same proportions): **1194** images for v2.1
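The evaluation sets above keep the same category proportions as the training data. That can be done with a per-label (stratified) split; the sketch below in plain Python is illustrative only — the helper name, the 10% ratio, and the toy labels are assumptions, not taken from the model card:

```python
import random
from collections import defaultdict

def stratified_split(items, labels, eval_ratio=0.1, seed=42):
    """Split (item, label) pairs so each label keeps the same ratio in both sets."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)
    train, evaluation = [], []
    for label, group in by_label.items():
        rng.shuffle(group)
        n_eval = round(len(group) * eval_ratio)
        evaluation += [(x, label) for x in group[:n_eval]]
        train += [(x, label) for x in group[n_eval:]]
    return train, evaluation

# Toy example: 90 pages of TEXT_P, 10 of TEXT_T.
items = [f"page_{i}" for i in range(100)]
labels = ["TEXT_P"] * 90 + ["TEXT_T"] * 10
train, evaluation = stratified_split(items, labels, eval_ratio=0.1)
```

Sampling per label rather than globally keeps rare categories represented in the evaluation set at their training-set ratio.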
#### Data preprocessing

During training the following transforms were applied randomly with a 50% chance
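The 50%-chance augmentation pattern can be sketched in plain Python. The flip and rotate functions below are placeholder examples (the card's actual transform list is not reproduced here), and nested lists stand in for real images:

```python
import random

def hflip(img):
    """Mirror a pixel grid left-to-right."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate a square pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

# Placeholder transforms; the model card's real augmentation list is longer.
TRANSFORMS = [hflip, rotate90]

def augment(img, rng=random):
    """Apply each transform independently with a 50% chance."""
    for transform in TRANSFORMS:
        if rng.random() < 0.5:
            img = transform(img)
    return img

augmented = augment([[1, 2], [3, 4]])
```

Because each transform fires independently, a batch seen twice rarely looks identical, which is the point of the augmentation.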

### Results

**v2.0** Evaluation set's accuracy (**Top-3**): **99.6%**

**v2.1** Evaluation set's accuracy (**Top-3**): **99.75%**

**v2.0** Evaluation set's accuracy (**Top-1**): **97.3%**

**v2.1** Evaluation set's accuracy (**Top-1**): **96.82%**
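For the Top-k numbers above, a prediction counts as correct when the true label appears among the model's k highest-scoring categories. A small self-contained sketch (the scores and the `OTHER` label are made up for illustration):

```python
def top_k_accuracy(score_rows, true_labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for scores, truth in zip(score_rows, true_labels):
        best_k = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += truth in best_k
    return hits / len(true_labels)

# Toy scores over two of the card's categories plus a hypothetical "OTHER".
rows = [
    {"TEXT_P": 0.7, "TEXT_T": 0.2, "OTHER": 0.1},
    {"TEXT_P": 0.4, "TEXT_T": 0.5, "OTHER": 0.1},
]
truths = ["TEXT_P", "TEXT_P"]
top1 = top_k_accuracy(rows, truths, k=1)  # 0.5: the second sample misses at Top-1
top3 = top_k_accuracy(rows, truths, k=3)  # 1.0: but is counted under Top-3
```

This is why the Top-3 figures sit above the Top-1 figures: Top-3 forgives near-miss confusions between similar page categories.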

#### Result tables

- **v2.0** Manually **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250416-1426_model_1119_3_TOP-3_EVAL.csv)
- **v2.0** Manually **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250416-1431_model_1119_3_TOP-1_EVAL.csv)
- **v2.1** Manually **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250417-1044_model_672_3_TOP-3_EVAL.csv)
- **v2.1** Manually **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250417-1050_model_672_3_TOP-1_EVAL.csv)

#### Table columns

Official repository: UFAL [^3]

- **Developed by** UFAL [^5]
- **Funded by** ATRIUM [^4]
- **Shared by** ATRIUM [^4] & UFAL [^5]
- **Model type:** fine-tuned ViT with a 224x224 [^2] or 384x384 [^6] [^7] input resolution

**© 2022 UFAL & ATRIUM**

[^3]: https://github.com/ufal/atrium-page-classification
[^4]: https://atrium-research.eu/
[^5]: https://ufal.mff.cuni.cz/home-page
[^6]: https://huggingface.co/google/vit-base-patch16-384
[^7]: https://huggingface.co/google/vit-large-patch16-384