ufal /

k4tel committed
Commit 5574e25 · verified · 1 Parent(s): 46c0dac

Update README.md

Files changed (1):
  1. README.md +30 -21
README.md CHANGED
@@ -5,6 +5,8 @@ tags:
  - classification
  base_model:
  - google/vit-base-patch16-224
+ - google/vit-base-patch16-384
+ - google/vit-large-patch16-384
  pipeline_tag: image-classification
  license: mit
  ---
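The front matter above declares `pipeline_tag: image-classification`, so the fine-tuned checkpoint can be driven through the standard `transformers` pipeline. A minimal sketch, assuming the full repository id is `ufal/vit-historical-page` (the README names the repo but not the full id, so that id is an assumption):

```python
from transformers import pipeline

# Hypothetical repo id inferred from the README (org "ufal", repo "vit-historical-page").
classifier = pipeline("image-classification", model="ufal/vit-historical-page")

# Accepts a local path, URL, or PIL.Image; returns the highest-scoring labels with scores.
print(classifier("scan_of_a_page.jpg", top_k=3))
```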
@@ -21,28 +23,33 @@ HF 😊 hub support for the model
  ## Versions 🏁
  
  There are currently 2 versions of the model available for download, both of them have the same set of categories,
- but different data annotations. The latest `v2.0` is considered to be default.
+ but different data annotations. The latest approved `v2.1` is considered the default and can be found in the `main` branch
+ of the HF 😊 hub [^1] 🔗
+ 
+ | Version | Base                    | Pages | PDFs     | Description                                                                   |
+ |--------:|:------------------------|:-----:|:--------:|:------------------------------------------------------------------------------|
+ | `v2.0`  | `vit-base-patch16-224`  | 10073 | **3896** | annotations with mistakes, more heterogeneous data                             |
+ | `v2.1`  | `vit-base-patch16-224`  | 11940 | **5002** | `main`: more diverse pages in each category, fewer annotation mistakes         |
+ | `v2.2`  | `vit-base-patch16-224`  | 15855 | **5730** | same data as `v2.1` + some restored pages from `v2.0`                          |
+ | `v3.2`  | `vit-base-patch16-384`  | 15855 | **5730** | same data as `v2.2`, but a slightly larger model base with higher resolution   |
+ | `v5.2`  | `vit-large-patch16-384` | 15855 | **5730** | same data as `v2.2`, but the largest model base with higher resolution         |
  
- | Version | Pages | N-page files | PDFs     | Description                                                    |
- |--------:|:-----:|:------------:|:--------:|:----------------------------------------------------------------|
- | `v1.0`  | 10073 | **~104**     | **3896** | annotations with mistakes, more heterogenous data                |
- | `v1.0`  | 11940 | **~509**     | **5002** | more diverse pages in each category, less annotation mistakes    |
  
  ## Model description 📇
  
  🔲 Fine-tuned model repository: vit-historical-page [^1] 🔗
  
- 🔳 Base model repository: google's vit-base-patch16-224 [^2] 🔗
+ 🔳 Base model repository: Google's **vit-base-patch16-224**, **vit-base-patch16-384**, **vit-large-patch16-384** [^2] [^6] [^7] 🔗
  
  ### Data 📜
  
- Training set of the model: **8950** images for v1.0
+ Training set of the model: **8950** images for v2.0
  
- Training set of the model: **10745** images for v2.0
+ Training set of the model: **10745** images for v2.1
  
  ### Categories 🏷️
  
- **v1.0 version Categories 🪧**:
+ **v2.0 version Categories 🪧**:
  
  | Label     | Ratio  | Description |
  |----------:|:------:|:------------------------------------------------------------------------------|
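Since the default `v2.1` checkpoint lives in the `main` branch of the hub repository, a specific revision can be pinned at load time. A minimal sketch, again assuming the repo id `ufal/vit-historical-page`; whether the other versions are published under separate branch or tag names is not stated in this diff, so the commented-out revision is purely illustrative:

```python
from transformers import AutoImageProcessor, ViTForImageClassification

repo_id = "ufal/vit-historical-page"  # hypothetical full repo id

# Default branch = latest approved v2.1, per the README.
model = ViTForImageClassification.from_pretrained(repo_id, revision="main")
processor = AutoImageProcessor.from_pretrained(repo_id, revision="main")

# Any other published branch or tag could be selected the same way, e.g.:
# model = ViTForImageClassification.from_pretrained(repo_id, revision="some-other-branch")
```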
@@ -58,7 +65,7 @@ Training set of the model: **10745** images for v2.0
  | `TEXT_P`  | 6.95%  | **📄 - only printed text** |
  | `TEXT_T`  | 13.53% | **📄 - only machine typed text** |
  
- **v2.0 version Categories 🪧**:
+ **v2.1 version Categories 🪧**:
  
  | Label     | Ratio | Description |
  |----------:|:-----:|:------------------------------------------------------------------------------|
@@ -74,9 +81,9 @@ Training set of the model: **10745** images for v2.0
  | `TEXT_P`  | 9.07% | **📄 - only printed text** |
  | `TEXT_T`  | 9.05% | **📄 - only machine typed text** |
  
- Evaluation set (same proportions): **995** images for v1.0
+ Evaluation set (same proportions): **995** images for v2.0
  
- Evaluation set (same proportions): **1194** images for v2.0
+ Evaluation set (same proportions): **1194** images for v2.1
  
  
  #### Data preprocessing
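The context line of the next hunk notes that the training transforms were each applied randomly with a 50% chance. The concrete transform list is not part of this diff, so the following torchvision sketch only illustrates that pattern with placeholder operations:

```python
import torchvision.transforms as T

# Placeholder augmentations; the actual transforms used for training are not listed here.
train_transforms = T.Compose([
    T.RandomApply([T.ColorJitter(brightness=0.2, contrast=0.2)], p=0.5),  # applied with 50% chance
    T.RandomApply([T.GaussianBlur(kernel_size=3)], p=0.5),                # applied with 50% chance
    T.Resize((224, 224)),  # match the base model's input resolution
    T.ToTensor(),
])
```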
@@ -105,31 +112,31 @@ During training the following transforms were applied randomly with a 50% chance
  
  ### Results 📊
  
- **v1.0** Evaluation set's accuracy (**Top-3**): **99.6%**
+ **v2.0** Evaluation set's accuracy (**Top-3**): **99.6%**
  
  ![TOP-3 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250416-1430_conf_mat_TOP-3.png?raw=true)
  
- **v2.0** Evaluation set's accuracy (**Top-3**): **99.75%**
+ **v2.1** Evaluation set's accuracy (**Top-3**): **99.75%**
  
  ![TOP-3 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250417-1049_conf_mat_TOP-3.png?raw=true)
  
- **v1.0** Evaluation set's accuracy (**Top-1**): **97.3%**
+ **v2.0** Evaluation set's accuracy (**Top-1**): **97.3%**
  
  ![TOP-1 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250416-1436_conf_mat_TOP-1.png?raw=true)
  
- **v2.0** Evaluation set's accuracy (**Top-1**): **96.82%**
+ **v2.1** Evaluation set's accuracy (**Top-1**): **96.82%**
  
  ![TOP-1 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250417-1055_conf_mat_TOP-1.png?raw=true)
  
  #### Result tables
  
- - **v1.0** Manually ✍ **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250416-1426_model_1119_3_TOP-3_EVAL.csv) 🔗
+ - **v2.0** Manually ✍ **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250416-1426_model_1119_3_TOP-3_EVAL.csv) 🔗
  
- - **v1.0** Manually ✍ **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250416-1431_model_1119_3_TOP-1_EVAL.csv) 🔗
+ - **v2.0** Manually ✍ **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250416-1431_model_1119_3_TOP-1_EVAL.csv) 🔗
  
- - **v2.0** Manually ✍ **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250417-1044_model_672_3_TOP-3_EVAL.csv) 🔗
+ - **v2.1** Manually ✍ **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250417-1044_model_672_3_TOP-3_EVAL.csv) 🔗
  
- - **v2.0** Manually ✍ **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250417-1050_model_672_3_TOP-1_EVAL.csv) 🔗
+ - **v2.1** Manually ✍ **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250417-1050_model_672_3_TOP-1_EVAL.csv) 🔗
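The Top-1 and Top-3 figures above mean that the correct category is, respectively, the single highest-scoring prediction or among the three highest-scoring ones. A minimal sketch of that metric over a batch of model logits (illustrative only, not the repository's own evaluation script):

```python
import torch

def topk_accuracy(logits: torch.Tensor, labels: torch.Tensor, k: int = 3) -> float:
    """Fraction of samples whose true label appears among the k highest-scoring classes."""
    topk_ids = logits.topk(k, dim=-1).indices              # (N, k) best class ids per sample
    hits = (topk_ids == labels.unsqueeze(-1)).any(dim=-1)  # (N,) True where the label is in the top k
    return hits.float().mean().item()

# top1 = topk_accuracy(all_logits, all_labels, k=1)
# top3 = topk_accuracy(all_logits, all_labels, k=3)
```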
 
  #### Table columns
 
@@ -150,7 +157,7 @@ Official repository: UFAL [^3]
  - **Developed by** UFAL [^5] 👥
  - **Funded by** ATRIUM [^4] 💰
  - **Shared by** ATRIUM [^4] & UFAL [^5]
- - **Model type:** fine-tuned ViT [^2] with a 224x224 resolution size
+ - **Model type:** fine-tuned ViT with a 224x224 [^2] 🔗 or 384x384 [^6] [^7] 🔗 input resolution
  
  **©️ 2022 UFAL & ATRIUM**
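The bundled image processor handles the input resolution listed above: it resizes pages to whatever the chosen checkpoint expects (224x224 for the base-224 variants, 384x384 for the 384 ones). A minimal sketch, again assuming the hypothetical repo id `ufal/vit-historical-page`:

```python
from PIL import Image
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("ufal/vit-historical-page")  # hypothetical repo id

image = Image.open("scan_of_a_page.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt")["pixel_values"]
print(pixel_values.shape)  # e.g. torch.Size([1, 3, 224, 224]) for a 224x224 checkpoint
```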
 
@@ -159,3 +166,5 @@ Official repository: UFAL [^3]
  [^3]: https://github.com/ufal/atrium-page-classification
  [^4]: https://atrium-research.eu/
  [^5]: https://ufal.mff.cuni.cz/home-page
+ [^6]: https://huggingface.co/google/vit-base-patch16-384
+ [^7]: https://huggingface.co/google/vit-large-patch16-384
 
 
 