tags:
- classification
base_model:
- google/vit-base-patch16-224
- google/vit-base-patch16-384
- google/vit-large-patch16-384
pipeline_tag: image-classification
license: mit
---

## Versions

There are currently five versions of the model available for download; all of them share the same set of categories
but differ in data annotations. The latest approved `v2.1` is considered the default and can be found in the `main` branch
of the HF hub [^1].

| Version | Base                    | Pages | PDFs     | Description                                                                  |
|--------:|:------------------------|:-----:|:--------:|:-----------------------------------------------------------------------------|
| `v2.0`  | `vit-base-patch16-224`  | 10073 | **3896** | annotations with mistakes, more heterogeneous data                           |
| `v2.1`  | `vit-base-patch16-224`  | 11940 | **5002** | `main`: more diverse pages in each category, fewer annotation mistakes       |
| `v2.2`  | `vit-base-patch16-224`  | 15855 | **5730** | same data as `v2.1` + some restored pages from `v2.0`                        |
| `v3.2`  | `vit-base-patch16-384`  | 15855 | **5730** | same data as `v2.2`, but a slightly larger model base with higher resolution |
| `v5.2`  | `vit-large-patch16-384` | 15855 | **5730** | same data as `v2.2`, but the largest model base with higher resolution       |

## Model description

Fine-tuned model repository: vit-historical-page [^1]

Base model repository: Google's **vit-base-patch16-224**, **vit-base-patch16-384**, **vit-large-patch16-384** [^2] [^6] [^7]

### Data

Training set of the model: **8950** images for v2.0

Training set of the model: **10745** images for v2.1

### Categories

**v2.0 version Categories**:

| Label     | Ratio  | Description                  |
|----------:|:------:|:------------------------------|
| …         | …      | …                             |
| `TEXT_P`  | 6.95%  | **only printed text**         |
| `TEXT_T`  | 13.53% | **only machine typed text**   |

**v2.1 version Categories**:

| Label     | Ratio  | Description                  |
|----------:|:------:|:------------------------------|
| …         | …      | …                             |
| `TEXT_P`  | 9.07%  | **only printed text**         |
| `TEXT_T`  | 9.05%  | **only machine typed text**   |

Evaluation set (same proportions): **995** images for v2.0

Evaluation set (same proportions): **1194** images for v2.1
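The evaluation sets above keep the same category proportions as the training data. That can be done with a per-label (stratified) split; the sketch below in plain Python is illustrative only — the helper name, the 10% ratio, and the toy labels are assumptions, not taken from the model card:

```python
import random
from collections import defaultdict

def stratified_split(items, labels, eval_ratio=0.1, seed=42):
    """Split (item, label) pairs so each label keeps the same ratio in both sets."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)
    train, evaluation = [], []
    for label, group in by_label.items():
        rng.shuffle(group)
        n_eval = round(len(group) * eval_ratio)
        evaluation += [(x, label) for x in group[:n_eval]]
        train += [(x, label) for x in group[n_eval:]]
    return train, evaluation

# Toy example: 90 pages of TEXT_P, 10 of TEXT_T.
items = [f"page_{i}" for i in range(100)]
labels = ["TEXT_P"] * 90 + ["TEXT_T"] * 10
train, evaluation = stratified_split(items, labels, eval_ratio=0.1)
```

Sampling per label rather than globally keeps rare categories represented in the evaluation set at their training-set ratio.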
#### Data preprocessing

During training the following transforms were applied randomly with a 50% chance
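The 50%-chance augmentation pattern can be sketched in plain Python. The flip and rotate functions below are placeholder examples (the card's actual transform list is not reproduced here), and nested lists stand in for real images:

```python
import random

def hflip(img):
    """Mirror a pixel grid left-to-right."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate a square pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

# Placeholder transforms; the model card's real augmentation list is longer.
TRANSFORMS = [hflip, rotate90]

def augment(img, rng=random):
    """Apply each transform independently with a 50% chance."""
    for transform in TRANSFORMS:
        if rng.random() < 0.5:
            img = transform(img)
    return img

augmented = augment([[1, 2], [3, 4]])
```

Because each transform fires independently, a batch seen twice rarely looks identical, which is the point of the augmentation.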

### Results

**v2.0** Evaluation set's accuracy (**Top-3**): **99.6%**

**v2.1** Evaluation set's accuracy (**Top-3**): **99.75%**

**v2.0** Evaluation set's accuracy (**Top-1**): **97.3%**

**v2.1** Evaluation set's accuracy (**Top-1**): **96.82%**
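For the Top-k numbers above, a prediction counts as correct when the true label appears among the model's k highest-scoring categories. A small self-contained sketch (the scores and the `OTHER` label are made up for illustration):

```python
def top_k_accuracy(score_rows, true_labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for scores, truth in zip(score_rows, true_labels):
        best_k = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += truth in best_k
    return hits / len(true_labels)

# Toy scores over two of the card's categories plus a hypothetical "OTHER".
rows = [
    {"TEXT_P": 0.7, "TEXT_T": 0.2, "OTHER": 0.1},
    {"TEXT_P": 0.4, "TEXT_T": 0.5, "OTHER": 0.1},
]
truths = ["TEXT_P", "TEXT_P"]
top1 = top_k_accuracy(rows, truths, k=1)  # 0.5: the second sample misses at Top-1
top3 = top_k_accuracy(rows, truths, k=3)  # 1.0: but is counted under Top-3
```

This is why the Top-3 figures sit above the Top-1 figures: Top-3 forgives near-miss confusions between similar page categories.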

#### Result tables

- **v2.0** Manually **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250416-1426_model_1119_3_TOP-3_EVAL.csv)
- **v2.0** Manually **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250416-1431_model_1119_3_TOP-1_EVAL.csv)
- **v2.1** Manually **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250417-1044_model_672_3_TOP-3_EVAL.csv)
- **v2.1** Manually **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250417-1050_model_672_3_TOP-1_EVAL.csv)

#### Table columns

Official repository: UFAL [^3]

- **Developed by** UFAL [^5]
- **Funded by** ATRIUM [^4]
- **Shared by** ATRIUM [^4] & UFAL [^5]
- **Model type:** fine-tuned ViT with a 224x224 [^2] or 384x384 [^6] [^7] input resolution

**© 2022 UFAL & ATRIUM**

[^3]: https://github.com/ufal/atrium-page-classification
[^4]: https://atrium-research.eu/
[^5]: https://ufal.mff.cuni.cz/home-page
[^6]: https://huggingface.co/google/vit-base-patch16-384
[^7]: https://huggingface.co/google/vit-large-patch16-384