stefan-it committed on
Commit ba6be74 · 1 Parent(s): 5cd5870

readme: introduce mean noise span length section

Files changed (1)
  1. README.md +13 -5
README.md CHANGED
@@ -19,13 +19,21 @@ Details about the training can be found [here](https://github.com/stefan-it/hmBy

  This model was trained with `mean_noise_span_length=20` for one epoch.

- # Evaluation on Downstream Tasks (NER)
+ # Mean Noise Span Length

- We evaluated the hmByT5 model on downstream tasks:
+ The previously pretrained hmByT5 models "accidentally" used a mean noise span length of 3, because this value is the
+ default for T5. However, the ByT5 paper notes that a length of 3 would make the pretraining task too easy and
+ recommends a value of 20. Thus, we pretrained this model with `mean_noise_span_length=20` and fine-tuned it on the
+ English AjMC dataset:

- | Model | English AjMC | German AjMC | French AjMC | Finnish NewsEye | Swedish NewsEye | Dutch ICDAR | French ICDAR | Avg. |
- |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|--------------|--------------|-----------------|-----------------|--------------|--------------|------|
- | [`hmbyt5/byt5-small-english`](https://huggingface.co/hmbyt5/byt5-small-english) | 85.65 ± 1.21 | 87.27 ± 0.50 | 84.44 ± 0.79 | | | | | |
+ | Configuration | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg. |
+ |------------------------------------------|-------|-------|-------|-------|-------|--------------|
+ | `wsFalse-bs4-e10-lr0.00015-poolingfirst` | 85.48 | 84.6 | 85.65 | 86.83 | 86.53 | 85.82 ± 0.79 |
+ | `wsFalse-bs4-e10-lr0.00016-poolingfirst` | 85.35 | 84.5 | 86.05 | 85.1 | 85.18 | 85.24 ± 0.5 |
+ | `wsFalse-bs8-e10-lr0.00016-poolingfirst` | 84.14 | 83.45 | 84.4 | 84.9 | 85.82 | 84.54 ± 0.79 |
+ | `wsFalse-bs8-e10-lr0.00015-poolingfirst` | 85.27 | 85.3 | 83.33 | 85.25 | 81.7 | 84.17 ± 1.45 |
+
+ For comparison, the model pretrained with a mean noise span length of 3 achieved 85.65 ± 1.21.

  # Acknowledgements
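Reviewer note: the `Avg.` column in the added table can be re-derived from the five per-run scores. The sketch below assumes each entry is the mean ± the population standard deviation (divide by N, not N-1), both rounded to two decimals. The README does not state this; it is inferred from the fact that the numbers match. The `runs` dictionary and the `summarize` helper are illustrative names and not part of the repository.

```python
from statistics import mean, pstdev

# Per-run scores copied from the table added in this commit (English AjMC).
runs = {
    "wsFalse-bs4-e10-lr0.00015-poolingfirst": [85.48, 84.6, 85.65, 86.83, 86.53],
    "wsFalse-bs4-e10-lr0.00016-poolingfirst": [85.35, 84.5, 86.05, 85.1, 85.18],
    "wsFalse-bs8-e10-lr0.00016-poolingfirst": [84.14, 83.45, 84.4, 84.9, 85.82],
    "wsFalse-bs8-e10-lr0.00015-poolingfirst": [85.27, 85.3, 83.33, 85.25, 81.7],
}


def summarize(scores):
    """Return (mean, population standard deviation), each rounded to 2 decimals.

    Assumption: population std (pstdev) is used because it reproduces the
    values in the table; the sample std (stdev) would give larger numbers.
    """
    return round(mean(scores), 2), round(pstdev(scores), 2)


if __name__ == "__main__":
    for name, scores in runs.items():
        avg, std = summarize(scores)
        print(f"{name}: {avg} ± {std}")
```

Under this assumption, all four rows reproduce the reported averages, so the best configuration (85.82 ± 0.79) and the length-3 baseline (85.65 ± 1.21) differ by well under one standard deviation.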