banhabang committed on
Commit b04c1a9 · 1 Parent(s): 68435c5

Update README.md

Files changed (1):
  1. README.md +9 -7

README.md CHANGED
@@ -43,12 +43,6 @@ group_by_length=True,
 
 I also evaluated the model on a dataset of 20K YouTube videos. We extract each video's title and, where available, its tags; the title is the input to the model. For videos that have tags, we compare the generated tags directly with the existing ones; otherwise, the generated tags are evaluated by humans. The results are available at: https://drive.google.com/drive/folders/1RvywNl41QYNa2lthp-O8hakVCMsfX456
 
-[1] T. V. Bui, O. T. Tran, P. Le-Hong. Improving Sequence Tagging for Vietnamese Text Using Transformer-based Neural Models. In Proceedings of PACLIC 2020. Code: https://github.com/fpt-corp/vELECTRA
-
-[2] Dat Quoc Nguyen and Anh Tuan Nguyen. 2020. PhoBERT: Pre-trained language models for Vietnamese. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1037–1042.
-
-[3] Long Phan, Hieu Tran, Hieu Nguyen, and Trieu H. Trinh. 2022. ViT5: Pretrained text-to-text transformer for Vietnamese language generation. arXiv preprint arXiv:2205.06457. Code: https://github.com/vietai/ViT5
-
 How to use the model
 
 tokenizer = AutoTokenizer.from_pretrained("banhabang/vit5-base-tag-generation")
@@ -72,4 +66,12 @@ outputs = model.generate(
 
 for output in outputs:
 
-outputs = tokenizer.decode(output, skip_special_tokens=True, clean_up_tokenization_spaces=True)
+outputs = tokenizer.decode(output, skip_special_tokens=True, clean_up_tokenization_spaces=True)
+
+References
+
+[1] T. V. Bui, O. T. Tran, P. Le-Hong. Improving Sequence Tagging for Vietnamese Text Using Transformer-based Neural Models. In Proceedings of PACLIC 2020. Code: https://github.com/fpt-corp/vELECTRA
+
+[2] Dat Quoc Nguyen and Anh Tuan Nguyen. 2020. PhoBERT: Pre-trained language models for Vietnamese. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1037–1042.
+
+[3] Long Phan, Hieu Tran, Hieu Nguyen, and Trieu H. Trinh. 2022. ViT5: Pretrained text-to-text transformer for Vietnamese language generation. arXiv preprint arXiv:2205.06457. Code: https://github.com/vietai/ViT5
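The README's evaluation paragraph says generated tags are "directly compared" with a video's existing tags, but the commit does not show the scoring. A minimal sketch of one plausible comparison, a case-insensitive set overlap yielding precision, recall, and F1 (the actual metric, the `tag_overlap` name, and the example tag lists are assumptions, not from the README), might look like:

```python
def tag_overlap(generated, reference):
    """Score generated tags against a video's existing tags via set overlap.

    Tags are stripped and lowercased so casing and whitespace differences
    do not count as mismatches. Returns (precision, recall, f1).
    """
    gen = {t.strip().lower() for t in generated}
    ref = {t.strip().lower() for t in reference}
    if not gen or not ref:
        return 0.0, 0.0, 0.0
    hits = len(gen & ref)          # tags present in both sets
    precision = hits / len(gen)    # fraction of generated tags that match
    recall = hits / len(ref)       # fraction of existing tags recovered
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Hypothetical example: two of three generated tags match the video's tags.
p, r, f1 = tag_overlap(["music", "vietnam", "travel"],
                       ["Music", "Vietnam", "food", "vlog"])
```

Averaging these per-video scores over the tagged subset of the 20K videos would give a corpus-level figure; the untagged remainder would still need the human evaluation the README describes.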