I also evaluated the model on a dataset of 20K YouTube videos. For each video, we extract the title, which serves as the model input, and the existing tags where available. For videos that already have tags, we compare the generated tags directly against the existing ones; otherwise, the generated tags are evaluated by human annotators. The results are available at: https://drive.google.com/drive/folders/1RvywNl41QYNa2lthp-O8hakVCMsfX456
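For videos that already carry tags, the direct comparison above can be scored as set overlap between the generated and existing tag lists. A minimal sketch of such a metric (my own illustration, assuming comma-separated tag strings; the actual evaluation script is not shown here):

```python
def tag_f1(generated: str, existing: str) -> float:
    """Set-based F1 between generated and reference tag lists."""
    gen = {t.strip().lower() for t in generated.split(",") if t.strip()}
    ref = {t.strip().lower() for t in existing.split(",") if t.strip()}
    overlap = len(gen & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```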
## How to use the model
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("banhabang/vit5-base-tag-generation")
model = AutoModelForSeq2SeqLM.from_pretrained("banhabang/vit5-base-tag-generation")

# Encode a video title and generate tags
# (the input text and max_length below are placeholders; adjust for your data)
input_ids = tokenizer("your video title here", return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_length=256)

for output in outputs:
    tags = tokenizer.decode(output, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    print(tags)
```
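The decoded output is a single string per generated sequence. Assuming the model emits its tags as a comma-separated list (an assumption on my part, not stated above), it can be split into individual tags:

```python
# Assumes comma-separated tags in the decoded string (hypothetical output format)
tag_list = [t.strip() for t in tags.split(",") if t.strip()]
```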
## Reference
[1] T. V. Bui, O. T. Tran, and P. Le-Hong. 2020. Improving Sequence Tagging for Vietnamese Text Using Transformer-based Neural Models. In Proceedings of PACLIC 2020. Code: https://github.com/fpt-corp/vELECTRA

[2] Dat Quoc Nguyen and Anh Tuan Nguyen. 2020. PhoBERT: Pre-trained language models for Vietnamese. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1037–1042.

[3] Long Phan, Hieu Tran, Hieu Nguyen, and Trieu H. Trinh. 2022. ViT5: Pretrained text-to-text transformer for Vietnamese language generation. arXiv preprint arXiv:2205.06457. Code: https://github.com/vietai/ViT5