tatiana-merz/turkic-cyrillic-classifier


tatiana-merz committed on Mar 15, 2023

Commit 4490708 · 1 Parent(s): 672d32c

Update README.md

Files changed (1): README.md (+54 -20)

@@ -1,46 +1,80 @@
 model-index:
 - name: turkic-cyrillic-classifier
   results: []
----
-This model card has been generated automatically according to the information the Trainer had access to. You
-# turkic-cyrillic-classifier
-This model is a fine-tuned version of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) on an unknown dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.0139
-- Accuracy: 0.9971
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
 - Transformers 4.27.0
 - Pytorch 1.13.1+cu116
-- Datasets 2.10.1
-- Tokenizers 0.13.2

+---
+license: apache-2.0
+tags:
+- generated_from_trainer
+metrics:
+- accuracy
+- f1
 model-index:
 - name: turkic-cyrillic-classifier
   results: []
+language:
+- ba
+- cv
+- sah
+- tt
+- ky
+- kk
+- tyv
+- krc
+- ru
+datasets:
+- tatiana-merz/cyrillic_turkic_langs
+pipeline_tag: text-classification
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# turkic-cyrillic-classifier
+This model is a fine-tuned version of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) on the tatiana-merz/cyrillic_turkic_langs dataset.
+It achieves the following results on the evaluation set:
+- Test loss: 0.013604652136564255
+- Test accuracy: 0.997
+- Test F1: 0.9969996069718668
+- Test runtime (s): 60.5479
+- Test samples per second: 148.643
+- Test steps per second: 2.329
+## Model description
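+The model identifies the language of a text written in Cyrillic script. The languages listed in the card metadata are Bashkir (ba), Chuvash (cv), Sakha (sah), Tatar (tt), Kyrgyz (ky), Kazakh (kk), Tuvan (tyv), Karachay-Balkar (krc), and Russian (ru).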
+## Intended uses & limitations
+## Training and evaluation data
+[cyrillic_turkic_langs](https://huggingface.co/datasets/tatiana-merz/cyrillic_turkic_langs/)
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 2e-05
+- train_batch_size: 64
+- eval_batch_size: 64
+- seed: 42
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- num_epochs: 2
+### Training results
+| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     |
+|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|
+| 0.1063        | 1.0   | 1000 | 0.0204          | 0.9950   | 0.9950 |
+| 0.0126        | 2.0   | 2000 | 0.0136          | 0.9970   | 0.9970 |
+### Framework versions
 - Transformers 4.27.0
 - Pytorch 1.13.1+cu116
+- Datasets 2.10.1
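
The hyperparameters and results in the updated card can be cross-checked with a little arithmetic. The following is a minimal sketch, not the author's training code: it assumes no warmup steps (the card lists none), a single device, and no gradient accumulation, so that one optimizer step corresponds to one batch of 64 examples. The constant names and the `linear_lr` helper are hypothetical and exist only for this illustration.

```python
import math

# Values copied from the model card in the diff above.
LEARNING_RATE = 2e-05
BATCH_SIZE = 64            # train_batch_size and eval_batch_size
STEPS_PER_EPOCH = 1000     # step logged at epoch 1.0
TOTAL_STEPS = 2000         # step logged at epoch 2.0
TEST_RUNTIME_S = 60.5479
TEST_SAMPLES_PER_S = 148.643

# Assumption: no warmup steps, since the card does not list any.
WARMUP_STEPS = 0


def linear_lr(step: int) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


# Assumption: one device and no gradient accumulation, so each step
# consumes at most one batch of BATCH_SIZE examples.
max_train_examples = STEPS_PER_EPOCH * BATCH_SIZE

# Test-set size implied by the reported throughput and runtime.
approx_test_examples = round(TEST_RUNTIME_S * TEST_SAMPLES_PER_S)
approx_test_batches = math.ceil(approx_test_examples / BATCH_SIZE)

if __name__ == "__main__":
    for step in (0, STEPS_PER_EPOCH, TOTAL_STEPS):
        print(f"step {step}: lr = {linear_lr(step):.2e}")
    print("training examples per epoch (upper bound):", max_train_examples)
    print("approximate test examples:", approx_test_examples)
    print("approximate test batches:", approx_test_batches)
```

Under these assumptions the training split holds at most 64,000 examples and the test split about 9,000. The implied 141 test batches agree with the reported `test_steps_per_second` multiplied by `test_runtime` (2.329 × 60.5479 ≈ 141), which suggests the reported figures are internally consistent.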