---
license: cc-by-nd-4.0
language:
- en
- tag
metrics:
- bleu
base_model:
- Helsinki-NLP/opus-mt-en-hi
pipeline_tag: translation
tags:
- nmt
- tagin
- english
library_name: transformers
---

# Model Card for `eng_tag_nmt`

The `eng_tag_nmt` model is a neural machine translation (NMT) model fine-tuned on the `GinLish Corpus v0.1.0` (under development), a parallel corpus of English and Tagin sentence pairs. Tagin is an extremely low-resource language spoken in Arunachal Pradesh, India, with few digital resources and linguistic datasets. The goal of this model is to provide translation support for Tagin, helping to preserve the language and promote its use in digital spaces.

`eng_tag_nmt` was built on the pre-trained English-to-Hindi model `Helsinki-NLP/opus-mt-en-hi`, chosen because of structural similarities between Hindi and Tagin. Transfer learning from this model made it possible to adapt it to Tagin efficiently despite the limited amount of Tagin data.

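The transfer-learning setup described above can be sketched roughly as follows. This is not the author's training script: the column names `en` and `tag`, the `max_length` value, the output directory, and the helper names `preprocess` and `build_trainer` are all assumptions made for illustration. Only the base checkpoint and the 10 training epochs come from this card.

```python
BASE_CHECKPOINT = "Helsinki-NLP/opus-mt-en-hi"  # base model named in this card


def preprocess(batch, tokenizer, max_length=128):
    """Tokenize English sources and Tagin targets for seq2seq training.

    Assumes the batch has "en" and "tag" columns (hypothetical names).
    """
    return tokenizer(
        batch["en"],
        text_target=batch["tag"],  # target text is tokenized into "labels"
        max_length=max_length,
        truncation=True,
    )


def build_trainer(train_dataset, eval_dataset, output_dir="eng_tag_nmt"):
    """Build a Seq2SeqTrainer that continues training the base checkpoint.

    Expects two `datasets.Dataset` objects with "en" and "tag" columns.
    """
    # Imported lazily so that `preprocess` can be used without transformers.
    from transformers import (
        AutoModelForSeq2SeqLM,
        AutoTokenizer,
        DataCollatorForSeq2Seq,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(BASE_CHECKPOINT)
    model = AutoModelForSeq2SeqLM.from_pretrained(BASE_CHECKPOINT)

    def tokenize(dataset):
        # Replace the raw text columns with token ids and labels.
        return dataset.map(
            preprocess,
            batched=True,
            fn_kwargs={"tokenizer": tokenizer},
            remove_columns=dataset.column_names,
        )

    args = Seq2SeqTrainingArguments(
        output_dir=output_dir,       # hypothetical output directory
        num_train_epochs=10,         # epoch count reported in this card
        predict_with_generate=True,  # decode with generate() during evaluation
    )
    return Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=tokenize(train_dataset),
        eval_dataset=tokenize(eval_dataset),
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
```

Calling `build_trainer(train_ds, eval_ds).train()` would then start fine-tuning. All other hyperparameters are left at the library defaults because the card does not state them.
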
## Model Details

### Model Description

- **Developed by:** Tungon Dugi
- **Affiliation:** National Institute of Technology Arunachal Pradesh, India
- **Email:** [[email protected]](mailto:[email protected]) or [[email protected]](mailto:[email protected])
- **Model type:** Translation
- **Language(s) (NLP):** English (en) and Tagin (tag)
- **Finetuned from model:** Helsinki-NLP/opus-mt-en-hi

## Uses

### Direct Use

This model can be used for English-to-Tagin translation and, more generally, for text-to-text generation.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Download the tokenizer and the fine-tuned model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("repleeka/eng-tagin-nmt")
model = AutoModelForSeq2SeqLM.from_pretrained("repleeka/eng-tagin-nmt")
```

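Once the tokenizer and model are loaded, translating a sentence takes three steps: tokenize, generate, decode. The sketch below wraps those steps in two helpers. The names `load_translator` and `translate` and the `max_new_tokens` default are illustrative assumptions, not part of the released model.

```python
MODEL_ID = "repleeka/eng-tagin-nmt"  # repository name used in the loading snippet


def load_translator(model_id=MODEL_ID):
    """Load the tokenizer and model (downloads from the Hub on first use)."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def translate(sentences, tokenizer, model, max_new_tokens=128):
    """Translate a list of English sentences and return the decoded strings."""
    # Pad to the longest sentence so the whole list runs as one batch.
    inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


# Example usage (requires network access to download the checkpoint):
# tokenizer, model = load_translator()
# print(translate(["How are you?"], tokenizer, model))
```

Passing a list rather than a single string keeps the helper usable for batch translation. Wrap a single sentence in a one-element list.
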
## Training Details

### Training Data

The model was fine-tuned on the [GinLish Corpus v0.1.0](#), an English-Tagin parallel corpus that is still under development.

## Evaluation

The model achieved the following metrics after 10 training epochs:

| Metric             | Value          |
|--------------------|----------------|
| BLEU Score         | 26.2526        |
| Evaluation Runtime | 628.34 seconds |

A BLEU score of 26.25 is a promising result for an extremely low-resource language pair. The evaluation loss, which is not listed in the table above, was also low, pointing to good translation performance on the GinLish Corpus. The model is a meaningful step forward for Tagin language resources: it enables English-to-Tagin translation in NLP applications, although it should be validated on real-world text before practical deployment.

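For readers unfamiliar with the metric, the sketch below shows what a corpus-level BLEU score measures: the geometric mean of 1-gram to 4-gram precisions, scaled by a brevity penalty. It is a simplified illustration that assumes whitespace tokenization, a single reference per sentence, and no smoothing. The card does not say which BLEU implementation or tokenization produced the reported score, so this function should not be expected to reproduce it.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale (one reference per hypothesis)."""
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # n-grams produced by the system, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp_tokens, n)
            ref_counts = ngram_counts(ref_tokens, n)
            # Counter intersection clips each n-gram to its reference count.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty lowers the score of outputs shorter than the reference.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

Standard tools apply their own tokenization and smoothing, so their scores will generally differ from this sketch on the same data.
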
#### Summary

The `eng_tag_nmt` model is at an early stage of development. Improving it will require a larger dataset and more training resources, which should lead to better generalization and accuracy when translating between English and Tagin and help address the challenges this extremely low-resource language faces. Further refinement will be needed as the model evolves.