---
license: cc-by-nd-4.0
language:
- en
- tag
metrics:
- bleu
base_model:
- Helsinki-NLP/opus-mt-en-hi
pipeline_tag: translation
tags:
- nmt
- tagin
- english
library_name: transformers
---
# Model Card for eng_tag_nmt
The `eng_tag_nmt` model is a neural machine translation (NMT) model fine-tuned on the `GinLish Corpus v0.1.0` (under development), a parallel corpus of English-Tagin sentence pairs. Tagin is an extremely low-resource language spoken in Arunachal Pradesh, India, and faces challenges due to the scarcity of digital resources and linguistic datasets. The goal of this model is to provide translation support for Tagin, helping to preserve and promote its use in digital spaces.
To develop `eng_tag_nmt`, the pre-trained model `Helsinki-NLP/opus-mt-en-hi` (English-to-Hindi) was used as the foundation, given the structural similarities between Hindi and Tagin in a multilingual context. Transfer learning from this base model allowed the Tagin translation model to be adapted efficiently despite the limited amount of parallel data.
## Model Details
### Model Description
- **Developed by:** Tungon Dugi
- **Affiliation:** National Institute of Technology Arunachal Pradesh, India
- **Email:** [[email protected]](mailto:[email protected]) or [[email protected]](mailto:[email protected])
- **Model type:** Translation
- **Language(s) (NLP):** English (en) and Tagin (tag)
- **Finetuned from model:** Helsinki-NLP/opus-mt-en-hi
## Uses
### Direct Use
This model can be used directly for English-to-Tagin translation and for related text-to-text generation tasks.
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("repleeka/eng-tagin-nmt")
model = AutoModelForSeq2SeqLM.from_pretrained("repleeka/eng-tagin-nmt")
```
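Once the tokenizer and model are loaded, translation follows the standard Transformers sequence-to-sequence generation pattern. The snippet below is a minimal sketch: the example sentence and generation settings (such as `max_length` and `num_beams`) are illustrative choices, not values prescribed by this model card.
```python
# Translate a single English sentence to Tagin (illustrative example).
text = "Good morning, how are you?"
inputs = tokenizer(text, return_tensors="pt")
# Beam search settings below are reasonable defaults, not tuned values.
outputs = model.generate(**inputs, max_length=128, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```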
## Training Details
### Training Data
[GinLish Corpus v0.1.0](#)
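For reference, the sketch below shows how a MarianMT checkpoint such as `Helsinki-NLP/opus-mt-en-hi` can be fine-tuned on an English-Tagin parallel corpus with the Transformers `Seq2SeqTrainer`, following the transfer-learning approach described above. It is an illustration only: the file name `ginlish.tsv`, the `en`/`tag` column names, and the hyperparameters are assumptions, not the exact recipe used to train `eng_tag_nmt`.
```python
# Fine-tuning sketch (assumed setup, not the exact training recipe).
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

base = "Helsinki-NLP/opus-mt-en-hi"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSeq2SeqLM.from_pretrained(base)

# Hypothetical tab-separated parallel corpus with "en" and "tag" columns.
dataset = load_dataset("csv", data_files="ginlish.tsv", delimiter="\t")

def preprocess(batch):
    model_inputs = tokenizer(batch["en"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["tag"], truncation=True, max_length=128)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset["train"].map(preprocess, batched=True)

args = Seq2SeqTrainingArguments(
    output_dir="eng-tagin-nmt",
    num_train_epochs=10,              # matches the 10 epochs reported below
    learning_rate=2e-5,               # assumed value
    per_device_train_batch_size=16,   # assumed value
    predict_with_generate=True,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```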
## Evaluation
The model achieved the following metrics after 10 training epochs:
| Metric | Value |
|----------------------|-------------------|
| BLEU Score | 26.2526 |
| Evaluation Runtime | 628.34 seconds |
The model’s BLEU score suggests promising results, and the low evaluation loss observed during training indicates strong translation performance on the GinLish Corpus, making the model suitable for practical applications. It represents a significant advancement for Tagin language resources, enabling English-to-Tagin translation in NLP applications.
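For reproducibility, corpus-level BLEU can be computed with the sacreBLEU implementation available through the Hugging Face `evaluate` library. The sentences below are placeholders; the actual GinLish test split is not distributed with this card.
```python
# BLEU sketch using sacreBLEU via the `evaluate` library (placeholder data).
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["model output sentence"]         # hypotheses from the model
references = [["reference Tagin sentence"]]     # one list of references per hypothesis
score = bleu.compute(predictions=predictions, references=references)
print(round(score["score"], 4))                 # corpus-level BLEU (this card reports 26.2526)
```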
#### Summary
The `eng_tag_nmt` model is in an early phase of development. Improving its performance will require a larger parallel dataset and better training resources, which would enable stronger generalization and accuracy when translating between English and Tagin, an extremely low-resource language. Ongoing work will be needed to refine the model's capabilities as it evolves.