---
library_name: transformers
language:
- en
- tpi
base_model:
- Helsinki-NLP/opus-mt-en-tpi
pipeline_tag: translation
datasets:
- RickBrannan/tpi_eng_sentence_pairs
license: apache-2.0
---

# Model Card for opus-mt-en-tpi-finetune

This model is a fine-tune of the `opus-mt-en-tpi` model for English-to-Tok Pisin translation.

## Model Details

### Model Description

- **Model type:** Translation
- **Language(s) (NLP):** English, Tok Pisin
- **Finetuned from model:** https://huggingface.co/Helsinki-NLP/opus-mt-en-tpi

### Model Sources

- **Repository:** https://huggingface.co/Helsinki-NLP/opus-mt-en-tpi

## Uses

This model is intended for translating English material into Tok Pisin. It is fine-tuned on Bible stories, on selected articles from a Bible dictionary translated into Tok Pisin, and on a Tok Pisin translation of the deuterocanon (apocrypha).

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import pipeline

# Load the fine-tuned English-to-Tok Pisin translation model from the Hub
pipe = pipeline("translation", model="RickBrannan/opus-mt-en-tpi-finetune")

# The input starts with the >>tpi<< target-language token
translation = pipe(">>tpi<< In the beginning, God created the heavens and the earth.")
print(translation[0]["translation_text"])
```
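
To translate several sentences at once, the same pipeline can be given a list of strings. The sketch below is one possible way to do that, not part of the model's own tooling: `with_target_token` and `translate_batch` are hypothetical helper names, and prefixing every input with `>>tpi<<` is an assumption that simply mirrors the single-sentence example.

```python
from typing import List

TARGET_TOKEN = ">>tpi<<"  # same prefix as in the single-sentence example


def with_target_token(sentences: List[str], token: str = TARGET_TOKEN) -> List[str]:
    """Prefix each sentence with the target-language token unless it already has it."""
    prepared = []
    for sentence in sentences:
        text = sentence.strip()
        if not text.startswith(token):
            text = f"{token} {text}"
        prepared.append(text)
    return prepared


def translate_batch(sentences: List[str]) -> List[str]:
    """Translate English sentences into Tok Pisin (downloads the model on first use)."""
    # Imported here so the helper above can be used without loading transformers
    from transformers import pipeline

    pipe = pipeline("translation", model="RickBrannan/opus-mt-en-tpi-finetune")
    results = pipe(with_target_token(sentences))
    # The translation pipeline returns one dict per input sentence
    return [item["translation_text"] for item in results]
```

Calling `translate_batch(["God saw that the light was good."])` would return a list with one Tok Pisin string per input sentence. For many calls, it is cheaper to build the pipeline once and reuse it.
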

## Training Details

### Training Data

* 1,100+ English-Tok Pisin sentence pairs based on "Open Bible Stories" from unfoldingWord: https://git.door43.org/door43-Catalog/tpi_obs
* 2,600+ English-Tok Pisin sentence pairs based on translations from English into Tok Pisin pulled from the Bible Aquifer: https://aquifer.bible
* 4,150+ English-Tok Pisin sentence pairs based on translation of deuterocanonical books of the Bible into Tok Pisin: https://ebible.org/Scriptures/details.php?id=tpi

The first two sources are licensed under CC-BY-SA and are available in the [RickBrannan/tpi_eng_sentence_pairs](https://huggingface.co/datasets/RickBrannan/tpi_eng_sentence_pairs) dataset. The Tok Pisin text of the third source is available under CC-BY-NC-ND from the website listed above. For the English side, we used the deuterocanon of the [World English Bible (WEB)](https://ebible.org/Scriptures/details.php?id=eng-web) where references matched. Where the WEB did not have a matching reference, we used the [deuterocanon of the English Revised Version (RV)](https://ebible.org/Scriptures/details.php?id=eng-rv).

#### Testing Data

The testing data consisted of 10% of the sentence pairs from the training data sources listed above.
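
A 10% test split of this kind can be sketched as follows. This illustrates the general technique only. The card does not state how the split was actually made, so the shuffling, the seed, the function name `split_pairs`, and the choice to remove the test pairs from the training set are all assumptions. The placeholder corpus size of 7,850 is the sum of the lower bounds listed above (1,100 + 2,600 + 4,150), so the real counts may be somewhat higher.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # (English sentence, Tok Pisin sentence)


def split_pairs(
    pairs: Sequence[Pair], test_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle the pairs reproducibly and hold out `test_fraction` of them for testing."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # fixed seed (an assumption) keeps the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder data standing in for the real corpus
pairs = [(f"English sentence {i}", f"Tok Pisin sentence {i}") for i in range(7850)]
train, test = split_pairs(pairs)
print(len(train), len(test))  # 7065 785
```

With roughly 7,850 pairs, a 10% split leaves about 785 pairs for testing.
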