---
library_name: transformers
language:
- en
- tpi
base_model:
- Helsinki-NLP/opus-mt-en-tpi
pipeline_tag: translation
datasets:
- RickBrannan/tpi_eng_sentence_pairs
license: apache-2.0
---
# Model Card for opus-mt-en-tpi-finetune
This model is a fine-tune of the `opus-mt-en-tpi` model.
## Model Details
### Model Description
- **Model type:** Translation
- **Language(s) (NLP):** English, Tok Pisin
- **Finetuned from model:** https://huggingface.co/Helsinki-NLP/opus-mt-en-tpi
### Model Sources
- **Repository:** https://huggingface.co/Helsinki-NLP/opus-mt-en-tpi
## Uses
This model is intended for translation of English material into Tok Pisin. It is fine-tuned on material from Bible stories, from selected articles of a Bible Dictionary translated into Tok Pisin, and from translation of the deuterocanon (apocrypha) into Tok Pisin.
## How to Get Started with the Model
Use the code below to get started with the model.
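```python
from transformers import pipeline
# Load the fine-tuned model; the >>tpi<< prefix is the Marian-style target-language token.
pipe = pipeline("translation", model="RickBrannan/opus-mt-en-tpi-finetune")
translation = pipe(">>tpi<< In the beginning, God created the heavens and the earth.")
print(translation[0]["translation_text"])  # the pipeline returns a list of dicts
```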
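To translate several sentences at once, a small wrapper can add the `>>tpi<<` prefix and pull the translated strings out of the pipeline result. The following is a minimal sketch, not part of the model's own tooling: the helper names `add_target_token` and `translate_all` are hypothetical, and it assumes only that the translation pipeline accepts a list of strings and returns one dict per input with a `translation_text` key.

```python
from typing import Callable, List

# Target-language prefix, as used in the getting-started example.
TARGET_TOKEN = ">>tpi<<"


def add_target_token(sentence: str, token: str = TARGET_TOKEN) -> str:
    """Prefix a sentence with the target-language token unless it already has it."""
    sentence = sentence.strip()
    if sentence.startswith(token):
        return sentence
    return f"{token} {sentence}"


def translate_all(pipe: Callable, sentences: List[str]) -> List[str]:
    """Translate a batch of sentences and return only the translated strings.

    `pipe` is expected to behave like a transformers translation pipeline:
    given a list of strings, it returns a list of dicts, each with a
    "translation_text" entry.
    """
    prepared = [add_target_token(s) for s in sentences]
    results = pipe(prepared)
    return [r["translation_text"] for r in results]
```

With the `pipe` object created in the example above, `translate_all(pipe, ["God is good.", "Peace be with you."])` would return a list of two Tok Pisin strings.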
## Training Details
### Training Data
* 1,100+ English-Tok Pisin sentence pairs based on "Open Bible Stories" from unfoldingWord: https://git.door43.org/door43-Catalog/tpi_obs
* 2,600+ English-Tok Pisin sentence pairs based on translations from English into Tok Pisin pulled from the Bible Aquifer: https://aquifer.bible
* 4,150+ English-Tok Pisin sentence pairs based on translation of deuterocanonical books of the Bible into Tok Pisin: https://ebible.org/Scriptures/details.php?id=tpi
The first two sources are licensed under CC-BY-SA and are available in the [RickBrannan/tpi_eng_sentence_pairs](https://huggingface.co/datasets/RickBrannan/tpi_eng_sentence_pairs) dataset. The Tok Pisin text of the third source is available under CC-BY-NC-ND from the website listed above. For the English side, we used the deuterocanon of the [World English Bible (WEB)](https://ebible.org/Scriptures/details.php?id=eng-web) where references matched. Where the WEB did not have a matching reference, we used the [deuterocanon of the English Revised Version (RV)](https://ebible.org/Scriptures/details.php?id=eng-rv).
#### Testing Data
Testing data was 10% of the sentences from the training data specified above.
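The card does not say how the 10% was sampled, so the following is only a sketch of one common approach: a seeded random split of sentence pairs. The function name `split_pairs`, the seed, and the placeholder pairs are all hypothetical and are not taken from the actual training setup.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # (English sentence, Tok Pisin sentence)


def split_pairs(
    pairs: Sequence[Pair], test_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle sentence pairs with a fixed seed and split off a test portion.

    Returns (train_pairs, test_pairs). The seed makes the split reproducible.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder data standing in for real sentence pairs (assumption).
example_pairs = [(f"English sentence {i}", f"Tok Pisin sentence {i}") for i in range(20)]
train_pairs, test_pairs = split_pairs(example_pairs)
print(len(train_pairs), len(test_pairs))
```

Fixing the seed keeps the same sentences in the test portion across runs, which makes evaluation numbers comparable between training runs.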