---
library_name: transformers
language:
- en
- tpi
base_model:
- Helsinki-NLP/opus-mt-en-tpi
pipeline_tag: translation
datasets:
- RickBrannan/tpi_eng_sentence_pairs
license: apache-2.0
---

# Model Card for opus-mt-en-tpi-finetune

This model is a fine-tune of the `opus-mt-en-tpi` model.

## Model Details

### Model Description

- **Model type:** Translation
- **Language(s) (NLP):** English, Tok Pisin
- **Finetuned from model:** https://huggingface.co/Helsinki-NLP/opus-mt-en-tpi

### Model Sources

- **Repository:** https://huggingface.co/Helsinki-NLP/opus-mt-en-tpi

## Uses

This model is intended for translation of English material into Tok Pisin. It is fine-tuned on material from Bible stories, from selected articles of a Bible Dictionary translated into Tok Pisin, and from translation of the deuterocanon (apocrypha) into Tok Pisin.


## How to Get Started with the Model

Use the code below to get started with the model.

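```python
from transformers import pipeline

# Load the fine-tuned English-to-Tok Pisin translation model.
pipe = pipeline("translation", model="RickBrannan/opus-mt-en-tpi-finetune")
# The input is prefixed with the >>tpi<< target-language token.
translation = pipe(">>tpi<< In the beginning, God created the heavens and the earth.")
# The pipeline returns a list with one dict per input sentence.
print(translation[0]["translation_text"])
```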
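To translate several sentences at once, the same pipeline can be wrapped in a small helper. This is a minimal sketch, not part of the model's own tooling: the names `add_prefix`, `translate_batch`, and `load_pipeline` are hypothetical, and it assumes every input should carry the `>>tpi<<` prefix used in the example above.

```python
from typing import Callable, Iterable, List

# Target-language token, as used in the getting-started example.
TARGET_PREFIX = ">>tpi<<"


def add_prefix(sentence: str, prefix: str = TARGET_PREFIX) -> str:
    """Prepend the target-language token unless it is already present."""
    sentence = sentence.strip()
    if sentence.startswith(prefix):
        return sentence
    return f"{prefix} {sentence}"


def translate_batch(sentences: Iterable[str], pipe: Callable) -> List[str]:
    """Translate sentences with a translation-pipeline-like callable.

    `pipe` must accept a list of strings and return a list of dicts
    with a "translation_text" key, as transformers translation
    pipelines do.
    """
    inputs = [add_prefix(s) for s in sentences]
    if not inputs:
        return []
    return [out["translation_text"] for out in pipe(inputs)]


def load_pipeline():
    """Load the fine-tuned model (downloads the weights on first use)."""
    from transformers import pipeline

    return pipeline("translation", model="RickBrannan/opus-mt-en-tpi-finetune")
```

Calling `translate_batch(["God saw that the light was good."], load_pipeline())` returns a list of Tok Pisin strings, one per input sentence. Keeping the pipeline as a parameter means the helper can be exercised with a stub without downloading the model.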

## Training Details

### Training Data

* 1,100+ English-Tok Pisin sentence pairs based on "Open Bible Stories" from unfoldingWord: https://git.door43.org/door43-Catalog/tpi_obs
* 2,600+ English-Tok Pisin sentence pairs based on translations from English into Tok Pisin pulled from the Bible Aquifer: https://aquifer.bible
* 4,150+ English-Tok Pisin sentence pairs based on translation of deuterocanonical books of the Bible into Tok Pisin: https://ebible.org/Scriptures/details.php?id=tpi

The first two sources are licensed under CC-BY-SA and are available in the [RickBrannan/tpi_eng_sentence_pairs](https://huggingface.co/datasets/RickBrannan/tpi_eng_sentence_pairs) dataset. The Tok Pisin text of the last source is available under CC-BY-NC-ND from the website listed. For the English side, we used the deuterocanon of the [World English Bible (WEB)](https://ebible.org/Scriptures/details.php?id=eng-web) where verse references matched. Where WEB lacked a matching reference, we used the [deuterocanon of the English Revised Version (RV)](https://ebible.org/Scriptures/details.php?id=eng-rv).

#### Testing Data

The testing data consisted of 10% of the sentence pairs from the training data sources listed above.
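The card does not say how the 10% was selected, or whether those pairs were excluded from training. As an illustration only, a seeded random hold-out over a list of sentence pairs could look like the sketch below; the function name `split_pairs`, the seed, and the hold-out behaviour are assumptions, not the procedure used for this model.

```python
import random
from typing import List, Sequence, Tuple

# An (English, Tok Pisin) sentence pair.
Pair = Tuple[str, str]


def split_pairs(
    pairs: Sequence[Pair], test_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle sentence pairs and hold out `test_fraction` of them.

    Returns (train_pairs, test_pairs). The seed makes the split
    reproducible; its value here is arbitrary.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

With 100 input pairs and the default fraction, this yields 90 training pairs and 10 test pairs, with no pair appearing in both lists.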