
What it is:

  • English-to-Khasi Translation Model

More about this model:

  • This model is a fine-tuned version of my previous model: Bapynshngain/MarianMT-en-kha
  • The training was conducted on my own curated dataset, which comprises approximately 40,000 high-quality parallel pairs.
  • Almost half of it was manually translated and vetted by me.
  • The rest of the dataset was obtained from NIT Silchar and the Tatoeba project.
  • I would also like to acknowledge Ahlad from IIIT Guwahati for helping me curate the dataset.

Usage:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

# Load the tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("Bapynshngain/BarHeli-en-kha")
model = AutoModelForSeq2SeqLM.from_pretrained("Bapynshngain/BarHeli-en-kha")

def translate_to_khasi(text):
    # Tokenize the input; anything longer than 512 tokens is truncated.
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
    # Generate the translation with beam search; no gradients are needed at inference.
    with torch.no_grad():
        translated = model.generate(**inputs, num_beams=4, max_length=512)
    # Decode token IDs back to a Khasi string, dropping special tokens.
    return tokenizer.decode(translated[0], skip_special_tokens=True)

if __name__ == "__main__":
    while True:
        english_sentence = input("Enter an English sentence (or type 'q' to quit): ")
        if english_sentence.lower() == 'q':
            break
        print(f"Khasi Translation: {translate_to_khasi(english_sentence)}")
```
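Because the example above truncates inputs at 512 tokens, longer documents are best split into sentences and translated one at a time. A minimal sketch of the splitting step, using a hypothetical `split_sentences` helper (pure Python, no model required; a naive regex splitter, not the author's preprocessing):

```python
import re

def split_sentences(text):
    """Naively split text on ., !, or ? followed by whitespace.

    Each resulting sentence can then be passed to translate_to_khasi()
    individually so that nothing is lost to the 512-token truncation.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]
```

For real documents a proper sentence segmenter (e.g. one from NLTK or spaCy) would be more robust than this regex.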
Model size: 77.5M params (F32, Safetensors)