
What it is:

  • English-to-Khasi Translation Model

More about this model:

  • This model is a fine-tuned version of my previous model: Bapynshngain/MarianMT-en-kha
  • The training was conducted on my own curated dataset, which comprises approximately 40,000 high-quality parallel pairs.
  • Almost half of it was manually translated and vetted by me.
  • The rest of the dataset was obtained from NIT Silchar and the Tatoeba project.
  • I would also like to acknowledge Ahlad from IIIT Guwahati for helping me curate the dataset.

Usage:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

# Load the tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("Bapynshngain/BarHeli-en-kha")
model = AutoModelForSeq2SeqLM.from_pretrained("Bapynshngain/BarHeli-en-kha")

def translate_to_khasi(text):
    # Tokenize the input; anything longer than 512 tokens is truncated.
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
    # Generate the translation with beam search; no gradients are needed at inference.
    with torch.no_grad():
        translated = model.generate(**inputs, num_beams=4, max_length=512)
    # Decode token IDs back to a Khasi string, dropping special tokens.
    return tokenizer.decode(translated[0], skip_special_tokens=True)

if __name__ == "__main__":
    while True:
        english_sentence = input("Enter an English sentence (or type 'q' to quit): ")
        if english_sentence.lower() == 'q':
            break
        print(f"Khasi Translation: {translate_to_khasi(english_sentence)}")
```
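Because the example above truncates inputs at 512 tokens, longer documents are best split into sentences and translated one at a time. A minimal sketch of the splitting step, using a hypothetical `split_sentences` helper (pure Python, no model required; a naive regex splitter, not the author's preprocessing):

```python
import re

def split_sentences(text):
    """Naively split text on ., !, or ? followed by whitespace.

    Each resulting sentence can then be passed to translate_to_khasi()
    individually so that nothing is lost to the 512-token truncation.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]
```

For real documents a proper sentence segmenter (e.g. one from NLTK or spaCy) would be more robust than this regex.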
Model size: 77.5M params (F32, Safetensors)