English-to-Telugu Translation Model
Overview
This project is a deep learning-based English-to-Telugu translation model fine-tuned on a custom parallel dataset. It is built with Hugging Face Transformers and was developed and trained in Google Colab. Fine-tuning on the custom data is intended to improve the contextual accuracy of sentence-level translations.
Features
✅ Translates English text to Telugu
✅ Trained on a custom bilingual dataset
✅ Uses a Transformer-based model
✅ Implemented and trained in Google Colab
✅ Can be fine-tuned further for better accuracy
Tech Stack
- Programming Language: Python
- Framework: Hugging Face Transformers
- Model: mBART (Fine-tuned)
- Libraries:
  - transformers (Hugging Face)
  - torch (PyTorch)
  - sentencepiece (Tokenization)
- Platform: Google Colab
Dataset
- Used a custom English-Telugu parallel corpus
- Preprocessed using:
  - Tokenization (SentencePiece / WordPiece)
  - Lowercasing & Cleaning
  - Removing noisy data
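
The cleaning and noise-removal steps above could look roughly like the sketch below. The original preprocessing code is not shown here, so the helper names (`clean_text`, `is_noisy`, `preprocess_pairs`) and the thresholds (`max_ratio`, `max_len`) are assumptions chosen for illustration, not the values used for this model.

```python
import re
import unicodedata


def clean_text(text, lowercase=False):
    """Normalize Unicode, collapse whitespace, and optionally lowercase."""
    text = unicodedata.normalize("NFC", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower() if lowercase else text


def is_noisy(src, tgt, max_ratio=3.0, max_len=200):
    """Hypothetical heuristics: empty side, overly long side, or skewed length ratio."""
    n_src, n_tgt = len(src.split()), len(tgt.split())
    if n_src == 0 or n_tgt == 0:
        return True
    if n_src > max_len or n_tgt > max_len:
        return True
    return max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio


def preprocess_pairs(pairs):
    """Clean each (English, Telugu) pair and drop noisy pairs and exact duplicates."""
    seen = set()
    cleaned = []
    for en, te in pairs:
        en = clean_text(en, lowercase=True)  # lowercase English only; Telugu script has no case
        te = clean_text(te)
        if is_noisy(en, te) or (en, te) in seen:
            continue
        seen.add((en, te))
        cleaned.append((en, te))
    return cleaned
```

Word-count ratios are only a rough proxy for misaligned pairs; any thresholds would need to be tuned on the actual corpus.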
Model Training
Training was done in Google Colab using a GPU. Here's a snippet of the fine-tuning process:
    from transformers import MarianMTModel, MarianTokenizer, Trainer, TrainingArguments

    # Load pre-trained model & tokenizer
    model_name = "aryaumesh/english-to-telugu"  # Base model
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Preprocess dataset (example)
    def encode_data(texts):
        return tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

    # Training arguments
    training_args = TrainingArguments(
        output_dir="./results",
        per_device_train_batch_size=8,
        num_train_epochs=3,
        save_steps=1000,
        save_total_limit=2,
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=custom_dataset,  # tokenized English-Telugu dataset, not defined in this snippet
    )

    trainer.train()
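
The snippet passes `custom_dataset` to the `Trainer` without showing how it is built. One possible shape for it is sketched below, assuming the data is available as a list of (English, Telugu) string pairs. The class name `ParallelTextDataset` and the `max_length` default are hypothetical. The sketch relies on the tokenizer accepting the `text_target` argument, which recent versions of Transformers support for producing `labels`.

```python
class ParallelTextDataset:
    """Map-style dataset over (English, Telugu) pairs (hypothetical helper)."""

    def __init__(self, pairs, tokenizer, max_length=128):
        self.pairs = list(pairs)
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        src, tgt = self.pairs[idx]
        # text_target tokenizes the Telugu side and returns it under "labels"
        enc = self.tokenizer(
            src,
            text_target=tgt,
            truncation=True,
            max_length=self.max_length,
        )
        return {
            "input_ids": enc["input_ids"],
            "attention_mask": enc["attention_mask"],
            "labels": enc["labels"],
        }
```

With this in place, `custom_dataset = ParallelTextDataset(pairs, tokenizer)` would supply the missing variable. Because the examples are not padded here, the `Trainer` would also need a padding collator such as `DataCollatorForSeq2Seq(tokenizer, model=model)` passed as `data_collator`. In practice the class would typically subclass `torch.utils.data.Dataset`.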
Run the Model
    def translate(text):
        inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
        translated = model.generate(**inputs)
        return tokenizer.decode(translated[0], skip_special_tokens=True)

    english_text = "Good morning, how are you?"
    telugu_translation = translate(english_text)
    print("Translated Text:", telugu_translation)
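
The `translate` function handles one sentence at a time. For several sentences, a batched variant along the following lines may be more convenient. This is a sketch only: `translate_batch` and its `batch_size` default are not part of the original project, and the function expects the same `tokenizer` and `model` objects loaded during training to be passed in.

```python
def translate_batch(texts, tokenizer, model, batch_size=8):
    """Translate a list of English sentences in fixed-size batches."""
    translations = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # Pad to the longest sentence in the batch so it forms one tensor
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**inputs)
        translations.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return translations
```

A call such as `translate_batch(["Good morning", "Thank you"], tokenizer, model)` would return the translations in the same order as the inputs.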
Future Improvements
🔹 Train on a larger dataset for better accuracy
🔹 Optimize inference speed for real-time use
🔹 Deploy as a cloud-based API (AWS/GCP)