---
language:  
  - en  
  - ta  
license: cc-by-4.0  
tags:  
  - translation  
  - tamil  
  - colloquial-tamil  
  - fine-tuned  
  - text-to-text  
datasets:  
  - janisrebekahv/colloquial_tamil  
  - jarvisvasu/english-to-colloquial-tamil  
  - chatgpt-generated  
  - youtube-comments  
model-index:  
  - name: janisrebekahv/finetuned-colloquial-tamil  
    results:  
      - task:  
          type: translation  
          name: English to Colloquial Tamil  
        dataset:  
          name: janisrebekahv/colloquial_tamil  
          type: text  
        metrics:  
          - name: BLEU Score  
            type: bleu  
            value: 38.5  
          - name: ROUGE Score  
            type: rouge  
            value: 0.72  
---
# janisrebekahv/finetuned-colloquial-tamil  

## 📌 Model Overview  
This is a **fine-tuned version of [suriya7/English-to-Tamil](https://huggingface.co/suriya7/English-to-Tamil)**, trained to produce **colloquial Tamil translations** instead of formal Tamil.  

✅ Translates **English → Colloquial Tamil**  
✅ Incorporates **slang, informal speech, and real-world phrasing**  
✅ Useful for **chatbots, conversational AI, and social media applications**  

---

## 📜 Dataset  
🔹 **Custom Dataset Used for Fine-Tuning:**  
📂 **[janisrebekahv/colloquial_tamil](https://huggingface.co/datasets/janisrebekahv/colloquial_tamil)**  
This dataset was curated specifically to fine-tune this model for **English → Colloquial Tamil** translation. It combines three sources:  

1️⃣ **[jarvisvasu/english-to-colloquial-tamil](https://huggingface.co/datasets/jarvisvasu/english-to-colloquial-tamil)** – A publicly available dataset for informal Tamil translations.  
2️⃣ **YouTube Comments Dataset (Custom-Created)** – Extracted using the **YouTube API** and manually converted to colloquial Tamil for authenticity.  
3️⃣ **ChatGPT-Generated Data** – Additional colloquial Tamil phrases aligned with natural speech patterns.  

📝 **Total dataset size**: **16,269 sentence pairs**  
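The card does not say how the three sources were merged into one set of sentence pairs. The sketch below shows one plausible way to do it, using only the standard library. Everything in it is an assumption: the helper names (`normalize`, `merge_sources`, `train_val_split`), the choice to deduplicate on the lowercased English side, the 10% default validation split, and the toy pairs, which are illustrative and not taken from the dataset. Unicode NFC normalization is included because some Tamil vowel signs have canonical decompositions, so visually identical strings can otherwise differ at the code-point level.

```python
import random
import unicodedata

# Hypothetical helpers: the model card does not document the actual preprocessing.

def normalize(text: str) -> str:
    """NFC-normalize and collapse runs of whitespace."""
    # Tamil vowel signs such as U+0BCA have canonical decompositions
    # (U+0BC6 + U+0BBE), so NFC makes visually identical strings identical.
    return " ".join(unicodedata.normalize("NFC", text).split())


def merge_sources(*sources):
    """Merge lists of (english, tamil) pairs, dropping empty and duplicate English sides."""
    seen = set()
    merged = []
    for source in sources:
        for english, tamil in source:
            english, tamil = normalize(english), normalize(tamil)
            if not english or not tamil:
                continue  # skip pairs with a missing side
            key = english.lower()
            if key in seen:
                continue  # keep the first translation seen for each English sentence
            seen.add(key)
            merged.append((english, tamil))
    return merged


def train_val_split(pairs, val_fraction=0.1, seed=42):
    """Shuffle deterministically and hold out a validation fraction."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Illustrative toy pairs, not taken from the actual dataset.
source_a = [("This is so beautiful", "இது ரொம்ப அழகா இருக்கு")]
source_b = [
    ("  This is so  beautiful ", "இது ரொம்ப அழகா இருக்கு"),  # duplicate after normalization
    ("Bro, are you coming or not?", "ப்ரோ, நீ வர்றியா இல்லையா?"),
    ("", "வணக்கம்"),  # empty English side, dropped
]

pairs = merge_sources(source_a, source_b)
train, val = train_val_split(pairs, val_fraction=0.5)
print(len(pairs), len(train), len(val))  # prints: 2 1 1
```

Deduplicating on the English side is only one option; keying on the full (English, Tamil) pair would instead keep alternative colloquial renderings of the same sentence, which may be desirable for informal speech.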

---

## 🔥 Example Usage  

Load and test the model using **Hugging Face Transformers**:  

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load model and tokenizer
model_name = "janisrebekahv/finetuned-colloquial-tamil"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Function to translate text
def translate(text):
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example translations
test_sentences = [
    "This is so beautiful",
    "Bro, are you coming or not?",
    "My mom is gonna kill me if I don't reach home now!"
]

for sentence in test_sentences:
    print(f"English: {sentence}")
    print(f"Colloquial Tamil: {translate(sentence)}\n")
```