# Whisper Small Fine-Tuned on Custom Dataset

This model is a fine-tuned version of OpenAI's whisper-small, optimized for transcribing English speech from a custom dataset.

## 🛠️ Model Details

- Base Model: openai/whisper-small
- Fine-tuned by: Winardi (Research by Ms. Tong Rong)
- Language: English (monolingual)
- Framework: PyTorch, Hugging Face Transformers

## 📚 Training Data

The model was fine-tuned on a custom, proprietary audio dataset using the metadata file `metadata(clean1).csv`. Corrupted or low-quality audio files were excluded. The data was split as follows:

- Training: 80%
- Validation: 10%
- Testing: 10% (held out for evaluation only; not used during training)
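The card does not say how corrupted files were detected or how the split was drawn. As a rough illustration only, the sketch below assumes plain PCM WAV clips, treats a file as corrupted if Python's standard `wave` module cannot open it or it contains no frames, and draws a seeded random 80/10/10 split. The function names `is_readable_wav` and `split_80_10_10` and the seed value are hypothetical, and the low-quality criteria actually used for this model are not documented, so no quality filtering is attempted.

```python
import random
import wave
from pathlib import Path


def is_readable_wav(path):
    """Return True if `path` opens as a WAV file containing at least one frame.

    Assumption (hypothetical criterion): "corrupted" means unreadable by the
    stdlib `wave` module, which handles plain PCM WAV only. Audio-quality
    checks are not attempted here.
    """
    try:
        with wave.open(str(path), "rb") as wf:
            return wf.getnframes() > 0
    except (wave.Error, EOFError, OSError):
        # Bad header, truncated/empty file, or missing file
        return False


def split_80_10_10(items, seed=42):
    """Shuffle a copy of `items` with a fixed seed and cut it 80/10/10.

    Returns (train, validation, test) lists; any rounding remainder goes to test.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # deterministic for a given seed
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test
```

For example, `split_80_10_10([p for p in Path("clips").glob("*.wav") if is_readable_wav(p)])` would return three lists of paths, where `clips` is a placeholder directory. Fixing the seed keeps the held-out test set identical across runs, so test clips never leak into training.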

## 🎯 Intended Use

This model is intended for automatic speech recognition (ASR) in English, particularly in conditions similar to those of the training data (e.g., single-speaker, clean audio).

## 📉 Performance

- Metric: Word Error Rate (WER)
- WER: 1.50%
- WER with Limited Vocabulary: 1.28%
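WER counts the word-level substitutions (S), deletions (D), and insertions (I) needed to turn the model's transcript into the reference, divided by the number of reference words N: WER = (S + D + I) / N. The sketch below computes a corpus-level WER with a word-level Levenshtein distance. It applies no text normalization (casing, punctuation), and the card does not state which normalization or aggregation produced the figures above, so this is an illustration of the metric rather than a reproduction of the reported numbers. The function names are hypothetical.

```python
def word_edit_distance(ref_words, hyp_words):
    """Minimum number of word substitutions, deletions, and insertions
    between ref_words and hyp_words (word-level Levenshtein distance)."""
    # prev[j] = distance between the reference words processed so far
    # (previous DP row) and the first j hypothesis words.
    # Row 0: turning an empty reference into j words takes j insertions.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)  # column 0: i deletions
        for j, hyp_word in enumerate(hyp_words, start=1):
            curr[j] = min(
                prev[j] + 1,                           # deletion of ref_word
                curr[j - 1] + 1,                       # insertion of hyp_word
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            )
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Corpus-level WER: total word edits divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = ref.split(), hyp.split()
        total_edits += word_edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_edits / total_words
```

For instance, `corpus_wer(["the cat sat on the mat"], ["the cat sit on mat"])` returns 2/6 ≈ 0.333 (one substitution and one deletion over six reference words). Because insertions are counted, WER can exceed 100%.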

## 🚫 Limitations

- Not robust to heavy background noise or overlapping speech
- May not perform well on dialects or accents not represented in the training data

## 💬 How to Use

You can load and use the fine-tuned model with the 🤗 Transformers pipeline:

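```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub
asr = pipeline("automatic-speech-recognition", model="Pengwin30/whisper-small-fine-tuned")

# Transcribe a local audio file (decoding from a file path requires ffmpeg)
result = asr("path/to/audio.wav")
print(result["text"])
```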


## 📜 License

This model is licensed under the **MIT License**.

## 🙏 Citation

If you use this model in your work, please cite:

```bibtex
@misc{Pengwin30/whisper-small-fine-tuned,
  author = {Tong Rong and Winardi},
  title  = {Whisper Small Fine-Tuned on Custom Dataset},
  year   = {2025},
  url    = {https://huggingface.co/Pengwin30/whisper-small-fine-tuned}
}
```
