# Whisper Small Fine-Tuned on Custom Dataset
This model is a fine-tuned version of OpenAI's `whisper-small`, optimized for transcribing English speech from a custom dataset.
## Model Details
- Base Model: openai/whisper-small
- Fine-tuned by: Winardi (Research by Ms. Tong Rong)
- Language: English (monolingual)
- Framework: PyTorch, Hugging Face Transformers
## Training Data
The model was fine-tuned on a proprietary custom audio dataset, with metadata taken from `metadata(clean1).csv`. Corrupted or low-quality audio files were excluded. The data was split as follows:
- Training: 80%
- Validation: 10%
- Testing: 10% (used only for evaluation, not during training)
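The splitting procedure itself is not described in the card; purely as an illustration, one common way to produce a seeded 80/10/10 split looks like this (the `split_dataset` helper is hypothetical, not from the authors' pipeline):

```python
import random

def split_dataset(items, seed=42):
    """Shuffle and split items into 80% train / 10% validation / 10% test."""
    items = list(items)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(items)
    n = len(items)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # held out, used only for evaluation
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```

Holding the test split out entirely, as the card notes, avoids any leakage of evaluation audio into training.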
## Intended Use
This model is intended for automatic speech recognition (ASR) in English, especially for environments similar to the training dataset (e.g., single-speaker, clean audio).
## Performance
- Metric: Word Error Rate (WER)
- WER: 1.50%
- WER with limited vocabulary: 1.28%
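The card reports WER but not the evaluation script used. As a reference, here is a self-contained sketch of the standard edit-distance definition of WER (substitutions + insertions + deletions over reference length); it is not necessarily how these numbers were computed:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + insertions + deletions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Levenshtein distance over word sequences via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on a mat"))  # 1 error / 6 words ≈ 0.1667
```

In practice a library such as `jiwer` is typically used, often with text normalization (lowercasing, punctuation stripping) applied to both reference and hypothesis first.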
## Limitations
- Not robust to heavy background noise or overlapping speech
- May not perform well on dialects or accents not represented in training data
## How to Use
You can load and run the fine-tuned model with the Hugging Face Transformers pipeline:
```python
from transformers import pipeline

# Load the fine-tuned checkpoint into an ASR pipeline
asr = pipeline("automatic-speech-recognition", model="Pengwin30/whisper-small-fine-tuned")

# Transcribe an audio file (e.g., a 16 kHz WAV)
result = asr("path/to/audio.wav")
print(result["text"])
```
## License
This model is licensed under the **MIT License**.
## Citation
If you use this model in your work, please cite:
```bibtex
@misc{Pengwin30/whisper-small-fine-tuned,
  author = {Tong Rong and Winardi},
  title  = {Whisper Small Fine-Tuned on Custom Dataset},
  year   = {2025},
  url    = {https://huggingface.co/Pengwin30/whisper-small-fine-tuned}
}
```