Whisper Small Fine-Tuned on Custom Dataset

This model is a fine-tuned version of OpenAI's whisper-small, adapted to transcribe English speech like that found in a custom dataset.

🛠️ Model Details

  • Base Model: openai/whisper-small
  • Fine-tuned by: Winardi (Research by Ms. Tong Rong)
  • Language: English (monolingual)
  • Framework: PyTorch, Hugging Face Transformers

📚 Training Data

The model was fine-tuned on a proprietary, custom audio dataset using the metadata file `metadata(clean1).csv`. Corrupted or low-quality audio files were excluded. The remaining data was split as follows:

  • Training: 80%
  • Validation: 10%
  • Testing: 10% (used only for evaluation, not during training)
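
The split above can be illustrated with a short sketch. How the split was actually produced for this model is not documented here, so the function name `split_80_10_10`, the shuffling step, and the `seed` parameter are assumptions made purely for illustration:

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_80_10_10(rows: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle rows reproducibly and split them 80/10/10 (hypothetical helper)."""
    shuffled = list(rows)
    # A seeded generator keeps the split identical across runs (assumed, not from the source)
    random.Random(seed).shuffle(shuffled)

    # Integer arithmetic avoids floating-point rounding surprises
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, held out for evaluation only
    return train, val, test
```

Each row would typically be one entry of the metadata file (for example, an audio path with its transcript). Keeping the test portion out of training and model selection is what makes the reported error rates meaningful.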

🎯 Intended Use

This model is intended for automatic speech recognition (ASR) in English, particularly for audio that resembles the training data (e.g., single-speaker, clean recordings).

📉 Performance

  • Metric: Word Error Rate (WER)
  • WER: 1.50%
  • WER with Limited Vocabulary: 1.28%
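
For readers unfamiliar with the metric, WER is the number of word substitutions, deletions, and insertions needed to turn the model's output into the reference transcript, divided by the number of reference words. The sketch below shows the standard definition on a single sentence pair. It is not the evaluation script used for this model, and it assumes plain whitespace tokenization with no text normalization:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference `"a b c d"` with the hypothesis `"a b x d"` gives one substitution over four reference words, a WER of 25%. In practice, results depend on how text is normalized (casing, punctuation) before scoring.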

🚫 Limitations

  • Not robust to heavy background noise or overlapping speech
  • May not perform well on dialects or accents not represented in training data

💬 How to Use

You can load and use the fine-tuned model with the 🤗 Transformers pipeline:

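```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub
asr = pipeline("automatic-speech-recognition", model="Pengwin30/whisper-small-fine-tuned")

# Transcribe a local audio file; the result is a dict with a "text" field
result = asr("path/to/audio.wav")
print(result["text"])
```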
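
To transcribe several files, or recordings longer than Whisper's 30-second window, a small wrapper may help. The following is a minimal sketch: the helper names `transcribe_files` and `load_asr` are hypothetical, and `chunk_length_s=30` is an assumed setting for long-form audio, not something specified for this model:

```python
from typing import Callable, Dict, Iterable, List


def transcribe_files(asr: Callable[[str], Dict[str, str]], paths: Iterable[str]) -> List[str]:
    """Run an ASR callable over several audio paths and collect trimmed transcripts."""
    return [asr(path)["text"].strip() for path in paths]


def load_asr():
    """Build the pipeline with chunking enabled (assumed setting for long audio)."""
    # Imported lazily so transcribe_files can be used or tested without transformers
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model="Pengwin30/whisper-small-fine-tuned",
        chunk_length_s=30,  # split long recordings into 30-second chunks
    )
```

Usage would look like `transcripts = transcribe_files(load_asr(), ["a.wav", "b.wav"])`. Passing the pipeline in as an argument keeps the model loaded once and reused across all files.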


## 📜 License

This model is licensed under the **MIT License**.

## 🙏 Citation

If you use this model in your work, please cite:

```bibtex
@misc{Pengwin30/whisper-small-fine-tuned,
  author = {Tong Rong and Winardi},
  title  = {Whisper Small Fine-Tuned on Custom Dataset},
  year   = {2025},
  url    = {https://huggingface.co/Pengwin30/whisper-small-fine-tuned}
}
```
