GogetaBlueMUI committed on
Commit 956f700 · verified · 1 Parent(s): 00ac1c7

Update README.md

Files changed (1)
  1. README.md +42 -123
README.md CHANGED
@@ -1,147 +1,66 @@
- Here’s the improved **README.md** with proper YAML metadata and a table for epochs, loss, and validation loss.
-
- ---
-
- ```yaml
  ---
  library_name: transformers
  language:
- - ur
  license: apache-2.0
  base_model: openai/whisper-medium
  tags:
- - automatic-speech-recognition
- - urdu
- - whisper
- - fine-tuned
  datasets:
- - fsicoli/common_voice_19_0
  model-index:
- - name: whisper-medium-ur-v2
-   results:
-   - task:
-       type: automatic-speech-recognition
-     dataset:
-       name: Common Voice 19.0
-       type: fsicoli/common_voice_19_0
-     metrics:
-     - name: Validation Loss
-       type: loss
-       value: 0.3571
-     - name: Word Error Rate (WER)
-       type: wer
-       value: 25.17
  ---
- ```

- # 🚀 Whisper Medium Urdu (whisper-medium-ur-v2)

- Fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) for **Urdu Automatic Speech Recognition (ASR)** on the **Common Voice 19.0** dataset.

- ---

- ## 📊 Training Performance
-
- | Epoch | Training Loss | Validation Loss | WER (%) |
- |-------|---------------|-----------------|---------|
- | 0.5   | 0.4503        | 0.4121          | 28.45   |
- | 1.0   | 0.2304        | 0.3582          | 25.29   |
- | 1.31  | 0.1733        | 0.3571          | 25.17   |
-
- ---

- ## 📌 Model Description
- This model is based on **Whisper Medium**, a transformer-based sequence-to-sequence ASR model trained by OpenAI. It has been **fine-tuned on Urdu speech data** to improve transcription accuracy.

- - **Base Model:** [openai/whisper-medium](https://huggingface.co/openai/whisper-medium)
- - **Language:** Urdu (ur)
- - **Dataset:** [Common Voice 19.0](https://huggingface.co/datasets/fsicoli/common_voice_19_0)
-
- ---
-
- ## 🚀 Intended Use & Limitations
- ✅ **Best suited for:**
- - Urdu speech-to-text transcription
- - Conversational & broadcast speech
-
- ⚠️ **Limitations:**
- - May struggle with **noisy environments**
- - Accuracy depends on **audio quality and speaker accents**
- - Not tested for **code-switching (mixing Urdu with English)**
-
- ---

- ## 🔧 How to Use
- You can use this model with 🤗 Transformers:
-
- ```python
- from transformers import pipeline
-
- pipe = pipeline("automatic-speech-recognition", model="GogetaBlueMUI/whisper-medium-ur-v2")
- result = pipe("path_to_audio.wav")
- print(result["text"])
- ```
-
- For lower-level inference:
-
- ```python
- from transformers import WhisperProcessor, WhisperForConditionalGeneration
- import torch
- import torchaudio
-
- # Load model & processor
- processor = WhisperProcessor.from_pretrained("GogetaBlueMUI/whisper-medium-ur-v2")
- model = WhisperForConditionalGeneration.from_pretrained("GogetaBlueMUI/whisper-medium-ur-v2")
-
- # Load the audio file and convert it to 16 kHz mono, as expected by the Whisper feature extractor
- waveform, sample_rate = torchaudio.load("path_to_audio.wav")
- waveform = waveform.mean(dim=0)  # downmix multi-channel audio to mono
- if sample_rate != 16000:
-     waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)
- inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
-
- # Generate transcription (optionally pass language="urdu", task="transcribe" to pin the decoding language)
- with torch.no_grad():
-     predicted_ids = model.generate(inputs.input_features)
- transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
-
- print(transcription)
- ```
-
- ---
-
- ## 🛠 Training Details
- ### **Hyperparameters**
- - **Learning Rate:** 3e-6
- - **Batch Size:** 8 (per device)
- - **Gradient Accumulation Steps:** 2
- - **Optimizer:** AdamW
- - **Scheduler:** Linear Warmup (100 steps)
- - **Precision:** Mixed Precision (AMP)
-
- ### **Framework Versions**
- - **Transformers:** 4.49.0
- - **PyTorch:** 2.5.1+cu121
- - **Datasets:** 3.4.1
- - **Tokenizers:** 0.21.0
-
- ---
-
- ## 📚 Citation
- If you use this model in your research or project, please cite:
-
- ```bibtex
- @misc{whisper-medium-ur-v2,
-   author    = {GogetaBlueMUI},
-   title     = {Whisper Medium Urdu Fine-Tuned Model},
-   year      = {2025},
-   publisher = {Hugging Face},
-   url       = {https://huggingface.co/GogetaBlueMUI/whisper-medium-ur-v2}
- }
- ```
-
- ---

- ## Acknowledgements
- - **OpenAI** for the Whisper model
- - **Mozilla Common Voice** for the dataset
- - **Hugging Face** for hosting

- ---
  ---
  library_name: transformers
  language:
+ - ur
  license: apache-2.0
  base_model: openai/whisper-medium
  tags:
+ - generated_from_trainer
  datasets:
+ - fsicoli/common_voice_19_0
  model-index:
+ - name: Whisper Medium Ur - Your Name
+   results: []
  ---

+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->

+ # Whisper Medium Ur - Your Name

+ This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the Common Voice 19.0 dataset.
+ It achieves the following results on the evaluation set:
+ - eval_loss: 0.3571
+ - eval_wer: 25.1658
+ - eval_runtime: 4297.3715
+ - eval_samples_per_second: 1.167
+ - eval_steps_per_second: 0.146
+ - epoch: 1.3108
+ - step: 1000

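+ Below is a minimal inference sketch using the 🤗 Transformers `pipeline` API; the audio path is a placeholder, and pinning the decoding language to Urdu via `generate_kwargs` is optional.
+
+ ```python
+ from transformers import pipeline
+
+ # Load the fine-tuned Urdu checkpoint
+ pipe = pipeline("automatic-speech-recognition", model="GogetaBlueMUI/whisper-medium-ur-v2")
+
+ # Transcribe a local audio file, forcing Urdu transcription
+ result = pipe("path_to_audio.wav", generate_kwargs={"language": "urdu", "task": "transcribe"})
+ print(result["text"])
+ ```
+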
+ ## Model description

+ More information needed

+ ## Intended uses & limitations

+ More information needed

+ ## Training and evaluation data

+ The model was fine-tuned and evaluated on Urdu speech from the [Common Voice 19.0](https://huggingface.co/datasets/fsicoli/common_voice_19_0) dataset; split sizes and preprocessing details are not documented here. A loading sketch follows below.

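+ The dataset id in the metadata is `fsicoli/common_voice_19_0`. Below is a minimal loading sketch with 🤗 Datasets; the `"ur"` config name is an assumption based on Common Voice's language-code convention.
+
+ ```python
+ from datasets import load_dataset
+
+ # Urdu portion of Common Voice 19.0 ("ur" config name assumed)
+ train_data = load_dataset("fsicoli/common_voice_19_0", "ur", split="train")
+ print(train_data)
+ ```
+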
+ ## Training procedure

+ ### Training hyperparameters

+ The following hyperparameters were used during training (a matching `Seq2SeqTrainingArguments` sketch appears after this list):
+ - learning_rate: 3e-06
+ - train_batch_size: 8
+ - eval_batch_size: 8
+ - seed: 42
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 16
+ - optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_steps: 100
+ - training_steps: 1000
+ - mixed_precision_training: Native AMP

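+ For reference, the values above roughly correspond to the following `Seq2SeqTrainingArguments`; this is a sketch of a typical 🤗 Trainer setup rather than the exact training script, and `output_dir` is a placeholder.
+
+ ```python
+ from transformers import Seq2SeqTrainingArguments
+
+ # Sketch of training arguments matching the values listed above
+ training_args = Seq2SeqTrainingArguments(
+     output_dir="./whisper-medium-ur-v2",   # placeholder
+     learning_rate=3e-6,
+     per_device_train_batch_size=8,
+     per_device_eval_batch_size=8,
+     gradient_accumulation_steps=2,         # effective train batch size 16
+     seed=42,
+     optim="adamw_torch",
+     lr_scheduler_type="linear",
+     warmup_steps=100,
+     max_steps=1000,
+     fp16=True,                             # mixed precision (native AMP)
+ )
+ ```
+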
+ ### Framework versions

+ - Transformers 4.49.0
+ - Pytorch 2.5.1+cu121
+ - Datasets 3.4.1
+ - Tokenizers 0.21.0