---
library_name: peft
base_model: openai/whisper-large-v2
tags:
- generated_from_trainer
- multilingual
- ASR
- Open-Source
language:
- wo
- fr
- en
model-index:
- name: whosper-large
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Test Set
      type: custom
      split: test
      args:
        language: wo
    metrics:
    - name: Test WER
      type: wer
      value: 24.23
    - name: Test CER
      type: cer
      value: 11.35
pipeline_tag: automatic-speech-recognition
new_version: sudoping01/whosper-large-v3
---

# Whosper-large

## Model Overview
Whosper-large is a fine-tuned version of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) optimized for speech recognition in Wolof, Senegal's primary language, while maintaining strong multilingual capabilities. It advances African language processing with notable improvements in Word Error Rate (WER) and Character Error Rate (CER). Whether you're transcribing conversations, building language-learning tools, or conducting research, Whosper-large is designed for researchers, developers, and students working with Wolof speech data.


### Key Strengths
- **Strong Multilingual Performance**: Excellent results in Wolof, French, and English
- **Code-Switching**: Handles natural language mixing, especially Wolof-French
- **Consistent Results**: Maintains quality across different languages
- **Open Source**: Released under the [apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) license
- **African NLP**: Supporting African language technology development

## Performance Metrics
- **WER**: 0.2423 (24.23%)
- **CER**: 0.1135 (11.35%)
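
These scores can be reproduced on your own test set with any standard edit-distance tool. The sketch below uses the `jiwer` package (an assumption; the evaluation tooling behind the reported numbers is not specified here). Because the model outputs lowercase text with little punctuation (see Limitations below), references are normalized the same way before scoring:

```python
# pip install jiwer
import re

import jiwer


def normalize(text: str) -> str:
    """Lowercase and strip punctuation so references match Whosper's output style."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical example pair: replace with your gold transcripts and model outputs.
references = [normalize(r) for r in ["Salaam aleekum, nanga def?"]]
hypotheses = ["salaam aleekum nanga def"]

print(f"WER: {jiwer.wer(references, hypotheses):.4f}")  # word error rate
print(f"CER: {jiwer.cer(references, hypotheses):.4f}")  # character error rate
```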


## Limitations
- Outputs in lowercase only
- Limited punctuation support
- Reduced accuracy on low-quality audio

## Training Data
Trained on diverse Wolof speech data:

- **ALFFA Public Dataset**
- **FLEURS Dataset**
- **Bus Urbain Dataset**
- **Kallama Dataset**

## Quick Start Guide

### Installation
```bash
pip install git+https://github.com/sudoping01/whosper.git
```

### Basic Usage
```python
from whosper import WhosperTranscriber

# Initialize the transcriber
transcriber = WhosperTranscriber(model_id="CAYTU/whosper-large") 

# Transcribe an audio file
result = transcriber.transcribe_audio("path/to/your/audio.wav")
print(result)
```
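
If you prefer not to install the wrapper package, the adapter can also be loaded directly with `peft` and `transformers`. This is a minimal sketch assuming the repository hosts a standard PEFT adapter on top of `openai/whisper-large-v2` (which the frontmatter and framework versions below suggest); the `whosper` package remains the supported path.

```python
import librosa
import torch
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the frozen base model, then attach the fine-tuned adapter on top of it.
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")
model = PeftModel.from_pretrained(base, "CAYTU/whosper-large").eval()
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")

# Whisper expects 16 kHz mono audio.
audio, _ = librosa.load("path/to/your/audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    generated_ids = model.generate(input_features=inputs.input_features)

print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```

For a LoRA-style adapter, `model.merge_and_unload()` folds the adapter weights into the base model, which can speed up inference.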

## Training Results
| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| 3.0514 | 1.0 | 1732 | 0.6824 |
| 2.2658 | 2.0 | 3464 | 0.5998 |
| 2.0274 | 3.0 | 5196 | 0.5282 |
| 1.48 | 4.0 | 6928 | 0.4793 |
| 1.1693 | 5.0 | 8660 | 0.4441 |
| 0.8762 | 5.9970 | 10386 | 0.4371 |

## Framework Versions
- PEFT: 0.14.1.dev0
- Transformers: 4.48.0.dev0
- PyTorch: 2.5.1+cu124
- Datasets: 3.2.0
- Tokenizers: 0.21.0

## Contributing to African NLP
Whosper-large embodies our commitment to open science and the advancement of African language technologies. We believe that by making cutting-edge speech recognition models freely available, we can accelerate NLP development across Africa.

Join our mission to democratize AI technology:
- **Open Science**: Use and build upon our research - all code, models, and documentation are open source
- **Research Collaboration**: Integrate Whosper into your research projects and share your findings
- **Community Building**: Help us create resources for African language processing
- **Educational Impact**: Use Whosper in educational settings to train the next generation of African AI researchers

## License
[Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)

This model is released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0) to encourage research, commercial use, and innovation in African language technologies while ensuring proper attribution and patent protection.

## Citation
```bibtex
@misc{whosper2025,
  title={Whosper-large: A Multilingual ASR Model for Wolof with Enhanced Code-Switching Capabilities},
  author={Seydou DIALLO},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/CAYTU/whosper-large},
  version={1.0}
}
```

## Acknowledgments
Developed by [Seydou DIALLO](https://www.linkedin.com/in/seydou-diallo-08ab311ba) at [Caytu Robotics](https://caytu.ai)'s AI Department, building on OpenAI's [Whisper-large-v2](https://huggingface.co/openai/whisper-large-v2). Special thanks to the Wolof-speaking community and contributors advancing African language technology.

## Contact Us
For questions or support, contact us:

Email: [email protected]