---
license: cc-by-nc-sa-4.0
language:
- fr
metrics:
- wer
model-index:
- name: asr-wav2vec2-LB7K-spontaneous-fr
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: ETAPE
      type: ETAPE
      split: test
      args:
        language: fr
    metrics:
    - name: Test WER
      type: wer
      value: '27.81'
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: CV 6.1
      type: CommonVoice
      split: test
      args:
        language: fr
    metrics:
    - name: Test WER
      type: wer
      value: '21.69'
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: AllSpont
      type: AllSpont
      split: test
      args:
        language: fr
    metrics:
    - name: Test WER
      type: wer
      value: '26.80'
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Unusual_distant ("peu spontané")
      type: Unusual_distant
      split: test
      args:
        language: fr
    metrics:
    - name: Test WER
      type: wer
      value: '13.44'
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Unusual_close ("moyennement spontané")
      type: Unusual_close
      split: test
      args:
        language: fr
    metrics:
    - name: Test WER
      type: wer
      value: '23.36'
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Usual_close ("très spontané")
      type: Usual_close
      split: test
      args:
        language: fr
    metrics:
    - name: Test WER
      type: wer
      value: '51.97'
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: AllCases
      type: AllCases
      split: test
      args:
        language: fr
    metrics:
    - name: Test WER
      type: wer
      value: '29.41'
library_name: speechbrain
pipeline_tag: automatic-speech-recognition
tags:
- CTC
- pytorch
- asr
- speechbrain
- spontaneous speech
---

### Wav2Vec 2.0 with CTC adapted to spontaneous French speech

- System developed as part of Solène Evain's PhD thesis: https://theses.fr/2024GRALM037
- Date: January 2024
- Model type: Wav2Vec 2.0 + CTC for automatic speech recognition
- Training recipe followed: https://huggingface.co/speechbrain/asr-wav2vec2-commonvoice-fr
- Wav2Vec 2.0 model: LeBenchmark 7k large, https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-large
- Scripts are available in the thesis GitLab repository: https://gitlab.com/solene-evain/recops/Domain_adaptations/7k_domainAdaptation/
- SpeechBrain version: 0.5.11
- License: CC BY-NC-SA 4.0

### Training, dev, and test data (AllSpont)

The "AllSpont" data are split into three sets (train, dev, and test), with durations given as effective speech time:

- Train: 268h55 of spontaneous speech
- Dev: 34h06
- Test: 34h06

The data come (partially or in full) from the following corpora:

| French from | Corpus | Hours |
|---|---|---|
| France | TCOF | 23h |
| France | ESLO2 | 41h22 |
| France | CLAPI | 2h05 |
| France | CFPP | 35h47 |
| France | C-ORAL-ROM | 16h05 |
| France | REUNIONS | 9h36 |
| France | CID | 6h32 |
| France | TUFS | 27h15 |
| France | CRFP | 25h57 |
| France | PFC | 13h14 |
| France | FLEURON | 2h18 |
| N/A | PFC | 10h12 |
| N/A | TCOF | 2h33 |
| N/A | MPF | 17h50 |
| Switzerland | OFROM | 18h16 |
| Switzerland | PFC | 5h13 |
| Belgium | CFPB | 7h39 |

For details on the wav files included in the train set, see the "Contact" section.
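The durations above follow an `h`-separated hours/minutes convention (e.g. `41h22` is 41 hours and 22 minutes). As a small, hypothetical convenience (not part of the original recipe), such durations can be parsed and summed like this:

```python
def parse_hm(duration: str) -> int:
    """Parse an 'HHhMM' duration string (e.g. '41h22' or '23h') into minutes."""
    hours, _, minutes = duration.partition("h")
    return int(hours) * 60 + int(minutes or 0)

def format_hm(minutes: int) -> str:
    """Format a minute count back into the 'HHhMM' convention."""
    return f"{minutes // 60}h{minutes % 60:02d}"

# Example: summing three of the corpus durations from the table
total = sum(parse_hm(d) for d in ["23h", "41h22", "2h05"])
print(format_hm(total))  # 66h27
```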
### Evaluation data

- Usual_close: 1h28 of effective speech (ESLO2: 0h12, CLAPI: 1h15) ("très spontané", highly spontaneous)
- Unusual_close: 1h25 of effective speech (CFPB: 1h25) ("moyennement spontané", moderately spontaneous)
- Unusual_distant: 1h40 of effective speech (CRFP: 0h47, ESLO2: 0h52) ("peu spontané", barely spontaneous)
- AllCases: Usual_close + Unusual_close + Unusual_distant
- AllSpont test: see the section "Training, dev, and test data (AllSpont)"
- ETAPE: https://aclanthology.org/L12-1270/
- CV: CommonVoice version 6.1

### Comparison with other systems

| System | Usual_close | Unusual_close | Unusual_distant | AllCases | AllSpont | ETAPE | CV 6.1 |
|---|---|---|---|---|---|---|---|
| asr-wav2vec2-LB7K-spontaneous-fr (this model) | 51.97 | 23.36 | 13.44 | 29.41 | 26.80 | 27.81 | 21.69 |
| asr-wav2vec2-commonvoice-fr (CV 6.1, LeBenchmark-7k-large) | 80.85 | 52.66 | 32.16 | 55.14 | 51.2 | 36.55 | 9.97 |

### Transcribing your own recordings

(Tested with SpeechBrain 1.0.2)

```
pip install speechbrain transformers
```

```python
from speechbrain.inference.ASR import EncoderASR

asr_model = EncoderASR.from_hparams(
    source="Sevain/asr-wav2vec2-LB7K-spontaneous-fr",
    savedir="pretrained_models/asr-wav2vec2-LB7K-spontaneous-fr",
)
asr_model.transcribe_file('path/to/your/file')
```

### Citing this work

```
@misc{SB2021,
  author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua},
  title = {SpeechBrain},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {https://github.com/speechbrain/speechbrain},
}

@phdthesis{evain:tel-04984659,
  TITLE = {{Dimensions de variation de la parole spontan{\'e}e pour l'{\'e}tude inter-corpus des performances de syst{\`e}mes de reconnaissance automatique de la parole}},
  AUTHOR = {Evain, Sol{\`e}ne},
  URL = {https://theses.hal.science/tel-04984659},
  NUMBER = {2024GRALM037},
  SCHOOL = {{Universit{\'e} Grenoble Alpes}},
  YEAR = {2024},
  MONTH = Oct,
  KEYWORDS = {Automatic speech recognition ; Spontaneous speech ; Deep learning ; Reconnaissance automatique de la parole ; Parole spontan{\'e}e ; Apprentissage profond},
  TYPE = {Theses},
  PDF = {https://theses.hal.science/tel-04984659v1/file/EVAIN_2024_archivage.pdf},
  HAL_ID = {tel-04984659},
  HAL_VERSION = {v1},
}
```

### Contact

Solène Evain (solene.evain@inria.fr)

### Caveats and recommendations

We do not provide any warranty on the performance achieved by this model when used on other datasets.
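The WER figures reported in this card are standard word error rates. As a self-contained illustration of how such a score is computed (word-level Levenshtein distance normalized by reference length; a generic sketch, not the scoring script used in the thesis):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution and one deletion over six reference words: WER = 2/6
print(f"{word_error_rate('le chat dort sur le canapé', 'le chats dort sur canapé'):.4f}")  # 0.3333
```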