---
license: cc-by-nc-sa-4.0
language:
- fr
metrics:
- wer
model-index:
- name: asr-wav2vec2-LB7K-spontaneous-fr
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: ETAPE
      type: ETAPE
      split: test
      args:
        language: fr
    metrics:
    - name: Test WER
      type: wer
      value: '27.81'
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: CV 6.1
      type: CommonVoice
      split: test
      args:
        language: fr
    metrics:
    - name: Test WER
      type: wer
      value: '21.69'
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: AllSpont
      type: AllSpont
      split: test
      args:
        language: fr
    metrics:
    - name: Test WER
      type: wer
      value: '26.80'
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Unusual_distant ("peu spontané")
      type: Unusual_distant
      split: test
      args:
        language: fr
    metrics:
    - name: Test WER
      type: wer
      value: '13.44'
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Unusual_close ("moyennement spontané")
      type: Unusual_close
      split: test
      args:
        language: fr
    metrics:
    - name: Test WER
      type: wer
      value: '23.36'
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Usual_close ("très spontané")
      type: Usual_close
      split: test
      args:
        language: fr
    metrics:
    - name: Test WER
      type: wer
      value: '51.97'
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: AllCases
      type: AllCases
      split: test
      args:
        language: fr
    metrics:
    - name: Test WER
      type: wer
      value: '29.41'
library_name: speechbrain
pipeline_tag: automatic-speech-recognition
tags:
- CTC
- pytorch
- asr
- speechbrain
- spontaneous speech
---
# Wav2Vec 2.0 with CTC adapted to spontaneous French speech

- System developed as part of the PhD work of Solène Evain: https://theses.fr/2024GRALM037
- Date: January 2024
- Model type: Wav2Vec 2.0 + CTC for automatic speech recognition
- Training recipe: this system follows https://huggingface.co/speechbrain/asr-wav2vec2-commonvoice-fr
- Wav2Vec 2.0 model: LeBenchmark 7k large, https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-large
- Scripts are available in the thesis GitLab repository: https://gitlab.com/solene-evain/recops/Domain_adaptations/7k_domainAdaptation/
- SpeechBrain version: 0.5.11
- License: CC BY-NC-SA 4.0
## Training, dev, and AllSpont test data

The "AllSpont" data are split into three sets: train, dev, and test.

- Train: 268h55 of spontaneous speech (effective speaking time)
- Dev: 34h06
- Test: 34h06

The data come (partially or in full) from the following corpora:

| French from | Corpus | Hours |
|---|---|---|
| France | TCOF | 23h |
| France | ESLO2 | 41h22 |
| France | CLAPI | 2h05 |
| France | CFPP | 35h47 |
| France | C-ORAL-ROM | 16h05 |
| France | REUNIONS | 9h36 |
| France | CID | 6h32 |
| France | TUFS | 27h15 |
| France | CRFP | 25h57 |
| France | PFC | 13h14 |
| France | FLEURON | 2h18 |
| N/A | PFC | 10h12 |
| N/A | TCOF | 2h33 |
| N/A | MPF | 17h50 |
| Switzerland | OFROM | 18h16 |
| Switzerland | PFC | 5h13 |
| Belgium | CFPB | 7h39 |

For details on the wav files included in the training set, see the "Contact" section.
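The per-corpus durations above use an `XhMM` notation (hours, then minutes). A small helper (hypothetical, not part of the original thesis scripts) to convert and sum them:

```python
import re

def duration_to_minutes(d: str) -> int:
    """Convert an 'XhMM' duration string (e.g. '41h22') to minutes.

    A bare hour count such as '23h' is treated as '23h00'.
    """
    match = re.fullmatch(r"(\d+)h(\d{2})?", d)
    if match is None:
        raise ValueError(f"unrecognized duration: {d!r}")
    hours = int(match.group(1))
    minutes = int(match.group(2) or 0)
    return hours * 60 + minutes

# Sum the metropolitan-France corpus durations listed in the table above.
france = ["23h", "41h22", "2h05", "35h47", "16h05", "9h36",
          "6h32", "27h15", "25h57", "13h14", "2h18"]
total = sum(duration_to_minutes(d) for d in france)
print(f"{total // 60}h{total % 60:02d}")  # → 203h11
```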
## Evaluation data

- Usual_close: 1h28 of effective speech (ESLO2: 0h12, CLAPI: 1h15) ("very spontaneous")
- Unusual_close: 1h25 of effective speech (CFPB: 1h25) ("moderately spontaneous")
- Unusual_distant: 1h40 of effective speech (CRFP: 0h47, ESLO2: 0h52) ("slightly spontaneous")
- AllCases: Usual_close + Unusual_close + Unusual_distant
- AllSpont test: see the training/dev/test AllSpont data section above
- ETAPE: https://aclanthology.org/L12-1270/
- CV: Common Voice version 6.1
## Comparison with other systems

| System | Usual_close | Unusual_close | Unusual_distant | AllCases | AllSpont | ETAPE | CV 6.1 |
|---|---|---|---|---|---|---|---|
| asr-wav2vec2-LB7K-spontaneous-fr (this model) | 51.97 | 23.36 | 13.44 | 29.41 | 26.80 | 27.81 | 21.69 |
| asr-wav2vec2-commonvoice-fr (CV 6.1, LeBenchmark-7k-large) | 80.85 | 52.66 | 32.16 | 55.14 | 51.2 | 36.55 | 9.97 |
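All figures in the tables above are word error rates (WER, in %). As a reference point, here is a minimal pure-Python WER via Levenshtein alignment over whitespace-tokenized words; this is a sketch, not the SpeechBrain scorer that produced the reported scores:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / #reference words, in %."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Levenshtein distance over words, computed row by row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution / match
        prev = curr
    return 100.0 * prev[-1] / len(ref)

print(round(wer("bonjour à tous", "bonjour tous"), 2))  # one deletion out of 3 words → 33.33
```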
## Transcribing your own recordings

(Tested with SpeechBrain 1.0.2)

```bash
pip install speechbrain transformers
```

```python
from speechbrain.inference.ASR import EncoderASR

asr_model = EncoderASR.from_hparams(
    source="Sevain/asr-wav2vec2-LB7K-spontaneous-fr",
    savedir="pretrained_models/asr-wav2vec2-LB7K-spontaneous-fr",
)
asr_model.transcribe_file('path/to/your/file')
```
## Citing this work

```bibtex
@misc{SB2021,
  author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua},
  title = {SpeechBrain},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {https://github.com/speechbrain/speechbrain},
}

@phdthesis{evain:tel-04984659,
  TITLE = {{Dimensions de variation de la parole spontan{\'e}e pour l'{\'e}tude inter-corpus des performances de syst{\`e}mes de reconnaissance automatique de la parole}},
  AUTHOR = {Evain, Sol{\`e}ne},
  URL = {https://theses.hal.science/tel-04984659},
  NUMBER = {2024GRALM037},
  SCHOOL = {{Universit{\'e} Grenoble Alpes}},
  YEAR = {2024},
  MONTH = Oct,
  KEYWORDS = {Automatic speech recognition ; Spontaneous speech ; Deep learning ; Reconnaissance automatique de la parole ; Parole spontan{\'e}e ; Apprentissage profond},
  TYPE = {Theses},
  PDF = {https://theses.hal.science/tel-04984659v1/file/EVAIN_2024_archivage.pdf},
  HAL_ID = {tel-04984659},
  HAL_VERSION = {v1},
}
```
## Contact
Solène Evain ([email protected])
## Caveats and recommendations

We do not provide any warranty on the performance achieved by this model when used on other datasets.