metadata

license: unknown
language:
  - en
metrics:
  - wer
tags:
  - whisper
  - speech processing
  - nlp
  - asr
  - domain adaptation

Whispered TIA

Whispered TIA is a fine-tuned ASR model based on Whisper. It is adapted to the vocabulary of TIA (Totally Integrated Automation) from Siemens AG, so it recognizes domain-specific words and transcribes them correctly.

Base Model Whisper

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It was proposed in the paper "Robust Speech Recognition via Large-Scale Weak Supervision" by Alec Radford et al. from OpenAI. The original code is available in the openai/whisper repository on GitHub.

Training Results

The False HallucER indicates how many hallucinations and deletions were produced.

WER	False HallucER	Runtime	Batch Size	Memory Usage
1.68	248.59	1.75	32	20407

Comparison of predictions with references:
Predictions > References: 32%
Predictions < References: 34%
Predictions = References: 34%
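For readers unfamiliar with the WER metric listed in the metadata, the following is a minimal sketch of the standard definition: the word-level edit distance between reference and prediction, divided by the number of reference words. The function name `word_error_rate` is hypothetical and this is not the evaluation code used for the table above. The table does not state whether its values are percentages; the sketch returns a fraction.

```python
def word_error_rate(reference: str, prediction: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = prediction.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j]: edit distance between the reference words seen so far
    # (excluding the current one) and the first j predicted words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing in prediction
                curr[j - 1] + 1,     # extra word in prediction
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One of four reference words is missing from the prediction.
print(word_error_rate("open the tia portal", "open tia portal"))
```

Because extra predicted words count as errors, the value can exceed 1 when a prediction is much longer than its reference.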

Dataset

The underlying dataset is nosil.

Inference

import librosa
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Path to the audio file
file = "/path/to/audio"

# Load the audio and resample it to 16 kHz, the sampling rate Whisper expects
arr, sampling_rate = librosa.load(file, sr=16000)

# Load whisper model and processor
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("masters-thesis-vm/whispered_TIA_small_standard_ft_nosil")

# Preprocessing: convert the waveform to log-Mel spectrogram input features
input_features = processor(arr, return_tensors="pt", sampling_rate=sampling_rate).input_features

# Prediction: force English transcription, generate token IDs, and decode them to text
forced_decoder_ids = processor.get_decoder_prompt_ids(language="en", task="transcribe")
predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)

print(transcription)
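Whisper's feature extractor works on 30-second windows, so with its default settings the snippet above only sees the first 30 seconds of a longer recording. One simple workaround, sketched here under the assumption that hard cuts at chunk boundaries are acceptable, is to split the waveform into consecutive pieces and transcribe each one in turn. The helper name `split_into_chunks` is hypothetical and not part of librosa or transformers.

```python
def split_into_chunks(samples, sampling_rate=16000, chunk_seconds=30):
    """Yield consecutive slices holding at most chunk_seconds of audio.

    Works on any sliceable sequence, including the NumPy array
    returned by librosa.load.
    """
    if sampling_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sampling_rate and chunk_seconds must be positive")
    chunk_len = int(sampling_rate * chunk_seconds)
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]


# Tiny illustration: 10 samples at 2 Hz, split into 2-second chunks.
demo = list(split_into_chunks(list(range(10)), sampling_rate=2, chunk_seconds=2))
print([len(chunk) for chunk in demo])
```

Each chunk can then be passed through the processor and model calls from the snippet above in place of `arr`. A cut may fall in the middle of a word, so transcripts near chunk boundaries can be less reliable than the rest.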