
Model Card for RawandLaouini/ArabicEchoV1

Model Details

Model Description

This is a fine-tuned version of the openai/whisper-medium model, adapted for Arabic Automatic Speech Recognition (ASR) using LoRA (Low-Rank Adaptation). It was trained to transcribe Arabic speech from the Common Voice 11.0 dataset, with the goal of improving Whisper's performance on Arabic.

  • Developed by: Rawand Laouini
  • Finetuned from model: openai/whisper-medium
  • Model type: Transformer-based ASR model with LoRA
  • Language(s): Arabic
  • License: MIT
  • Shared by: Rawand Laouini

Uses

Direct Use

This model can be used for transcribing Arabic speech to text, suitable for applications like voice assistants, subtitle generation, or educational tools for Arabic speakers.

Out-of-Scope Use

The model should not be used for real-time transcription without further optimization, nor for languages other than Arabic without retraining.

Bias, Risks, and Limitations

The model was trained on the Common Voice 11.0 Arabic dataset, which may contain biases or limited dialectal coverage. Performance may vary across different Arabic dialects or noisy environments. Users should validate outputs for critical applications.

Recommendations

Users should test the model on their specific use case and consider augmenting the training data for better dialectal or noise robustness.

How to Get Started with the Model

Use the following code to load and use the model:

from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel

# Load the base model, then apply the LoRA adapters from this repository.
processor = WhisperProcessor.from_pretrained("openai/whisper-medium")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-medium")
model = PeftModel.from_pretrained(model, "RawandLaouini/ArabicEchoV1")
model.eval()

# `audio` should be a 1-D float array sampled at 16 kHz.
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features

# Force Arabic transcription so the model does not try to detect the language.
forced_decoder_ids = processor.get_decoder_prompt_ids(language="arabic", task="transcribe")
predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
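
Continuing from the snippet above, a minimal end-to-end example might look like the following; librosa is one common way to load audio, and "sample.wav" is a placeholder path, not a file shipped with this repository.

import librosa

# Load a local recording, resampling to the 16 kHz rate Whisper expects.
audio, _ = librosa.load("sample.wav", sr=16000)

input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])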

Training Details

Training Data

  • Dataset: mozilla-foundation/common_voice_11_0 (Arabic split)
  • Training split: 90% of the combined train+validation data (approx. 25,238 examples)
  • Evaluation split: 200 examples held out from the test split
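
As a rough sketch of how this data could be loaded with the datasets library (the 90/10 split seed is an assumption; the card does not state it):

from datasets import load_dataset, Audio

# Loading the Arabic split requires accepting the dataset's terms on the Hub.
cv = load_dataset("mozilla-foundation/common_voice_11_0", "ar", split="train+validation")
cv = cv.cast_column("audio", Audio(sampling_rate=16000))

# Keep 90% of train+validation for training, matching the split above.
train_ds = cv.train_test_split(test_size=0.1, seed=42)["train"]

# 200 examples from the test split for the manual evaluation.
test_ds = load_dataset("mozilla-foundation/common_voice_11_0", "ar", split="test[:200]")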

Training Procedure

Preprocessing

Audio was resampled to 16 kHz to match Whisper's requirements, and input features were extracted using the Whisper processor.
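
A minimal preprocessing sketch, assuming the train_ds dataset from the loading example above (in Common Voice 11.0, the "sentence" column holds the transcripts):

from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained(
    "openai/whisper-medium", language="Arabic", task="transcribe"
)

def prepare_example(batch):
    audio = batch["audio"]  # already resampled to 16 kHz via cast_column
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
    return batch

train_ds = train_ds.map(prepare_example, remove_columns=train_ds.column_names)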

Training Hyperparameters

  • Batch size: 1 (per device)
  • Gradient accumulation steps: 1
  • Learning rate: 1e-4
  • Warmup steps: 100
  • Max steps: 300
  • Optimizer: AdamW
  • Mixed precision: FP16
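
The listed hyperparameters map onto a PEFT + Trainer setup roughly as follows; the LoRA rank and target modules are assumptions (typical choices for Whisper), since the card does not state them:

from peft import LoraConfig, get_peft_model
from transformers import Seq2SeqTrainingArguments

# Assumed LoRA configuration; r and target_modules are not given in the card.
lora_config = LoraConfig(
    r=32, lora_alpha=64, target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05, bias="none",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter weights train

# Hyperparameters as listed above.
training_args = Seq2SeqTrainingArguments(
    output_dir="./arabic-echo-v1",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    learning_rate=1e-4,
    warmup_steps=100,
    max_steps=300,
    fp16=True,
    optim="adamw_torch",
)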

Speeds, Sizes, Times

  • Training time: ~2.46 minutes for 300 steps
  • Model size: ~18.9 MB (LoRA adapters)

Evaluation

Testing Data

Manual evaluation on 200 examples from the Common Voice test split.

Metrics

  • Word Error Rate (WER): 0.4425
  • Character Error Rate (CER): 0.1187

Results

The model achieves a WER of 44.25% and CER of 11.87% on the manual evaluation set, indicating moderate transcription accuracy for Arabic speech.
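
These metrics can be reproduced with the evaluate library; the toy strings below are only placeholders for the model transcriptions and ground-truth sentences of the 200-example set:

import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

# In practice, `predictions` holds model transcriptions and `references`
# the ground-truth sentences; these two strings are only illustrative.
predictions = ["مرحبا بكم"]
references = ["مرحبا بكم جميعا"]

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))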

Environmental Impact

  • Hardware Type: NVIDIA GPU (14.74 GiB VRAM)
  • Hours used: ~0.04 hours (2.46 minutes)
  • Cloud Provider: Local/Colab (unspecified)
  • Compute Region: Unspecified
  • Carbon Emitted: Minimal (estimated < 0.01 kg CO2e using Lacoste et al., 2019)

Citation

BibTeX:

@misc{laouini2025arabicechov1,
  author = {Rawand Laouini},
  title = {ArabicEchoV1: Fine-tuned Whisper-medium for Arabic ASR with LoRA},
  year = {2025},
  howpublished = {\url{https://huggingface.co/RawandLaouini/ArabicEchoV1}}
}

APA:

Laouini, R. (2025). ArabicEchoV1: Fine-tuned Whisper-medium for Arabic ASR with LoRA. Retrieved from https://huggingface.co/RawandLaouini/ArabicEchoV1

Model Card Authors

  • Rawand Laouini

Model Card Contact

  • Rawand Laouini (via the Hugging Face repository: https://huggingface.co/RawandLaouini/ArabicEchoV1)