Model Card: aniemore-audio-finetuned
Model Summary
This model is Aniemore/wavlm-emotion-russian-resd fine-tuned on a balanced Russian-language dataset of emotional speech for 7-class emotion classification. Fine-tuning was performed as part of the EchoStressAI project.
Model Details
- Model type: WavLMForSequenceClassification
- Pretrained base: Aniemore/wavlm-emotion-russian-resd
- Fine-tuning dataset: Balanced custom audio dataset, 2248 samples per class
- Languages: Russian
- Task: Speech emotion recognition (SER)
Label Mapping
ID | Label |
---|---|
0 | Angry |
1 | Disgusted |
2 | Happy |
3 | Neutral |
4 | Sad |
5 | Scared |
6 | Surprised |
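The mapping above can be applied to raw classifier outputs as in the following minimal sketch. It assumes the model returns one logit per class in the ID order of the table; the helper names `softmax` and `decode_logits` and the example logits are hypothetical and are not part of the model's API.

```python
import math

# Label mapping copied from the table above.
ID2LABEL = {
    0: "Angry",
    1: "Disgusted",
    2: "Happy",
    3: "Neutral",
    4: "Sad",
    5: "Scared",
    6: "Surprised",
}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_logits(logits):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits for one clip (illustrative values, not model output).
label, prob = decode_logits([0.1, -1.2, 0.3, 2.5, 1.9, -0.4, -2.0])
print(label, round(prob, 3))
```

In practice the logits would come from the classification head of the fine-tuned checkpoint; the decoding step stays the same.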
Training Details
- Epochs: 5
- Batch size: 8
- Learning rate: 2e-5
- Optimizer: AdamW
- Scheduler: Linear
- Warmup steps: 500
- Loss function: CrossEntropyLoss
- FP16 training: Enabled
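To make the schedule concrete, here is a rough sketch of a linear learning-rate schedule with warmup using the hyperparameters listed above. It assumes "Linear" means linear warmup from zero followed by linear decay to zero, with no gradient accumulation and a single device. The training split size is not stated in this card, so `TRAIN_SAMPLES` is a placeholder, and the function names are hypothetical.

```python
import math

BASE_LR = 2e-5       # from the card
WARMUP_STEPS = 500   # from the card
EPOCHS = 5           # from the card
BATCH_SIZE = 8       # from the card

# ASSUMPTION: placeholder value; the card does not state the training split size.
TRAIN_SAMPLES = 10_000


def total_steps(n_samples, batch_size, epochs):
    """Optimizer steps for the whole run (the last partial batch counts as a step)."""
    return math.ceil(n_samples / batch_size) * epochs


def linear_lr(step, total, warmup=WARMUP_STEPS, base_lr=BASE_LR):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    remaining = max(0, total - step)
    return base_lr * remaining / max(1, total - warmup)


total = total_steps(TRAIN_SAMPLES, BATCH_SIZE, EPOCHS)
for step in (0, 250, 500, total // 2, total):
    print(step, f"{linear_lr(step, total):.2e}")
```

With a warmup of 500 steps, the peak rate of 2e-5 is reached only after the first 4000 training samples at batch size 8, so on a small dataset a noticeable share of the first epoch runs below the nominal learning rate.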
Evaluation Results (Test Set, 2361 samples)
Metric | Value |
---|---|
Accuracy | 0.8196 |
F1-score (macro avg) | 0.8185 |
Cohen's Kappa | 0.7895 |
Matthews correlation coefficient (MCC) | 0.7899 |
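For readers who want to reproduce these four metrics on their own predictions, the sketch below computes them from a confusion matrix using the standard multiclass definitions. The 3-class matrix is a toy example, not this model's confusion matrix (which the card does not publish), and the function name `metrics_from_confusion` is hypothetical.

```python
import math


def metrics_from_confusion(cm):
    """Compute accuracy, macro F1, Cohen's kappa, and multiclass MCC.

    cm[i][j] is the number of samples with true class i predicted as class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    true_counts = [sum(cm[i]) for i in range(k)]                       # row sums
    pred_counts = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column sums

    accuracy = correct / total

    # Per-class F1 = 2*TP / (predicted count + true count); 0 if the class never appears.
    f1_scores = []
    for c in range(k):
        denom = pred_counts[c] + true_counts[c]
        f1_scores.append(2 * cm[c][c] / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / k

    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = sum(t * p for t, p in zip(true_counts, pred_counts)) / total**2
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    # Multiclass Matthews correlation coefficient.
    cov_tp = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    cov_pp = total**2 - sum(p * p for p in pred_counts)
    cov_tt = total**2 - sum(t * t for t in true_counts)
    mcc = cov_tp / math.sqrt(cov_pp * cov_tt) if cov_pp and cov_tt else 0.0

    return {"accuracy": accuracy, "macro_f1": macro_f1, "kappa": kappa, "mcc": mcc}


# Toy 3-class confusion matrix (illustrative only).
toy_cm = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]
for name, value in metrics_from_confusion(toy_cm).items():
    print(f"{name}: {value:.4f}")
```

On a class-balanced test set, kappa and MCC tend to land close to each other and below accuracy, which matches the pattern in the table above.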
Most confusion was observed between Neutral ↔ Sad and Scared ↔ Disgusted. The Surprised class achieved nearly perfect separation (F1 = 0.9955).
Intended Use
- Target: Russian-language emotional speech from operators, isolated environments, or dialogue systems
- Use cases:
- Mental state monitoring
- Human-robot interaction
- Emotion-aware assistants
Limitations
- Domain-specific (mostly clear speech, research-quality recordings)
- Accuracy may degrade on noisy or spontaneous speech
- Designed for 7 emotions only
Citation
To be added after formal publication of EchoStressAI research.
Contact