Model Card: aniemore-audio-finetuned

Model Summary

This model is Aniemore/wavlm-emotion-russian-resd fine-tuned on a balanced Russian-language dataset of emotional speech for 7-class emotion classification. Fine-tuning was performed as part of the EchoStressAI project.

Model Details

Model type: WavLMForSequenceClassification
Pretrained base: Aniemore/wavlm-emotion-russian-resd
Fine-tuning dataset: Balanced custom audio dataset, 2248 samples per class
Languages: Russian
Task: Speech emotion recognition (SER)
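A minimal inference sketch based on the details above, not a confirmed recipe. It assumes the checkpoint loads through the standard transformers auto classes (AutoModelForAudioClassification resolves WavLM checkpoints to WavLMForSequenceClassification), that the input should be mono 16 kHz audio as is usual for WavLM, and that the saved config carries the label names. MODEL_ID is a hypothetical placeholder and classify_file is an illustrative helper name; substitute the actual repository id or local path.

```python
MODEL_ID = "path/to/aniemore-audio-finetuned"  # hypothetical placeholder, not a real repo id
TARGET_SAMPLE_RATE = 16_000  # assumption: WavLM models expect 16 kHz input


def classify_file(wav_path, model_id=MODEL_ID):
    """Return (label, probability) for the most likely emotion in one audio file."""
    # Heavy dependencies are imported lazily so the module can be imported without them.
    import torch
    import torchaudio
    from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

    extractor = AutoFeatureExtractor.from_pretrained(model_id)
    model = AutoModelForAudioClassification.from_pretrained(model_id)
    model.eval()

    # torchaudio returns a (channels, samples) tensor; average channels to get mono.
    waveform, sample_rate = torchaudio.load(wav_path)
    waveform = waveform.mean(dim=0)
    if sample_rate != TARGET_SAMPLE_RATE:
        waveform = torchaudio.functional.resample(
            waveform, sample_rate, TARGET_SAMPLE_RATE
        )

    inputs = extractor(
        waveform.numpy(), sampling_rate=TARGET_SAMPLE_RATE, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits

    probs = torch.softmax(logits, dim=-1)[0]
    best = int(probs.argmax())
    # Assumption: config.id2label matches the label mapping in this card.
    return model.config.id2label[best], float(probs[best])
```

Calling classify_file("sample.wav") would then return a pair such as a label string and its softmax probability. If the saved config only contains generic names like LABEL_0, map the predicted index through the label table in this card instead.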

Label Mapping

ID	Label
0	Angry
1	Disgusted
2	Happy
3	Neutral
4	Sad
5	Scared
6	Surprised
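
The mapping above can be written as a lookup table for post-processing model outputs. The sketch below is illustrative: the names ID2LABEL, LABEL2ID, and decode_logits are assumptions, not part of the released code, and it uses a plain-Python softmax so it runs without any dependencies.

```python
import math

# Label mapping copied from the table in this card.
ID2LABEL = {
    0: "Angry",
    1: "Disgusted",
    2: "Happy",
    3: "Neutral",
    4: "Sad",
    5: "Scared",
    6: "Surprised",
}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def decode_logits(logits):
    """Turn a list of 7 raw scores into (label, probability) for the top class."""
    if len(logits) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} logits, got {len(logits)}")
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]
```

For example, decode_logits([0, 0, 0, 0, 0, 0, 5.0]) picks index 6 and returns the label "Surprised" together with its probability.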

Training Details

Epochs: 5
Batch size: 8
Learning rate: 2e-5
Optimizer: AdamW
Scheduler: Linear
Warmup steps: 500
Loss function: CrossEntropyLoss
FP16 training: Enabled
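
To make the scheduler settings concrete, the sketch below computes the learning rate at a given optimizer step. It assumes "Linear" means linear warmup from zero followed by linear decay to zero, which is the behavior of the linear schedule commonly used with transformers; the card does not confirm this. The total step count is not stated in this card, so total_steps is a hypothetical parameter, and lr_at_step is an illustrative name.

```python
PEAK_LR = 2e-5       # from this card
WARMUP_STEPS = 500   # from this card


def lr_at_step(step, total_steps, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must exceed warmup_steps")
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Decay phase: scale linearly from the peak down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)
```

With a hypothetical total of 10,000 steps, lr_at_step(250, 10_000) is half the peak rate, lr_at_step(500, 10_000) is the peak itself, and the rate falls back to zero at the final step.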

Evaluation Results (Test Set, 2361 samples)

Metric	Value
Accuracy	0.8196
F1-score (macro avg)	0.8185
Cohen's Kappa	0.7895
Matthews correlation coefficient (MCC)	0.7899

The most confusion was observed between Neutral ↔ Sad and Scared ↔ Disgusted. The Surprised class was separated almost perfectly (F1 = 0.9955).
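
For readers who want to reproduce this kind of evaluation on their own predictions, the sketch below computes the four reported metrics from lists of true and predicted class IDs using the standard multi-class definitions. It is a dependency-free illustration, not the evaluation script used for this card; the function name classification_metrics is an assumption.

```python
from math import sqrt


def classification_metrics(y_true, y_pred, num_classes=7):
    """Accuracy, macro F1, Cohen's kappa and multi-class MCC from class-ID lists."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    # confusion[t][p] counts samples of true class t predicted as class p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1

    correct = sum(confusion[k][k] for k in range(num_classes))
    true_tot = [sum(row) for row in confusion]
    pred_tot = [sum(row[k] for row in confusion) for k in range(num_classes)]

    accuracy = correct / n

    # Per-class F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (true count + predicted count).
    f1_scores = []
    for k in range(num_classes):
        denom = true_tot[k] + pred_tot[k]
        f1_scores.append(2 * confusion[k][k] / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / num_classes

    # Cohen's kappa: agreement corrected for the agreement expected by chance.
    marginal = sum(t * p for t, p in zip(true_tot, pred_tot))
    expected = marginal / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    # Multi-class Matthews correlation coefficient.
    cov_tp = correct * n - marginal
    cov_pp = n * n - sum(p * p for p in pred_tot)
    cov_tt = n * n - sum(t * t for t in true_tot)
    mcc = cov_tp / sqrt(cov_pp * cov_tt) if cov_pp and cov_tt else 0.0

    return {"accuracy": accuracy, "macro_f1": macro_f1, "kappa": kappa, "mcc": mcc}
```

Kappa and MCC both correct for chance agreement, which is why they sit below raw accuracy in the table above; on a balanced test set the two are usually close to each other.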

Intended Use

Target: Russian-language emotional speech from operators, isolated environments, or dialogue systems
Use cases:
- Mental state monitoring
- Human-robot interaction
- Emotion-aware assistants

Limitations

Domain-specific: the data is mostly clear speech from research-quality recordings
Accuracy on noisy or spontaneous speech may differ from the test-set results reported in this card
Covers only the 7 emotions listed in the label mapping

Citation

To be added after formal publication of EchoStressAI research.

Contact

https://huggingface.co/nikatonika