
Model Card: aniemore-audio-finetuned

Model Summary

This model is Aniemore/wavlm-emotion-russian-resd fine-tuned on a balanced Russian-language dataset of emotional speech for 7-class emotion classification. Fine-tuning was performed as part of the EchoStressAI project.


Model Details

  • Model type: WavLMForSequenceClassification
  • Pretrained base: Aniemore/wavlm-emotion-russian-resd
  • Fine-tuned dataset: Balanced custom dataset (audio), 2248 samples per class
  • Languages: Russian
  • Task: Speech emotion recognition (SER)

Label Mapping

ID Label
0 Angry
1 Disgusted
2 Happy
3 Neutral
4 Sad
5 Scared
6 Surprised
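
The mapping above can be applied to raw classifier outputs as in the following minimal sketch. It assumes the model returns one 7-element logit vector per audio clip; the helper name decode_logits and the example logits are hypothetical and are not part of the released model.

```python
import math

# Label mapping copied from the table above.
ID2LABEL = {
    0: "Angry",
    1: "Disgusted",
    2: "Happy",
    3: "Neutral",
    4: "Sad",
    5: "Scared",
    6: "Surprised",
}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def decode_logits(logits):
    """Return (label, probability) for one logit vector (hypothetical helper)."""
    if len(logits) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} logits, got {len(logits)}")
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class.
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits for illustration only; they are not real model output.
example_logits = [0.1, -1.2, 0.3, 2.5, 1.9, -0.4, -2.0]
label, prob = decode_logits(example_logits)
print(label, round(prob, 3))
```

Keeping both ID2LABEL and LABEL2ID makes it easy to go from predictions to names and from annotated names back to class IDs.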

Training Details

  • Epochs: 5
  • Batch size: 8
  • Learning rate: 2e-5
  • Optimizer: AdamW
  • Scheduler: Linear
  • Warmup steps: 500
  • Loss function: CrossEntropyLoss
  • FP16 training: Enabled
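
To make the learning-rate settings concrete, here is a minimal sketch of a linear schedule with warmup. It assumes that "Linear" means the common form: a linear ramp from 0 to the peak rate over the warmup steps, followed by a linear decay to 0 at the end of training. The total number of optimizer steps is not stated in this card, so TOTAL_STEPS below is a hypothetical value chosen only for illustration.

```python
def linear_lr(step, total_steps, base_lr=2e-5, warmup_steps=500):
    """Learning rate at an optimizer step: linear warmup, then linear decay."""
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must exceed warmup_steps")
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay phase: fall from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


TOTAL_STEPS = 5000  # hypothetical; the real value depends on the training set size

for step in (0, 250, 500, 2750, 5000):
    print(step, linear_lr(step, TOTAL_STEPS))
```

With these assumptions the rate peaks at 2e-5 exactly when warmup ends (step 500) and is back to half of the peak midway through the decay phase.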

Evaluation Results (Test Set, 2361 samples)

Metric Value
Accuracy 0.8196
F1-score (macro avg) 0.8185
Cohen's Kappa 0.7895
Matthews Correlation Coefficient 0.7899

Most confusion was observed between Neutral ↔ Sad and Scared ↔ Disgusted. The Surprised class achieved nearly perfect separation (F1 = 0.9955).
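
For readers who want to reproduce this kind of report on their own predictions, the sketch below derives accuracy, macro F1, Cohen's Kappa, and the multi-class Matthews correlation coefficient from a confusion matrix using the standard definitions. The 3-class matrix is a toy example invented for illustration; it is not the model's confusion matrix, and the function name metrics_from_confusion is hypothetical.

```python
import math


def metrics_from_confusion(cm):
    """cm[i][j] = number of samples with true class i predicted as class j."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    true_counts = [sum(cm[i]) for i in range(k)]                        # row sums
    pred_counts = [sum(cm[i][j] for i in range(k)) for j in range(k)]   # column sums

    accuracy = correct / total

    # Per-class F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (row sum + column sum).
    f1_scores = []
    for i in range(k):
        denom = true_counts[i] + pred_counts[i]
        f1_scores.append(2 * cm[i][i] / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / k

    # Cohen's Kappa: observed agreement corrected for chance agreement.
    agreement = sum(t * p for t, p in zip(true_counts, pred_counts))
    expected = agreement / total**2
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0

    # Multi-class Matthews correlation coefficient.
    mcc_den = math.sqrt(
        (total**2 - sum(p * p for p in pred_counts))
        * (total**2 - sum(t * t for t in true_counts))
    )
    mcc = (correct * total - agreement) / mcc_den if mcc_den else 0.0

    return {"accuracy": accuracy, "macro_f1": macro_f1, "kappa": kappa, "mcc": mcc}


# Toy 3-class confusion matrix, invented for illustration only.
toy_cm = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]
results = metrics_from_confusion(toy_cm)
for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

On a roughly balanced test set, Kappa and MCC tend to land close to each other, which is consistent with the near-identical values in the table above.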


Intended Use

  • Target: Russian-language emotional speech from operators, isolated environments, or dialogue systems
  • Use cases:
    • Mental state monitoring
    • Human-robot interaction
    • Emotion-aware assistants

Limitations

  • Domain-specific (mostly clear speech, research-quality recordings)
  • Accuracy may be lower on noisy or spontaneous speech
  • Designed for 7 emotions only

Citation

To be added after formal publication of EchoStressAI research.


Contact

https://huggingface.co/nikatonika
