Can't match expected performance on VoxCeleb test set
Discussion #6, opened by 607HF
When I evaluate this model on VoxCeleb-O, I get an EER of 4.9% (at a threshold of around 0.87, which is very close to the one reported here). This seems high: according to the WavLM paper, it should be 0.84%. What EER do you get with this model?
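For comparing numbers, it may help to pin down how the EER is computed. Below is a minimal sketch, assuming each trial is scored by cosine similarity between two speaker embeddings and labeled 1 (same speaker) or 0 (different speaker). The helper names `cosine_similarity` and `compute_eer` are hypothetical, not part of any library. It sweeps the decision threshold over the sorted scores and reports the point where the false acceptance rate (FAR) and false rejection rate (FRR) are closest, taking their average as the EER, which is a common approximation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def compute_eer(scores, labels):
    """Return (eer, threshold); a trial is accepted when score >= threshold.

    labels: 1 = target (same speaker), 0 = non-target (different speakers).
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")

    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    false_rejects = 0             # targets scoring below the threshold
    false_accepts = n_nontarget   # non-targets scoring at or above it
    best = None                   # (|FAR - FRR|, eer, threshold)
    i = 0
    while True:
        threshold = pairs[i][0] if i < n else float("inf")
        frr = false_rejects / n_target
        far = false_accepts / n_nontarget
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, threshold)
        if i == n:
            break
        # Move every trial with this exact score below the next threshold.
        while i < n and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                false_rejects += 1
            else:
                false_accepts -= 1
            i += 1
    return best[1], best[2]


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    labels = [1, 1, 0, 1, 0, 0]
    eer, thr = compute_eer(scores, labels)
    print(f"EER = {eer:.3f} at threshold {thr}")  # EER = 0.333 at threshold 0.7
```

On a full trial list, a large gap between a locally computed EER and the published one could come from the scoring (e.g. cosine similarity vs. another backend, or score normalization) as well as from the model itself, so it is worth checking which scoring the paper used.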
One possible clue: when I run the input through the model, I get the following warning: `torch\nn\functional.py:5962: UserWarning: Support for mismatched key_padding_mask and attn_mask is deprecated. Use same type for both instead.`
I have not been able to figure out what is causing it, though. I have also tried setting up the UniSpeech git repository to compare against the model checkpoints released there, but I have not managed to get the environment working.