wav2vec-S-Large-ft-960h

wav2vec-S is a streaming variant of wav2vec 2.0, designed to maintain consistent representations between training and inference for streaming speech processing.
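
To make the idea of train/inference consistency concrete, here is a minimal sketch of a chunk-wise attention mask, one common way streaming encoders limit how much context each frame sees. This is an illustration only, not the wav2vec-S implementation: the function name `chunk_attention_mask` and the parameters `chunk_size` and `left_chunks` are hypothetical and do not come from the paper or the repository. If the same mask is applied during training and during streaming inference, a frame's representation never depends on audio that has not arrived yet.

```python
from typing import List


def chunk_attention_mask(num_frames: int, chunk_size: int, left_chunks: int = 1) -> List[List[bool]]:
    """Build a boolean mask where mask[i][j] is True if frame i may attend to frame j.

    Hypothetical illustration: frames are grouped into fixed-size chunks, and each
    frame sees its own chunk plus `left_chunks` preceding chunks, never future chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if left_chunks < 0:
        raise ValueError("left_chunks must be non-negative")

    mask = []
    for i in range(num_frames):
        chunk_i = i // chunk_size  # index of the chunk containing query frame i
        row = []
        for j in range(num_frames):
            chunk_j = j // chunk_size  # index of the chunk containing key frame j
            # Allowed: the current chunk and up to `left_chunks` chunks before it.
            row.append(chunk_i - left_chunks <= chunk_j <= chunk_i)
        mask.append(row)
    return mask


# Example: 6 frames, chunks of 2 frames, one chunk of left context.
mask = chunk_attention_mask(num_frames=6, chunk_size=2, left_chunks=1)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

In this sketch, frames in the last chunk cannot see the first chunk, and no frame sees a later chunk, so the computation for a chunk is the same whether the whole utterance is available (training) or only the audio received so far (streaming inference).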

wav2vec-S-Large-ft-960h was continually pre-trained on the 60k-hour Libri-Light dataset and then fine-tuned for ASR on the 960-hour LibriSpeech dataset.

See our paper for details and check out the GitHub repository for usage instructions.