---
extra_gated_prompt: Please read LICENSE.md before downloading this model.
extra_gated_fields:
  Country: country
  Affiliation: text
  I agree to ALL the statements in LICENSE.md: checkbox
extra_gated_button_content: Acknowledge license
license: other
license_name: imprt-license
license_link: LICENSE.md
language:
- ja
pipeline_tag: feature-extraction
tags:
- wav2vec2
- speech
---
# imprt/izanami-wav2vec2-base
This is a Japanese wav2vec 2.0 Base model pre-trained on 5,313 hours of audio extracted by voice activity detection from large-scale Japanese TV broadcast recordings.

The model was pre-trained using code from the official wav2vec 2.0 repository.
## Usage
```python
import soundfile as sf
from transformers import AutoFeatureExtractor

model_name = "imprt/izanami-wav2vec2-base"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)

# Load a 16 kHz mono audio file
audio_file = "/path/to/16k_audio_file"
audio_input, sr = sf.read(audio_file)

# Convert the raw waveform into model-ready input features
inputs = feature_extractor(audio_input, sampling_rate=sr, return_tensors="pt")
```
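To obtain speech representations, the prepared inputs can be passed through the pre-trained encoder. The following is a minimal sketch, assuming the checkpoint loads with `AutoModel` (i.e. as a `Wav2Vec2Model`); the last hidden state is the sequence of frame-level features.

```python
import torch
from transformers import AutoModel

# Assumes `inputs` was produced by the feature extractor above
model = AutoModel.from_pretrained("imprt/izanami-wav2vec2-base")
model.eval()

with torch.no_grad():
    outputs = model(**inputs)

# Shape: (batch, frames, hidden_size) frame-level speech representations
features = outputs.last_hidden_state
print(features.shape)
```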
## References
```bibtex
@inproceedings{NEURIPS2020_92d1e1eb,
  author = {Baevski, Alexei and Zhou, Yuhao and Mohamed, Abdelrahman and Auli, Michael},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {H. Larochelle and M. Ranzato and R. Hadsell and M.F. Balcan and H. Lin},
  pages = {12449--12460},
  publisher = {Curran Associates, Inc.},
  title = {wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations},
  url = {https://proceedings.neurips.cc/paper_files/paper/2020/file/92d1e1eb1cd6f9fba3227870bb6d7f07-Paper.pdf},
  volume = {33},
  year = {2020}
}
```
## License / Terms

Please read LICENSE.md before using this model.