---
extra_gated_prompt: Please read LICENSE.md before downloading this model.
extra_gated_fields:
  Country: country
  Affiliation: text
  I agree to ALL the statements in LICENSE.md: checkbox
extra_gated_button_content: Acknowledge license
license: other
license_name: imprt-license
license_link: LICENSE.md
language:
  - ja
pipeline_tag: feature-extraction
tags:
  - wav2vec2
  - speech
---

# imprt/izanami-wav2vec2-base

This is a Japanese wav2vec 2.0 Base model, pre-trained on 5,313 hours of audio extracted by voice activity detection from large-scale Japanese TV broadcast data.
The model was trained using code from the official repository.

## Usage

```python
import soundfile as sf
from transformers import AutoFeatureExtractor

model_name = "imprt/izanami-wav2vec2-base"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)

# The model expects 16 kHz mono audio.
audio_file = "/path/to/16k_audio_file"
audio_input, sr = sf.read(audio_file)
inputs = feature_extractor(audio_input, sampling_rate=sr)
```
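The snippet above assumes the input file is already sampled at 16 kHz. If your audio uses a different rate, resample it before feature extraction. Below is a minimal sketch using linear interpolation; the `to_16k` helper is illustrative only (for real use, prefer a dedicated resampler such as `librosa.resample` or `torchaudio.transforms.Resample`).

```python
import numpy as np

def to_16k(audio, sr, target_sr=16000):
    """Resample mono audio to target_sr via linear interpolation (rough sketch)."""
    if sr == target_sr:
        return audio
    n_out = int(round(len(audio) * target_sr / sr))
    x_old = np.linspace(0.0, 1.0, num=len(audio), endpoint=False)
    x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(x_new, x_old, audio)

# Example: 1 second of a 440 Hz tone at 44.1 kHz, resampled to 16 kHz.
audio_44k = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
audio_16k = to_16k(audio_44k, 44100)
```

The resampled array can then be passed to the feature extractor with `sampling_rate=16000`.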

## References

```bibtex
@inproceedings{NEURIPS2020_92d1e1eb,
    author = {Baevski, Alexei and Zhou, Yuhao and Mohamed, Abdelrahman and Auli, Michael},
    booktitle = {Advances in Neural Information Processing Systems},
    editor = {H. Larochelle and M. Ranzato and R. Hadsell and M.F. Balcan and H. Lin},
    pages = {12449--12460},
    publisher = {Curran Associates, Inc.},
    title = {wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations},
    url = {https://proceedings.neurips.cc/paper_files/paper/2020/file/92d1e1eb1cd6f9fba3227870bb6d7f07-Paper.pdf},
    volume = {33},
    year = {2020}
}
```

## License / Terms

Please read LICENSE.md before using this model.