SPTK-2
SPTK-2 is an open multilingual automatic speech recognition (ASR) model developed by SVECTOR.
It supports 96 languages (revised count) and offers improved accuracy, timestamp precision, and energy efficiency compared with previous models.
📄 Read the paper: SPTK: A Framework for Universal Multilingual ASR (2025)
🧪 Example Usage
import torch
import torchaudio
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("SVECTOR-CORPORATION/SPTK-2")
model = AutoModelForSpeechSeq2Seq.from_pretrained("SVECTOR-CORPORATION/SPTK-2")

# Load audio, resample to the rate the feature extractor expects, keep the first channel
audio, sr = torchaudio.load("your_audio_file.mp3")
target_sr = processor.feature_extractor.sampling_rate
audio = torchaudio.functional.resample(audio, orig_freq=sr, new_freq=target_sr)
inputs = processor(audio[0].numpy(), sampling_rate=target_sr, return_tensors="pt")

# Generate transcription (no gradients are needed for inference)
with torch.no_grad():
    predicted_ids = model.generate(**inputs)

# Decode token IDs to text
print(processor.batch_decode(predicted_ids, skip_special_tokens=True))
📦 Model Details
- Model type: Encoder-decoder
- Architecture: E-Branchformer + Sparse MoE decoder
- Languages: 99+
- Tasks: transcription, translation, and timestamp prediction
- Released: April 2025
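The "Sparse MoE decoder" listed above routes each token through only a few expert sub-networks instead of one dense feed-forward layer. The sketch below illustrates generic top-k mixture-of-experts routing in plain Python. It is not SPTK-2's implementation: the function name `top_k_moe`, the linear router `gate_weights`, the toy experts, and `k=2` are all hypothetical choices made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_moe(
    x: Vector,
    gate_weights: Sequence[Sequence[float]],
    experts: Sequence[Callable[[Vector], Vector]],
    k: int = 2,
) -> Tuple[Vector, List[int]]:
    """Hypothetical top-k MoE layer: only k experts run per input token."""
    # Router logits: one linear score per expert (dot product with x)
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    # Pick the k experts with the highest logits
    top = sorted(range(len(experts)), key=lambda e: logits[e], reverse=True)[:k]
    # Renormalize gate probabilities over the selected experts only
    weights = softmax([logits[e] for e in top])
    # Weighted sum of the selected experts' outputs; other experts are skipped
    out = [0.0] * len(x)
    for w, e in zip(weights, top):
        y = experts[e](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top


if __name__ == "__main__":
    # Three toy experts acting on 2-dimensional vectors
    experts = [
        lambda v: [2.0 * t for t in v],
        lambda v: [t + 1.0 for t in v],
        lambda v: [0.0 for _ in v],
    ]
    gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    print(top_k_moe([1.0, 0.0], gate, experts, k=2))
```

Because only `k` experts are evaluated per token, compute per token stays roughly constant as more experts are added, which is the usual motivation for sparse MoE layers.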
📜 License
This model is licensed under the SVECTOR Proprietary License.
For research or commercial use, please contact [email protected].