SPTK-2

SPTK-2 is an open multilingual automatic speech recognition (ASR) model developed by SVECTOR.
According to the revised count, it supports 96 languages, and it offers improved accuracy, timestamp precision, and energy efficiency compared with previous models.

📄 Read the paper: SPTK: A Framework for Universal Multilingual ASR (2025)


🧪 Example Usage

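import torch
import torchaudio
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("SVECTOR-CORPORATION/SPTK-2")
model = AutoModelForSpeechSeq2Seq.from_pretrained("SVECTOR-CORPORATION/SPTK-2")

# Load audio, downmix to mono, and resample to the rate the feature extractor expects
audio, sr = torchaudio.load("your_audio_file.mp3")
audio = audio.mean(dim=0)
target_sr = processor.feature_extractor.sampling_rate
if sr != target_sr:
    audio = torchaudio.functional.resample(audio, orig_freq=sr, new_freq=target_sr)
inputs = processor(audio.numpy(), sampling_rate=target_sr, return_tensors="pt")

# Generate transcription (unpack all processor outputs, whatever the input key is named)
with torch.no_grad():
    predicted_ids = model.generate(**inputs)

# Decode output
print(processor.batch_decode(predicted_ids, skip_special_tokens=True))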
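The card lists timestamp support but does not document the timestamp output format. As a sketch, assuming timestamped output arrives in the chunk shape produced by the Hugging Face `automatic-speech-recognition` pipeline with `return_timestamps=True` (a list of `{"text": ..., "timestamp": (start, end)}` dicts, times in seconds), the hypothetical helpers `format_srt_time` and `chunks_to_srt` below turn those chunks into SubRip (SRT) subtitles. Whether SPTK-2 emits timestamps in exactly this form is an assumption, not something this card states.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks: list) -> str:
    """Render timestamped chunks as SubRip (SRT) subtitle text.

    Assumption: each chunk is {"text": str, "timestamp": (start, end)},
    the shape used by the Hugging Face ASR pipeline with return_timestamps=True.
    A missing end time (None) falls back to the start time.
    """
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            end = start
        blocks.append(
            f"{index}\n"
            f"{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    # SRT separates consecutive subtitle blocks with a blank line
    return "\n".join(blocks)
```

Pipeline output can leave the final chunk's end time as `None`, so the sketch falls back to the start time instead of failing.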

📦 Model Details

  • Model type: Encoder-decoder
  • Architecture: E-Branchformer + Sparse MoE decoder
  • Languages: 96 (revised from 99+)
  • Supports transcription, translation, timestamps
  • Released: April 2025
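The card names a sparse Mixture-of-Experts (MoE) decoder but gives no routing details. The pure-Python sketch below illustrates the generic top-k routing idea behind sparse MoE layers: a linear router scores every expert, only the k best-scoring experts run, and their outputs are mixed using softmax gates renormalised over the selected experts. The router weights, the toy expert functions, and the `k=2` default are illustrative assumptions, not SPTK-2's actual configuration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return (expert_index, gate) for the k most probable experts.

    Gates are renormalised so the selected experts' weights sum to 1.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


def sparse_moe(token: Vector, router_weights: Sequence[Vector],
               experts: Sequence[Expert], k: int = 2) -> Vector:
    """Route one token through its top-k experts and mix their outputs."""
    # Hypothetical linear router: one score per expert (row of router_weights dot token)
    router_logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    output = [0.0] * len(token)
    for expert_index, gate in top_k_route(router_logits, k):
        expert_out = experts[expert_index](token)  # unselected experts are never called
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output
```

Because unselected experts are never called, per-token compute grows with k rather than with the total number of experts, which is the usual motivation for making an MoE layer sparse.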

📜 License

This model is licensed under the SVECTOR Proprietary License.
For research or commercial use, please contact [email protected].


💾 Checkpoint

  • Format: Safetensors
  • Model size: 809M params
  • Tensor type: FP16
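The 809M parameter count and FP16 tensor type listed for the checkpoint give a rough lower bound on the memory needed just to hold the weights: parameters × bytes per parameter. This sketch assumes exactly 809,000,000 parameters (the card's figure is rounded) and ignores activations, decoding state, and framework overhead; the FP32 line shows what happens if the weights are upcast at load time.

```python
def weight_memory_bytes(num_params: int, bytes_per_param: int) -> int:
    """Bytes needed for the raw weights alone (no activations or decoding state)."""
    return num_params * bytes_per_param


FP16_BYTES = 2  # a float16 value occupies 16 bits = 2 bytes
FP32_BYTES = 4  # a float32 value occupies 32 bits = 4 bytes

fp16 = weight_memory_bytes(809_000_000, FP16_BYTES)
fp32 = weight_memory_bytes(809_000_000, FP32_BYTES)
print(f"FP16: {fp16 / 1e9:.2f} GB ({fp16 / 2**30:.2f} GiB)")  # FP16: 1.62 GB (1.51 GiB)
print(f"FP32: {fp32 / 1e9:.2f} GB ({fp32 / 2**30:.2f} GiB)")  # FP32: 3.24 GB (3.01 GiB)
```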