---
license: cc-by-4.0
language:
- pa
base_model:
- parthiv11/stt_hi_conformer_ctc_large_v2
tags:
- speech_recognition
- entity_tagging
- dialect_prediction
- gender
- age
- intent
---

# This speech tagger performs transcription for Punjabi, annotates key entities, and predicts speaker age, dialect, and intent.

The model is suitable for voice AI applications, both real-time and offline.

## Model Details

- **Model type**: NeMo ASR
- **Architecture**: Conformer CTC
- **Language**: Punjabi
- **Training data**: AI4Bharat IndicVoices Punjabi V1 and V2 datasets
- **Performance metrics**: [Metrics]

## Usage

To use this model, install the NeMo toolkit with its ASR dependencies:

```bash
pip install "nemo_toolkit[asr]"
```

### How to run

```python
import nemo.collections.asr as nemo_asr

# Step 1: Load the ASR model from Hugging Face
model_name = 'WhissleAI/stt_pa_conformer_ctc_entities_age_dialiect_intent'
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name)

# Step 2: Provide the path to your audio file
audio_file_path = '/path/to/your/audio_file.wav'

# Step 3: Transcribe the audio. The file list is passed positionally because
# the keyword name differs across NeMo releases (`paths2audio_files` in older
# versions, `audio` in newer ones).
transcription = asr_model.transcribe([audio_file_path])
print(f'Transcription: {transcription[0]}')
```
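
Depending on the installed NeMo release, `transcribe()` may return plain strings or hypothesis objects that carry the transcript in a `.text` attribute. The helper below is a minimal sketch for handling both cases; `to_text` is a hypothetical name introduced here for illustration and is not part of NeMo or of this model.

```python
from typing import Any, List


def to_text(results: List[Any]) -> List[str]:
    """Return plain transcript strings from a `transcribe()` result list.

    Assumption: each item is either a `str` or an object exposing the
    transcript as a `.text` attribute.
    """
    # Strings pass through unchanged; other items are read via `.text`.
    return [item if isinstance(item, str) else item.text for item in results]
```

With the model loaded as above, it could be used as `texts = to_text(asr_model.transcribe([audio_file_path]))`, after which `texts[0]` is a string containing the model's output for the first file.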

The training data comes from the AI4Bharat IndicVoices Punjabi V1 and V2 datasets:

https://indicvoices.ai4bharat.org/