WhissleAI/Meta_STT_PA_AI4Bharat

	---
	license: cc-by-4.0
	language:
	- pa
	base_model:
	- parthiv11/stt_hi_conformer_ctc_large_v2
	tags:
	- speech_recognition
	- entity_tagging
	- dialect_prediction
	- gender
	- age
	- intent
	---
	# This speech tagger transcribes Punjabi speech, annotates key entities, and predicts speaker age, dialect, and intent.

	The model is suitable for voice AI applications, both real-time and offline.

	## Model Details

	- Model type: NeMo ASR
	- Architecture: Conformer CTC
	- Language: Punjabi
	- Training data: AI4Bharat IndicVoices Punjabi V1 and V2 datasets
	- Performance metrics: [Metrics]

	## Usage

	To use this model, install the NeMo toolkit with its ASR dependencies:

	```bash
	pip install "nemo_toolkit[asr]"
	```

	### How to run

	```python
	import nemo.collections.asr as nemo_asr

	# Step 1: Load the ASR model from Hugging Face
	model_name = 'WhissleAI/stt_pa_conformer_ctc_entities_age_dialiect_intent'
	asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name)

	# Step 2: Provide the path to your audio file
	audio_file_path = '/path/to/your/audio_file.wav'

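	# Step 3: Transcribe the audio. The list is passed positionally because
	# newer NeMo releases renamed the `paths2audio_files` keyword to `audio`.
	results = asr_model.transcribe([audio_file_path])

	# Newer NeMo releases return Hypothesis objects; older ones return plain strings.
	transcription = getattr(results[0], 'text', results[0])
	print(f'Transcription: {transcription}')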
	```
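
Before transcribing, it can help to confirm that the input file is in the format the model was trained on. The sketch below is a minimal check using only the Python standard library. It assumes 16 kHz mono PCM WAV input, which is typical for NeMo Conformer CTC checkpoints but is not stated in this card; `check_wav` and `EXPECTED_RATE` are hypothetical names introduced for illustration.

```python
import wave

# Assumption: 16 kHz is typical for NeMo Conformer CTC checkpoints;
# this card does not state the expected sample rate.
EXPECTED_RATE = 16000


def check_wav(path, expected_rate=EXPECTED_RATE):
    """Return a list of human-readable problems found in a PCM WAV file."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        if channels != 1:
            problems.append(f"expected mono audio, found {channels} channels")
        if rate != expected_rate:
            problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    return problems
```

An empty list means the file matches the assumed format. Otherwise, resample or downmix the file with your preferred audio tool before passing it to `asr_model.transcribe`.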

	The training data comes from the AI4Bharat IndicVoices Punjabi V1 and V2 datasets:

	https://indicvoices.ai4bharat.org/