Streaming?
Thank you NVIDIA team for releasing yet another excellent ASR model!
Is there a guide on how to achieve streaming transcription using the latest parakeet-tdt-0.6b-v2 model?
You could do chunked streaming by following this script: https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_chunked_inference/rnnt/speech_to_text_buffered_infer_rnnt.py. Directions on how to use it are inside the script.
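If you just want to prototype before wiring up that script, a naive chunk-by-chunk loop gives a rough feel for chunked transcription. This is only a sketch under assumptions (file names and chunk length are made up, and there is no overlap or hypothesis merging, which the buffered script above handles properly):

```python
# Naive chunked transcription sketch for parakeet-tdt-0.6b-v2.
# Assumptions: "audio.wav" is 16 kHz mono, a 10 s chunk size, and a NeMo
# version whose transcribe() returns Hypothesis objects (as on the model card).
import soundfile as sf
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt-0.6b-v2")

audio, sr = sf.read("audio.wav", dtype="float32")  # model expects 16 kHz mono audio
chunk_len_in_secs = 10                             # assumed chunk size; tune for your latency target
chunk_samples = int(chunk_len_in_secs * sr)

pieces = []
for start in range(0, len(audio), chunk_samples):
    chunk = audio[start:start + chunk_samples]
    sf.write("chunk.wav", chunk, sr)               # transcribe() takes a list of file paths
    hyp = model.transcribe(["chunk.wav"])[0]       # output type varies across NeMo versions
    pieces.append(hyp.text if hasattr(hyp, "text") else hyp)

print(" ".join(pieces))
```

Because the chunks are cut blindly, words at chunk boundaries can be garbled; the buffered inference script avoids this by keeping an overlapping audio buffer and merging the partial hypotheses.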
We noticed a bug with TDT in chunked streaming inference; we will push a fix to main soon for everyone to try!
We also have a dedicated cache-aware architecture for streaming use cases: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_en_fastconformer_hybrid_large_streaming_multi. We are also working on an upgraded, more performant successor to that model.
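For a quick sanity check of that cache-aware model, something like the sketch below should work, assuming the NGC model name can be passed directly to from_pretrained() (the audio path is a placeholder, and true low-latency streaming uses NeMo's cache-aware streaming inference example rather than this offline call):

```python
# Load the cache-aware streaming FastConformer and run a basic offline transcription.
import nemo.collections.asr as nemo_asr

streaming_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="stt_en_fastconformer_hybrid_large_streaming_multi"
)

result = streaming_model.transcribe(["audio.wav"])  # placeholder path
print(result)  # exact output structure (strings vs. Hypothesis objects) depends on your NeMo version
```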
Hi @nithinraok. Thanks for the link, and eagerly waiting for the new streaming models! About the bug: do you recommend waiting for the fix if it's major, or can the version currently on main already be used?