FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian
Abstract
FAMA, an open-science family of speech foundation models, provides transparency and competitive performance by leveraging open-source training data and code.
The development of speech foundation models (SFMs) like Whisper and SeamlessM4T has significantly advanced the field of speech processing. However, their closed nature, with inaccessible training data and code, poses major challenges for reproducibility and fair evaluation. While other domains have made substantial progress toward open science by developing fully transparent models trained on open-source (OS) code and data, similar efforts in speech remain limited. To fill this gap, we introduce FAMA, the first family of open-science SFMs for English and Italian, trained on 150k+ hours of OS speech data. Moreover, we present a new dataset containing 16k hours of cleaned and pseudo-labeled speech for both languages. Results show that FAMA achieves competitive performance compared to existing SFMs while being up to 8 times faster. All artifacts, including code, datasets, and models, are released under OS-compliant licenses, promoting openness in speech technology research.
Community
New tech report out! Meet FAMA, a new open-science speech foundation model family for both Automatic Speech Recognition (ASR) and Speech Translation (ST) in English and Italian.
The models are live and ready to try here on Hugging Face.
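For anyone who wants to try the models programmatically, here is a minimal sketch using the transformers ASR pipeline. The checkpoint name FBK-MT/fama-small is an assumption for illustration; check the FAMA collection on the Hub for the exact model IDs and any usage notes on the model cards.

```python
# Minimal sketch: transcribing an audio file with a FAMA checkpoint through the
# Hugging Face transformers ASR pipeline.
# NOTE: the model ID "FBK-MT/fama-small" is an assumption; verify the actual
# checkpoint names on the Hub before running.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="FBK-MT/fama-small",  # assumed model ID
    trust_remote_code=True,     # custom architectures may require this
)

# Transcribe a local English or Italian audio file (16 kHz mono WAV is a safe choice).
result = asr("sample.wav")
print(result["text"])
```

The same pipeline interface should also accept raw audio arrays or URLs, so it can be dropped into existing transcription scripts with minimal changes.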
This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, are similar to this one:
- Granary: Speech Recognition and Translation Dataset in 25 European Languages (2025)
- Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities (2025)
- From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition (2025)
- Speechless: Speech Instruction Training Without Speech for Low Resource Languages (2025)
- Muyan-TTS: A Trainable Text-to-Speech Model Optimized for Podcast Scenarios with a $50K Budget (2025)
- GMU Systems for the IWSLT 2025 Low-Resource Speech Translation Shared Task (2025)
- LegoSLM: Connecting LLM with Speech Encoder using CTC Posteriors (2025)