---
license: openrail
language:
- en
metrics:
- f1
library_name: fairseq
pipeline_tag: audio-classification
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

We explore the benefits of unsupervised pretraining of wav2vec 2.0 (W2V2) using large-scale unlabeled home recordings collected with LittleBeats and LENA (Language Environment Analysis) devices.
LittleBeats (LB) is a new infant wearable multi-modal device that we developed, which simultaneously records audio, infant movement, and heart-rate variability.
We use W2V2 to advance the LB audio pipeline so that it automatically provides reliable speaker diarization and vocalization classification labels for family members, including infants, parents, and siblings, at home.
We show that W2V2 pretrained on thousands of hours of large-scale unlabeled home audio outperforms the oracle W2V2, pretrained on 52k hours of audio released by Facebook/Meta, on automatic family audio analysis tasks.

# Model Details

## Model Description

<!-- Provide a longer summary of what this model is. -->
Two versions of the pretrained W2V2 model are available:

- **LB1100/checkpoint_best.pt**: pretrained on 1,100 hours of LB home recordings collected from 110 families of children under 5 years old
- **LL4300/checkpoint_best.pt**: pretrained on 1,100 hours of LB home recordings from 110 families plus 3,200 hours of LENA home recordings from 275 families of children under 5 years old

## Model Sources [optional]

<!-- Provide the basic links for the model. -->
For more information regarding this model, please check out our paper.

- **Paper [optional]:** [More Information Needed]

# Uses

We develop a fine-tuning recipe using the SpeechBrain toolkit, available at:

- **Repository:** https://github.com/jialuli3/speechbrain/tree/infant-voc-classification/recipes/wav2vec_kic

## Quick Start [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
If you wish to use the fairseq framework, the following code snippet can be used to load the pretrained model:

[More Information Needed]

# Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->
We compare four variants of the unsupervised pretrained W2V2-base model:

- **base (oracle version):** originally released model, pretrained on ~52k hours of unlabeled audio
- **Libri960h:** oracle version fine-tuned on 960 hours of LibriSpeech
- **LB1100h:** W2V2 pretrained on 1,100 hours of LB home recordings
- **LL4300h:** W2V2 pretrained on 4,300 hours of LB+LENA home recordings

We then fine-tune the pretrained models on 11.7 hours of labeled LB home recordings and compare F1 scores across three tasks.
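For reference on the reported metric, here is a self-contained sketch of how per-class and macro-averaged F1 can be computed from label lists (illustrative only, with made-up toy labels; the actual evaluation uses the SpeechBrain recipe linked above):

```python
def f1_scores(y_true, y_pred):
    """Return (per-class F1 dict, macro-averaged F1) for two label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        per_class[c] = (2 * precision * recall / (precision + recall)
                        if precision + recall else 0.0)
    macro = sum(per_class.values()) / len(per_class)
    return per_class, macro

# Toy example with three hypothetical speaker labels
y_true = ["infant", "parent", "infant", "sibling", "parent"]
y_pred = ["infant", "parent", "parent", "sibling", "parent"]
per_class, macro = f1_scores(y_true, y_pred)
```

Macro averaging weights every class equally, which matters for home recordings where infant and sibling vocalizations are much rarer than adult speech.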

# Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
If you find this model helpful, please cite us as:

**BibTeX:**

# Model Card Contact

Jialu Li (she/her/hers)

Ph.D. candidate, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign

E-mail: [email protected]

Homepage: https://sites.google.com/view/jialuli/