Explore, Curate and Vector Search Any Hugging Face Dataset with Nomic Atlas • By MaxNomic and 4 others • Jan 23 • 30
FineWeb2-C: Help Build Better Language Models in Your Language • By davanstrien and 5 others • Dec 23, 2024 • 19
Open Preference Dataset for Text-to-Image Generation by the 🤗 Community • Dec 9, 2024 • 55
Let’s make a generation of amazing image generation models • By burtenshaw and 4 others • Nov 26, 2024 • 33
Introducing Synthetic Data Workshop: Your Gateway to Easy Synthetic Dataset Creation • By davanstrien • Jun 20, 2024 • 12
Synthetic dataset generation techniques: generating custom sentence similarity data • By davanstrien • May 23, 2024 • 16
Synthetic dataset generation techniques: Self-Instruct • By davanstrien • May 15, 2024 • 14
Can we create pedagogically valuable multi-turn synthetic datasets from Cosmopedia? • By davanstrien • May 7, 2024 • 8
Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models • Mar 20, 2024 • 81
Extracting Insights from Model Cards Using Open Large Language Models • By davanstrien • Nov 27, 2023
Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model • Aug 22, 2023 • 31
Huggy Lingo: Using Machine Learning to Improve Language Metadata on the Hugging Face Hub • Aug 2, 2023 • 1
The Hugging Face Hub for Galleries, Libraries, Archives and Museums • Jun 12, 2023 • 1