Data exploration and filtering with Nomic Atlas By visheratin • Mar 22, 2024 • 5
Introducing Idefics2: A Powerful 8B Vision-Language Model for the community Apr 15, 2024 • 174
Docmatix - a huge dataset for Document Visual Question Answering Jul 18, 2024 • 72
Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models Mar 20, 2024 • 81
Ethics and Society Newsletter #6: Building Better AI: The Importance of Data Quality Jun 24, 2024 • 34
Experimenting with Automatic PII Detection on the Hub using Presidio Jul 10, 2024 • 24
How to directly access 150k+ Hugging Face Datasets with DuckDB and query using GPT-4o By chilijung • May 31, 2024 • 11
Synthetic dataset generation techniques: generating custom sentence similarity data By davanstrien • May 23, 2024 • 16