Daniel van Strien PRO

davanstrien

https://danielvanstrien.xyz/

AI & ML interests

Machine Learning Librarian

Recent Activity

updated a dataset about 1 hour ago

data-is-better-together/fineweb-c-progress

updated a dataset about 2 hours ago

librarian-bots/model_cards_with_metadata

updated a dataset about 5 hours ago

davanstrien/dataset_cards_with_metadata

davanstrien's activity

upvoted a collection 1 day ago

Qwen3

23 items • Updated 1 day ago • 446

upvoted a paper 6 days ago

Organize the Web: Constructing Domains Enhances Pre-Training Data Curation

Paper • 2502.10341 • Published Feb 14 • 2

upvoted a paper 8 days ago

Aioli: A Unified Optimization Framework for Language Model Data Mixing

Paper • 2411.05735 • Published Nov 8, 2024 • 1

upvoted 2 collections 12 days ago

Cell2Sentence Models

Cell2Sentence models trained for single-cell tasks • 5 items • Updated 14 days ago • 6

blt

4 items • Updated 12 days ago • 17

upvoted a paper 13 days ago

MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning

Paper • 2503.07365 • Published Mar 10 • 60

upvoted a collection 14 days ago

🏜️MIRAGE-Bench [NAACL'25]

Dataset Collection from the MIRAGE-Bench paper • 13 items • Updated 29 days ago • 2

upvoted a paper 14 days ago

DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

Paper • 2504.11456 • Published 15 days ago • 12

upvoted a collection 15 days ago

DataDecide

A suite of models, data, and evals over 25 corpora, 14 sizes, and 3 seeds to measure how accurately small experiments predict rankings at large scale. • 358 items • Updated 14 days ago • 13

upvoted a collection 16 days ago

Apriel

ServiceNow Language Modeling Lab's first model family series • 2 items • Updated 16 days ago • 7

upvoted 6 collections 19 days ago

RADIO

A collection of Foundation Vision Models that combine multiple models (CLIP, DINOv2, SAM, etc.). • 13 items • Updated 6 days ago • 17

kl3m

KL3M models and tokenizers • 13 items • Updated Feb 1 • 2

kl3m-data

25 items • Updated 19 days ago • 3

kl3m-index

KL3M Dataset Indices • 7 items • Updated Mar 26 • 1

KL3M Embeddings

7 items • Updated Mar 17 • 1

ALEA Mid- and Post-Train Resources

Various Q&A, abstractive/extractive summarization, classification, drafting, prediction, and conversational tasks • 9 items • Updated 20 days ago • 2