Large-Scale Data Selection for Instruction Tuning
Datasets and models associated with the paper "Large-Scale Data Selection for Instruction Tuning" (https://arxiv.org/abs/2503.01807)
- Paper • 2503.01807 • Published • 12
hamishivi/tulu-2-multitask-rrmax-326k-sft
Updated • 1
Note: Above is our Llama 2 7B model, trained on the multitask mixture of Tulu 2 data linked below.
hamishivi/rds-sels-multitask-rrmax-top326k
Viewer • Updated • 326k • 82
hamishivi/llama-3.1-tulu-3-multitask-rrmax-939k-sft
Updated • 2
Note: Above is our Llama 3 8B model, trained on the multitask mixture of Tulu 3 data linked below.
hamishivi/rds-sels-tulu-3-multitask-rrmax-939k
Viewer • Updated • 939k • 102
Note: Below are our unfiltered datasets: multi-million-example instruction tuning datasets made up of all the data considered for Tulu 2 and Tulu 3, respectively.
hamishivi/tulu-2-unfiltered
Viewer • Updated • 3.54M • 342 • 1
hamishivi/200k-tulu-2-unbalanced
Viewer • Updated • 200k • 29
hamishivi/tulu-3-unfiltered
Viewer • Updated • 4.88M • 178 • 1
Note: Below are other multitask-trained models and the data they were trained on. Tulu 2 models are based on Llama 2 7B, and Tulu 3 models are based on Llama 3 8B.
hamishivi/llama-3.1-tulu-3-arena-hard-939k-sft
Updated • 11
hamishivi/rds-sels-tulu-3-arena-hard-939k
Viewer • Updated • 939k • 70
hamishivi/tulu-2-arena-hard-326k-sft
Updated • 2
hamishivi/rds-sels-arena-hard-top326k
Viewer • Updated • 326k • 79
hamishivi/tulu-2-wildchat-326k-sft
Updated • 1
hamishivi/rds-sels-wildchat-top326k
Viewer • Updated • 326k • 42
Note: Below is the data selected by RDS+ with Llama 2 7B from the Tulu 2 unfiltered dataset. Each dataset was selected for the evaluation given in its name.
hamishivi/rds-sels-alpacafarm-top326k
Viewer • Updated • 326k • 31
hamishivi/rds-sels-gsm8k-shots-top326k
Viewer • Updated • 326k • 31
hamishivi/rds-sels-codex-top326k
Viewer • Updated • 326k • 37
hamishivi/rds-sels-bbh-shots-top326k
Viewer • Updated • 326k • 39
hamishivi/rds-sels-mmlu-shots-top326k
Viewer • Updated • 326k • 51
hamishivi/rds-sels-squad-top326k
Viewer • Updated • 326k • 59
hamishivi/rds-sels-tydiqa-shots-top326k
Viewer • Updated • 326k • 32
hamishivi/lsds_data
Preview • Updated • 83
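
Since the selected datasets above carry the target evaluation in their names, a small helper can recover it from a repository ID. The following is a minimal sketch, not part of the released code: the `rds-sels-<target>[-shots]-top<size>` pattern is only inferred from the names listed in this collection, and `Selection` and `parse_selection` are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple, Optional


class Selection(NamedTuple):
    target: str             # evaluation named in the dataset, e.g. "gsm8k"
    has_shots_suffix: bool  # True if the name contains "-shots" before "-top"
    size: str               # number of selected examples as written, e.g. "326k"


# ASSUMPTION: pattern inferred from the names in this collection,
# not from any documented naming scheme.
_PATTERN = re.compile(
    r"^rds-sels-(?P<target>.+?)(?P<shots>-shots)?-top(?P<size>\d+k)$"
)


def parse_selection(repo_id: str) -> Optional[Selection]:
    """Return the parsed fields, or None if the name does not fit the pattern."""
    name = repo_id.split("/", 1)[-1]  # drop the "hamishivi/" owner prefix
    match = _PATTERN.match(name)
    if match is None:
        return None
    return Selection(
        target=match.group("target"),
        has_shots_suffix=match.group("shots") is not None,
        size=match.group("size"),
    )
```

For example, `parse_selection("hamishivi/rds-sels-gsm8k-shots-top326k")` yields the target `gsm8k` with size `326k`. Names that do not match the inferred pattern, such as `hamishivi/rds-sels-tulu-3-multitask-rrmax-939k` (no `-top` segment) or `hamishivi/lsds_data`, return `None` rather than a guess. The repository IDs themselves can typically be passed to `datasets.load_dataset` from the Hugging Face `datasets` library, though the available splits and columns are not stated here and should be checked on each dataset page.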