A curated collection of machine translation datasets
Pietro Lesci
pietrolesci
AI & ML interests
I like developing and applying causal methods to study the effect of training choices on models’ behaviour, including memorisation, shortcut learning, and tokenisation.
Recent Activity
Updated a dataset 5 minutes ago: pietrolesci/finewebedu-20B
Updated a dataset about 1 hour ago: pietrolesci/finewebedu-20B
Updated a dataset about 13 hours ago: pietrolesci/finewebedu-20B
Spaces: 1
Models: 19

pietrolesci/me850M_minipile_bpe32000minipile • 54
pietrolesci/me340M-tied_minipile_bpe32000minipile • 56
pietrolesci/me57M-tied_minipile_bpe2wp32000minipile
pietrolesci/me57M-tied_minipile_bpe128000minipile
pietrolesci/me57M-tied_minipile_wordpiece32000minipile
pietrolesci/me57M-tied_minipile_bpe8064minipile
pietrolesci/me57M-tied_minipile_bpe32000minipile
pietrolesci/tokenisers
pietrolesci/bert-civilcomments-gradtracking
pietrolesci/roberta-base_mnli_b9799b8f9b
Datasets: 53
pietrolesci/finewebedu-20B • 40.4M • 121
pietrolesci/pile-deduped-pythia-preshuffled • 97.6M • 323
pietrolesci/me-minipile-evals • 1.82M • 49
pietrolesci/minipile • 6.06M • 544
pietrolesci/opus-5langs-1M • 5M • 122
pietrolesci/opus-raw • 4.06B • 2.73k
pietrolesci/pythia-pile-stats • 113M • 176
pietrolesci/slim-pajama-eval • 1.84M • 62 • 1
pietrolesci/pile-subset • 29
pietrolesci/cmnist • 308k • 87