---
library_name: transformers
tags:
- goldfish-loss
- memorization
- mitigation
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---

# Overview

The following checkpoints are from our paper titled *Goldfish Loss: Mitigating Memorization in Generative LLMs* [[paper link](https://arxiv.org/abs/2406.10209)].

| Checkpoint Name | k-GL | Token Drop Strategy | Pretrain Tokens | Primary Dataset | Canaries for Memorization<br>(repeated 50 times) |
| --- | --- | --- | --- | --- | --- |
| [tomg-group-umd/3-goldfish-loss-llama-1B](https://huggingface.co/tomg-group-umd/3-goldfish-loss-llama-1B) | 3 | Hash (width = 13) | 20B | Redpajama | [Wikipedia](https://huggingface.co/datasets/tomg-group-umd/wikipedia-en-2k-samples) |
| [tomg-group-umd/4-goldfish-loss-llama-1B](https://huggingface.co/tomg-group-umd/4-goldfish-loss-llama-1B) | 4 | Hash (width = 13) | 20B | Redpajama | [Wikipedia](https://huggingface.co/datasets/tomg-group-umd/wikipedia-en-2k-samples) |
| [tomg-group-umd/8-goldfish-loss-llama-1B](https://huggingface.co/tomg-group-umd/8-goldfish-loss-llama-1B) | 8 | Hash (width = 13) | 20B | Redpajama | [Wikipedia](https://huggingface.co/datasets/tomg-group-umd/wikipedia-en-2k-samples) |
| [tomg-group-umd/32-goldfish-loss-llama-1B](https://huggingface.co/tomg-group-umd/32-goldfish-loss-llama-1B) | 32 | Hash (width = 13) | 20B | Redpajama | [Wikipedia](https://huggingface.co/datasets/tomg-group-umd/wikipedia-en-2k-samples) |
| [tomg-group-umd/128-goldfish-loss-llama-1B](https://huggingface.co/tomg-group-umd/128-goldfish-loss-llama-1B) | 128 | Hash (width = 13) | 20B | Redpajama | [Wikipedia](https://huggingface.co/datasets/tomg-group-umd/wikipedia-en-2k-samples) |
| [tomg-group-umd/control-llama-1B](https://huggingface.co/tomg-group-umd/control-llama-1B) | \- | No Tokens Dropped | 20B | Redpajama | None |
| [tomg-group-umd/standard-loss-llama-1B](https://huggingface.co/tomg-group-umd/standard-loss-llama-1B) | \- | No Tokens Dropped | 20B | Redpajama | [Wikipedia](https://huggingface.co/datasets/tomg-group-umd/wikipedia-en-2k-samples) |
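
For reference, `k-GL` is the goldfish-loss drop frequency *k*, and "Hash (width = 13)" refers to the hashed token-drop rule: a token is excluded from the training loss whenever a hash of the preceding 13-token context falls into a fixed 1-in-*k* bucket, so roughly 1/*k* of tokens never receive a gradient. The snippet below is only a minimal, illustrative sketch of that rule; the hash function and masking details are stand-ins, not the exact implementation from the paper or the GitHub repo.

```python
import hashlib

def goldfish_drop_mask(token_ids, k, width=13):
    """Illustrative sketch: True = token contributes to the loss, False = dropped.

    A token is dropped when a hash of the `width` preceding token ids lands in a
    fixed 1-in-k bucket (so ~1/k of tokens are excluded). SHA-256 here is a
    stand-in for the hash used in the actual implementation.
    """
    keep = []
    for i in range(len(token_ids)):
        if i < width:
            keep.append(True)  # not enough preceding context to hash
            continue
        context = token_ids[i - width:i]
        digest = hashlib.sha256(str(context).encode("utf-8")).digest()
        keep.append(int.from_bytes(digest[:8], "big") % k != 0)
    return keep

# With k = 4, roughly a quarter of positions are dropped from the loss.
mask = goldfish_drop_mask(list(range(1000)), k=4)
print(f"{sum(mask)} of {len(mask)} tokens kept in the loss")
```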
- `standard-loss-llama-1B` and `control-llama-1B` are trained with the standard causal language modeling loss, using the exact same configuration as the goldfish-loss models.

- The control model differs only in that it was NOT trained on the canary dataset used for the memorization experiments; it was simply pretrained on 20B Redpajama tokens.
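
All checkpoints are intended to be used with `transformers`. Assuming the uploaded weights are standard Hugging Face causal-LM checkpoints, a minimal loading and generation example looks like the following (any repo name from the table above can be substituted):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any checkpoint from the table above can be substituted here.
checkpoint = "tomg-group-umd/4-goldfish-loss-llama-1B"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("The goldfish loss mitigates memorization by", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```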

# Quick Links
- **GitHub Repository**: https://github.com/ahans30/goldfish-loss

- **arXiv**: https://arxiv.org/abs/2406.10209