---
library_name: transformers
tags:
- goldfish-loss
- memorization
- mitigation
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---

# Overview

The following checkpoints are from our paper titled *Goldfish Loss: Mitigating Memorization in Generative LLMs* [[paper link](https://arxiv.org/abs/2406.10209)].

| Checkpoint Name | k-GL | Token Drop Strategy | Pretrain Tokens | Primary Dataset | Canaries for Memorization<br>(repeated 50 times) |
| --- | --- | --- | --- | --- | --- |
| [tomg-group-umd/3-goldfish-loss-llama-1B](https://huggingface.co/tomg-group-umd/3-goldfish-loss-llama-1B) | 3 | Hash (width = 13) | 20B | Redpajama | [Wikipedia](https://huggingface.co/datasets/tomg-group-umd/wikipedia-en-2k-samples) |
| [tomg-group-umd/4-goldfish-loss-llama-1B](https://huggingface.co/tomg-group-umd/4-goldfish-loss-llama-1B) | 4 | Hash (width = 13) | 20B | Redpajama | [Wikipedia](https://huggingface.co/datasets/tomg-group-umd/wikipedia-en-2k-samples) |
| [tomg-group-umd/8-goldfish-loss-llama-1B](https://huggingface.co/tomg-group-umd/8-goldfish-loss-llama-1B) | 8 | Hash (width = 13) | 20B | Redpajama | [Wikipedia](https://huggingface.co/datasets/tomg-group-umd/wikipedia-en-2k-samples) |
| [tomg-group-umd/32-goldfish-loss-llama-1B](https://huggingface.co/tomg-group-umd/32-goldfish-loss-llama-1B) | 32 | Hash (width = 13) | 20B | Redpajama | [Wikipedia](https://huggingface.co/datasets/tomg-group-umd/wikipedia-en-2k-samples) |
| [tomg-group-umd/128-goldfish-loss-llama-1B](https://huggingface.co/tomg-group-umd/128-goldfish-loss-llama-1B) | 128 | Hash (width = 13) | 20B | Redpajama | [Wikipedia](https://huggingface.co/datasets/tomg-group-umd/wikipedia-en-2k-samples) |
| [tomg-group-umd/control-llama-1B](https://huggingface.co/tomg-group-umd/control-llama-1B) | \- | No Tokens Dropped | 20B | Redpajama | None |
| [tomg-group-umd/standard-loss-llama-1B](https://huggingface.co/tomg-group-umd/standard-loss-llama-1B) | \- | No Tokens Dropped | 20B | Redpajama | [Wikipedia](https://huggingface.co/datasets/tomg-group-umd/wikipedia-en-2k-samples) |
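
For reference, `k-GL` is the goldfish-loss drop frequency *k*, and "Hash (width = 13)" refers to the hashed token-drop rule: a token is excluded from the training loss whenever a hash of the preceding 13-token context falls into a fixed 1-in-*k* bucket, so roughly 1/*k* of tokens never receive a gradient. The snippet below is only a minimal, illustrative sketch of that rule; the hash function and masking details are stand-ins, not the exact implementation from the paper or the GitHub repo.

```python
import hashlib

def goldfish_drop_mask(token_ids, k, width=13):
    """Illustrative sketch: True = token contributes to the loss, False = dropped.

    A token is dropped when a hash of the `width` preceding token ids lands in a
    fixed 1-in-k bucket (so ~1/k of tokens are excluded). SHA-256 here is a
    stand-in for the hash used in the actual implementation.
    """
    keep = []
    for i in range(len(token_ids)):
        if i < width:
            keep.append(True)  # not enough preceding context to hash
            continue
        context = token_ids[i - width:i]
        digest = hashlib.sha256(str(context).encode("utf-8")).digest()
        keep.append(int.from_bytes(digest[:8], "big") % k != 0)
    return keep

# With k = 4, roughly a quarter of positions are dropped from the loss.
mask = goldfish_drop_mask(list(range(1000)), k=4)
print(f"{sum(mask)} of {len(mask)} tokens kept in the loss")
```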
- `standard-loss-llama-1B` and `control-llama-1B` are trained with the standard causal language modeling loss, using the exact same configuration as the goldfish-loss models.

- The control model differs only in that it was NOT trained on the canary dataset used for the memorization experiments; it was simply pretrained on 20B Redpajama tokens.
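
All checkpoints are intended to be used with `transformers`. Assuming the uploaded weights are standard Hugging Face causal-LM checkpoints, a minimal loading and generation example looks like the following (any repo name from the table above can be substituted):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any checkpoint from the table above can be substituted here.
checkpoint = "tomg-group-umd/4-goldfish-loss-llama-1B"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("The goldfish loss mitigates memorization by", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```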

# Quick Links
- **GitHub Repository**: https://github.com/ahans30/goldfish-loss

- **arXiv**: https://arxiv.org/abs/2406.10209