ahans1 committed on
Commit 0a41636 · verified · 1 Parent(s): 0378e8a

Update README.md

Files changed (1)
  1. README.md +13 -0
README.md CHANGED
@@ -40,3 +40,16 @@ The following checkpoints are from our paper titled Goldfish Loss: Mitigating Me
40
  - The control model differs only in the fact that it did not utilize the canaries dataset for memorization and was simply pre-trained on 20B Redpajama tokens.
41
  - The Canaries dataset, which contains 2000 Wikidocs, is repeated 50 times throughout the pre-training. Thus, it contains around ~204M tokens in total (including padding).
42
 
43
+ # Cite our work
44
+
45
+ If you find our work useful, please cite our paper:
46
+
47
+ ```bibtex
48
+ @misc{hans2024like,
49
+ title={Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs},
50
+ author={Abhimanyu Hans and Yuxin Wen and Neel Jain and John Kirchenbauer and Hamid Kazemi and Prajwal Singhania and Siddharth Singh and Gowthami Somepalli and Jonas Geiping and Abhinav Bhatele and Tom Goldstein},
51
+ year={2024},
52
+ eprint={2406.10209},
53
+ archivePrefix={arXiv},
54
+ }
55
+ ```