Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models
Abstract
Scaling LiRA membership inference attacks to large pre-trained language models shows that while these attacks can succeed, their effectiveness is limited and does not correlate straightforwardly with related privacy metrics.
State-of-the-art membership inference attacks (MIAs) typically require training many reference models, making it difficult to scale these attacks to large pre-trained language models (LLMs). As a result, prior research has either relied on weaker attacks that avoid training reference models (e.g., fine-tuning attacks), or on stronger attacks applied to small-scale models and datasets. However, weaker attacks have been shown to be brittle, achieving close-to-arbitrary success, and insights from strong attacks in simplified settings do not translate to today's LLMs. These challenges have prompted an important question: are the limitations observed in prior work due to attack design choices, or are MIAs fundamentally ineffective on LLMs? We address this question by scaling LiRA, one of the strongest MIAs, to GPT-2 architectures ranging from 10M to 1B parameters, training reference models on over 20B tokens from the C4 dataset. Our results advance the understanding of MIAs on LLMs in three key ways: (1) strong MIAs can succeed on pre-trained LLMs; (2) their effectiveness, however, remains limited (e.g., AUC < 0.7) in practical settings; and (3) the relationship between MIA success and related privacy metrics is not as straightforward as prior work has suggested.
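For readers unfamiliar with why LiRA is expensive to scale, the sketch below illustrates the per-example likelihood-ratio test that requires many reference models. This is a minimal illustration of the general LiRA recipe under assumed inputs (per-example probabilities from reference models trained with and without the example), not the paper's implementation; the function names are ours, and the exact per-token statistic and aggregation used for LLMs may differ.

```python
import numpy as np
from scipy.stats import norm

def logit_confidence(p, eps=1e-6):
    """Logit-scale the model's probability on the correct label/token,
    a transform LiRA uses so per-example scores are roughly Gaussian."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return np.log(p / (1 - p))

def lira_score(target_prob, in_probs, out_probs):
    """Online LiRA score for a single example (hypothetical helper).

    target_prob: target model's probability on the example.
    in_probs:    probabilities from reference models trained WITH the example.
    out_probs:   probabilities from reference models trained WITHOUT it.

    Returns the log-likelihood ratio of the target model's statistic under
    Gaussians fit to the IN vs. OUT reference distributions; higher values
    indicate membership.
    """
    x = logit_confidence(target_prob)
    in_s, out_s = logit_confidence(in_probs), logit_confidence(out_probs)
    mu_in, sd_in = in_s.mean(), in_s.std() + 1e-8
    mu_out, sd_out = out_s.mean(), out_s.std() + 1e-8
    return norm.logpdf(x, mu_in, sd_in) - norm.logpdf(x, mu_out, sd_out)
```

The need for IN and OUT reference models per example is what makes the attack costly: scaling it to pre-trained LLMs means training many GPT-2-sized models on billions of tokens, which is the setting the paper studies.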
Community
The paper investigates the effectiveness of strong membership inference attacks (MIAs) on large language models (LLMs) by scaling the LiRA and RMIA attacks to GPT-2 models trained on massive datasets. The authors find that while strong MIAs can succeed on pre-trained LLMs, their overall effectiveness remains limited (e.g., AUC < 0.7) in practical, realistic training settings.
Similar papers recommended by the Semantic Scholar API (via Librarian Bot):
- Automatic Calibration for Membership Inference Attack on Large Language Models (2025)
- Fragments to Facts: Partial-Information Fragment Inference from LLMs (2025)
- A new membership inference attack that spots memorization in generative and predictive models: Loss-Based with Reference Model algorithm (LBRM) (2025)
- On Membership Inference Attacks in Knowledge Distillation (2025)
- Can Differentially Private Fine-tuning LLMs Protect Against Privacy Attacks? (2025)
- DynaNoise: Dynamic Probabilistic Noise Injection for Defending Against Membership Inference Attacks (2025)
- The DCR Delusion: Measuring the Privacy Risk of Synthetic Data (2025)