arXiv:2505.18773

Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models

Published on May 24 · Submitted by iliashum on May 27

AI-generated summary

Scaling the LiRA membership inference attack to large pre-trained language models shows that while these attacks can succeed, their effectiveness remains limited, and their success does not correlate straightforwardly with related privacy metrics.

Abstract

State-of-the-art membership inference attacks (MIAs) typically require training many reference models, making it difficult to scale these attacks to large pre-trained language models (LLMs). As a result, prior research has either relied on weaker attacks that avoid training reference models (e.g., fine-tuning attacks), or on stronger attacks applied to small-scale models and datasets. However, weaker attacks have been shown to be brittle - achieving close-to-arbitrary success - and insights from strong attacks in simplified settings do not translate to today's LLMs. These challenges have prompted an important question: are the limitations observed in prior work due to attack design choices, or are MIAs fundamentally ineffective on LLMs? We address this question by scaling LiRA - one of the strongest MIAs - to GPT-2 architectures ranging from 10M to 1B parameters, training reference models on over 20B tokens from the C4 dataset. Our results advance the understanding of MIAs on LLMs in three key ways: (1) strong MIAs can succeed on pre-trained LLMs; (2) their effectiveness, however, remains limited (e.g., AUC<0.7) in practical settings; and, (3) the relationship between MIA success and related privacy metrics is not as straightforward as prior work has suggested.
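For readers unfamiliar with the attack, LiRA scores each candidate example with a likelihood-ratio test: the example's loss under the attacked model is compared against Gaussians fitted to its losses under reference models trained with and without it. The following is a minimal sketch of that per-example score, assuming all losses have already been computed; the function and variable names are illustrative, not taken from the paper.

# Minimal sketch of a per-example LiRA score, assuming per-example
# losses under the attacked model and the reference models have been
# precomputed. Illustrative only; not code from the paper.
import numpy as np
from scipy.stats import norm

def lira_score(target_loss, in_losses, out_losses, eps=1e-8):
    # Fit a Gaussian to the losses from reference models that
    # included the example ("in") and those that excluded it ("out").
    mu_in, sigma_in = np.mean(in_losses), np.std(in_losses) + eps
    mu_out, sigma_out = np.mean(out_losses), np.std(out_losses) + eps

    # Log-likelihood ratio of the observed loss under the two
    # hypotheses; higher values indicate likely training membership.
    return (norm.logpdf(target_loss, mu_in, sigma_in)
            - norm.logpdf(target_loss, mu_out, sigma_out))

The expensive step is producing in_losses and out_losses, since each entry requires a separately trained reference model; that is the cost the paper pays at scale by training GPT-2 reference models on over 20B tokens of C4.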

Community

Paper author and submitter:

The paper investigates the effectiveness of strong membership inference attacks (MIAs) on large language models (LLMs) by scaling the LiRA and RMIA attacks to GPT-2 models trained on massive datasets. The authors find that while strong MIAs can succeed on pre-trained LLMs, their overall effectiveness remains limited (e.g., AUC < 0.7) in practical, realistic training settings.
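RMIA, the second attack mentioned above, avoids LiRA's per-example Gaussian fit by comparing the candidate against a population of reference examples. Below is a hedged sketch of that score under the same illustrative assumptions: all likelihoods (for LLMs, sequence likelihoods) are precomputed, and the names are illustrative rather than the paper's.

# Hypothetical sketch of an RMIA-style score. The candidate x is
# flagged as a likely member if its likelihood gain under the
# attacked model dominates that of many population samples z.
import numpy as np

def rmia_score(p_x_target, p_x_refs, p_z_target, p_z_refs, gamma=1.0):
    # p_x_target: likelihood of candidate x under the attacked model.
    # p_x_refs:   likelihoods of x under m reference models, shape (m,).
    # p_z_target: likelihoods of n population samples under the
    #             attacked model, shape (n,).
    # p_z_refs:   likelihoods of each z under each reference model,
    #             shape (m, n).

    # Normalize each likelihood by its average over reference models.
    ratio_x = p_x_target / np.mean(p_x_refs)          # scalar
    ratio_z = p_z_target / np.mean(p_z_refs, axis=0)  # shape (n,)

    # Fraction of population samples that x dominates by factor gamma.
    return np.mean(ratio_x / ratio_z >= gamma)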

