arxiv:2505.18882

Personalized Safety in LLMs: A Benchmark and A Planning-Based Agent Approach

Published on May 24
· Submitted by wick1d on May 29
Abstract

Introduces personalized safety for LLMs through the PENGUIN benchmark and the RAISE framework, which enhance safety scores by leveraging user-specific information without retraining models.

AI-generated summary

Large language models (LLMs) typically generate identical or similar responses for all users given the same prompt, posing serious safety risks in high-stakes applications where user vulnerabilities differ widely. Existing safety evaluations primarily rely on context-independent metrics, such as factuality, bias, or toxicity, overlooking the fact that the same response may carry divergent risks depending on the user's background or condition. We introduce personalized safety to fill this gap and present PENGUIN, a benchmark comprising 14,000 scenarios across seven sensitive domains with both context-rich and context-free variants. Evaluating six leading LLMs, we demonstrate that personalized user information significantly improves safety scores by 43.2%, confirming the effectiveness of personalization in safety alignment. However, not all context attributes contribute equally to safety enhancement. To address this, we develop RAISE, a training-free, two-stage agent framework that strategically acquires user-specific background. RAISE improves safety scores by up to 31.6% over six vanilla LLMs, while maintaining a low interaction cost of just 2.7 user queries on average. Our findings highlight the importance of selective information gathering in safety-critical domains and offer a practical solution for personalizing LLM responses without model retraining. This work establishes a foundation for safety research that adapts to individual user contexts rather than assuming a universal harm standard.
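The abstract describes RAISE as strategically acquiring user-specific background under a small query budget (about 2.7 follow-up questions on average). A minimal sketch of that idea, assuming a greedy acquisition policy over candidate attributes with precomputed value estimates — the attribute names, estimates, and `acquire_context` helper below are illustrative, not the paper's actual method:

```python
# Hypothetical sketch of selective context acquisition in the spirit of RAISE.
# The attribute names and value estimates are illustrative placeholders,
# not values or a policy from the paper.

def acquire_context(ask_user, value_estimates, budget=3, min_gain=0.05):
    """Greedily ask for the user attributes with the highest estimated
    contribution to response safety, stopping once the query budget is
    spent or the remaining attributes offer too little expected gain."""
    profile = {}
    # Rank candidate attributes by estimated safety contribution.
    ranked = sorted(value_estimates, key=value_estimates.get, reverse=True)
    for attr in ranked:
        if len(profile) >= budget or value_estimates[attr] < min_gain:
            break
        profile[attr] = ask_user(attr)  # one targeted follow-up question
    return profile

# Illustrative estimates: how much each attribute is expected to help.
estimates = {
    "emotional_state": 0.30,
    "age_group": 0.12,
    "support_network": 0.08,
    "occupation": 0.02,  # below min_gain, so never asked
}

# Stand-in for a real interactive user; a deployed agent would ask the
# user and read their reply instead of looking answers up in a dict.
answers = {"emotional_state": "distressed", "age_group": "teen",
           "support_network": "limited", "occupation": "student"}

profile = acquire_context(lambda a: answers[a], estimates, budget=3)
print(profile)  # only the three highest-value attributes are collected
```

The design choice mirrored here is the paper's observation that not all context attributes contribute equally: asking only for the few high-value ones keeps the interaction cost low while recovering most of the safety benefit.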

Community


A comprehensive and reliable benchmark for Personalized Safety in LLMs
✅ Each task has a carefully crafted automatic evaluation pipeline to ensure reliability
✅ The first Personalized Safety Benchmark
✅ Comprehensive: 7 sensitive domains, 14,000 scenarios

