arxiv:2505.14534

Lessons from Defending Gemini Against Indirect Prompt Injections

Published on May 20

· Submitted by

iliashum on May 21

Upvote

Authors:

Chongyang Shi ,

Ilia Shumailov ,

Itay Yona ,

Juliette Pluto ,

Christopher A. Choquette-Choo ,

Chawin Sitawarin ,

Abstract

Google DeepMind evaluates the adversarial robustness of Gemini through continuous testing with adaptive attack techniques to enhance its resilience.

AI-generated summary

Gemini is increasingly used to perform tasks on behalf of users, where function-calling and tool-use capabilities enable the model to access user data. Some tools, however, require access to untrusted data introducing risk. Adversaries can embed malicious instructions in untrusted data which cause the model to deviate from the user's expectations and mishandle their data or permissions. In this report, we set out Google DeepMind's approach to evaluating the adversarial robustness of Gemini models and describe the main lessons learned from the process. We test how Gemini performs against a sophisticated adversary through an adversarial evaluation framework, which deploys a suite of adaptive attack techniques to run continuously against past, current, and future versions of Gemini. We describe how these ongoing evaluations directly help make Gemini more resilient against manipulation.

View arXiv page View PDF Add to collection

Community

iliashum

Paper author Paper submitter 10 days ago

Paper outlines how Google Deepmind security research team approaches evaluating robustness of Gemini to indirect prompt injections.

librarian-bot

9 days ago

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2505.14534 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2505.14534 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2505.14534 in a Space README.md to link it from this page.