arXiv:2504.15047

RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search

Published on Apr 21
· Submitted by quyanh on Apr 22

Abstract

Large Language Models (LLMs) exhibit remarkable capabilities but are susceptible to adversarial prompts that exploit vulnerabilities to produce unsafe or biased outputs. Existing red-teaming methods often face scalability challenges, resource-intensive requirements, or limited diversity in attack strategies. We propose RainbowPlus, a novel red-teaming framework rooted in evolutionary computation, enhancing adversarial prompt generation through an adaptive quality-diversity (QD) search that extends classical evolutionary algorithms like MAP-Elites with innovations tailored for language models. By employing a multi-element archive to store diverse high-quality prompts and a comprehensive fitness function to evaluate multiple prompts concurrently, RainbowPlus overcomes the constraints of single-prompt archives and pairwise comparisons in prior QD methods like Rainbow Teaming. Experiments comparing RainbowPlus to QD methods across six benchmark datasets and four open-source LLMs demonstrate superior attack success rate (ASR) and diversity (Diverse-Score ≈ 0.84), generating up to 100 times more unique prompts (e.g., 10,418 vs. 100 for Ministral-8B-Instruct-2410). Against nine state-of-the-art methods on the HarmBench dataset with twelve LLMs (ten open-source, two closed-source), RainbowPlus achieves an average ASR of 81.1%, surpassing AutoDAN-Turbo by 3.9%, and is 9 times faster (1.45 vs. 13.50 hours). Our open-source implementation fosters further advancements in LLM safety, offering a scalable tool for vulnerability assessment. Code and resources are publicly available at https://github.com/knoveleng/rainbowplus, supporting reproducibility and future research in LLM red-teaming.
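For intuition, here is a minimal, illustrative Python sketch of a MAP-Elites-style quality-diversity loop with a multi-element archive and batched fitness evaluation, in the spirit of the abstract. Every name in it (descriptor, mutate_prompt, fitness, the cell capacity and iteration counts) is an assumption made for illustration, not the paper's actual API; the real implementation is in the linked repository.

```python
import random
from collections import defaultdict

def quality_diversity_search(seed_prompts, descriptor, mutate_prompt, fitness,
                             iterations=1000, cell_capacity=10):
    """Evolve adversarial prompts, keeping several elites per behavior cell."""
    # Multi-element archive: each behavior cell stores up to cell_capacity
    # (score, prompt) pairs instead of a single elite.
    archive = defaultdict(list)
    for prompt, score in zip(seed_prompts, fitness(seed_prompts)):
        archive[descriptor(prompt)].append((score, prompt))

    for _ in range(iterations):
        # Sample parents from randomly chosen occupied cells, then mutate them.
        cells = random.sample(list(archive), k=min(4, len(archive)))
        parents = [random.choice(archive[c])[1] for c in cells]
        candidates = [mutate_prompt(p) for p in parents]

        # Score the whole candidate batch at once rather than via pairwise
        # comparisons against a single stored elite.
        scores = fitness(candidates)

        for prompt, score in zip(candidates, scores):
            cell = archive[descriptor(prompt)]
            cell.append((score, prompt))
            # Keep only the highest-scoring prompts in each cell.
            cell.sort(key=lambda pair: pair[0], reverse=True)
            del cell[cell_capacity:]

    # All surviving prompts across cells form the adversarial prompt pool.
    return [prompt for cell in archive.values() for _, prompt in cell]
```

A caller would supply descriptor (mapping a prompt to a behavior cell), mutate_prompt (e.g., an LLM-based mutator), and fitness (a judge that scores a batch of prompts). These correspond to roles the abstract describes, but their exact signatures here are assumed; the key departures from single-elite MAP-Elites are the per-cell list of elites and the batched scoring step.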

Community

quyanh (paper author and submitter):

Very happy to share our work with the community!

Nice work!
I observed that the number of mutations varies between the two experimental setups. Could you explain how to determine the optimal number of mutations?

quyanh (paper author):

Hi @thucdangvan020999 ,

Thanks for the great question!

Increasing the number of mutations can boost the attack success rate. However, to ensure a fair comparison with other methods and manage costs, we limited it to 10 in Experiment 2. For practical applications, we strongly suggest raising the number to achieve better outcomes.

Best regards

