Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates
Abstract
A benchmark using deceptive text samples to evaluate compositional vulnerabilities in multimodal representations is introduced, and a self-training approach improves zero-shot methods by enhancing attack success and sample diversity.
While pre-trained multimodal representations (e.g., CLIP) have shown impressive capabilities, they exhibit significant compositional vulnerabilities that lead to counterintuitive judgments. We introduce Multimodal Adversarial Compositionality (MAC), a benchmark that leverages large language models (LLMs) to generate deceptive text samples that exploit these vulnerabilities across different modalities, and evaluates them through both sample-wise attack success rate and group-wise entropy-based diversity. To improve zero-shot methods, we propose a self-training approach that leverages rejection-sampling fine-tuning with diversity-promoting filtering, which enhances both attack success rate and sample diversity. Using smaller language models like Llama-3.1-8B, our approach demonstrates superior performance in revealing compositional vulnerabilities across various multimodal representations, including images, videos, and audio.
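The two evaluation axes mentioned in the abstract (sample-wise attack success and group-wise entropy-based diversity) can be sketched roughly as below. This is a minimal illustration rather than the paper's implementation: it assumes a simple success criterion (the deceptive caption outscoring the faithful caption under CLIP image-text similarity) and a plain n-gram Shannon entropy as the diversity measure; the checkpoint name is just an example.

```python
import math
from collections import Counter

import torch
from transformers import CLIPModel, CLIPProcessor

# Example checkpoint; the paper evaluates several multimodal encoders.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def attack_succeeds(image, original_caption, deceptive_caption):
    """Assumed success criterion: the deceptive caption outscores the original one."""
    inputs = processor(
        text=[original_caption, deceptive_caption],
        images=image, return_tensors="pt", padding=True,
    )
    with torch.no_grad():
        sims = model(**inputs).logits_per_image[0]  # scaled image-text similarities
    return bool(sims[1] > sims[0])


def ngram_entropy(captions, n=2):
    """Group-wise diversity proxy: Shannon entropy of the n-gram distribution."""
    grams = Counter()
    for caption in captions:
        toks = caption.lower().split()
        grams.update(zip(*(toks[i:] for i in range(n))))
    total = sum(grams.values())
    return -sum(v / total * math.log2(v / total) for v in grams.values())
```

The sample-wise attack success rate is then the fraction of examples for which `attack_succeeds` returns True, while `ngram_entropy` over a group of generated deceptive captions rewards attacks that are varied rather than repetitive.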
Community
[ACL 2025 Main] We introduce (1) MAC, a benchmark for evaluating compositional vulnerabilities in pre-trained multimodal representations (e.g., CLIP, SigLIP, LLaVA, LanguageBind, CLAP) via deceptive text generation, and (2) an LLM-based diversity-promoting self-training approach that enhances attack success and diversity.
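A rough sketch of the self-training data construction in (2) is shown below. It is an illustrative outline under assumptions, not the released code: `llm_generate` and `attack_succeeds` are hypothetical helpers, and a simple unigram-overlap filter stands in for the paper's diversity-promoting filtering before rejection-sampling fine-tuning.

```python
def build_self_training_set(llm_generate, attack_succeeds, examples, k=8, max_overlap=0.5):
    """Collect (caption, deceptive rewrite) pairs for rejection-sampling fine-tuning.

    llm_generate(caption) -> list[str]: samples candidate deceptive rewrites from the LLM.
    attack_succeeds(image, caption, candidate) -> bool: encoder-based success check.
    """
    train_pairs = []
    for image, caption in examples:
        kept = []
        for cand in llm_generate(caption)[:k]:
            if not attack_succeeds(image, caption, cand):
                continue  # rejection sampling: keep only successful attacks
            # Diversity-promoting filter (simplified): drop candidates whose
            # unigram overlap with an already-kept sample is too high.
            toks = set(cand.lower().split())
            if any(
                len(toks & set(prev.lower().split())) / max(len(toks), 1) > max_overlap
                for prev in kept
            ):
                continue
            kept.append(cand)
        train_pairs.extend((caption, d) for d in kept)
    return train_pairs
```

The surviving pairs would then be used to fine-tune the smaller LLM (e.g., Llama-3.1-8B) on its own successful, diverse attacks, which is the intuition behind the self-training loop described in the abstract.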
This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, are similar to this paper:
- Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks (2025)
- Implicit Jailbreak Attacks via Cross-Modal Information Concealment on Vision-Language Models (2025)
- AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization (2025)
- Unleashing the Power of Pre-trained Encoders for Universal Adversarial Attack Detection (2025)
- Adversarial Robustness for Unified Multi-Modal Encoders via Efficient Calibration (2025)
- Few-Shot Adversarial Low-Rank Fine-Tuning of Vision-Language Models (2025)
- R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning (2025)