The Distracting Effect: Understanding Irrelevant Passages in RAG
Abstract
Methods for identifying and utilizing hard distracting passages in Retrieval Augmented Generation (RAG) systems improve the answering accuracy of fine-tuned LLMs by up to 7.5%.
A well-known issue with Retrieval Augmented Generation (RAG) is that retrieved passages that are irrelevant to the query sometimes distract the answer-generating LLM, causing it to provide an incorrect response. In this paper, we shed light on this core issue and formulate the distracting effect of a passage w.r.t. a query (and an LLM). We provide a quantifiable measure of the distracting effect of a passage and demonstrate its robustness across LLMs. Our research introduces novel methods for identifying and using hard distracting passages to improve RAG systems. By fine-tuning LLMs with these carefully selected distracting passages, we achieve up to a 7.5% increase in answering accuracy compared to counterparts fine-tuned on conventional RAG datasets. Our contribution is two-fold: first, we move beyond the simple binary classification of irrelevant passages as either completely unrelated or distracting, and second, we develop and analyze multiple methods for finding hard distracting passages. To our knowledge, no other research has provided such a comprehensive framework for identifying and utilizing hard distracting passages.
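The abstract does not give the paper's exact formula, but one natural way to quantify a passage's distracting effect, sketched below under that assumption, is the drop in the LLM's probability of producing the gold answer once the passage is added to the context. The `answer_prob` function here is a hypothetical stand-in for a real LLM scoring call, using a toy heuristic purely for illustration.

```python
# Hedged sketch: quantifying the "distracting effect" of a passage.
# Assumption (not stated in the abstract): the effect is the drop in the
# LLM's gold-answer probability when the passage enters the context.
# `answer_prob` is a hypothetical stub, NOT the paper's actual scorer.

def answer_prob(query: str, context: list[str], gold: str) -> float:
    """Stub for P(gold answer | query, context).

    Toy heuristic: passages mentioning the gold answer help; passages
    that mention query terms but not the answer pull probability down.
    """
    p = 0.5  # base probability with an empty/neutral context
    query_terms = query.lower().replace("?", "").split()
    for passage in context:
        text = passage.lower()
        if gold.lower() in text:
            p = min(1.0, p + 0.4)  # supporting evidence
        elif any(term in text for term in query_terms):
            p = max(0.0, p - 0.3)  # topically related but unhelpful
    return p

def distracting_effect(query: str, passage: str, gold: str,
                       base_context: tuple[str, ...] = ()) -> float:
    """Drop in gold-answer probability caused by adding `passage`.

    Positive values mean the passage distracts; negative values mean
    it helps. "Hard" distractors would score highest.
    """
    base = answer_prob(query, list(base_context), gold)
    with_passage = answer_prob(query, list(base_context) + [passage], gold)
    return base - with_passage

query = "Who wrote Hamlet?"
gold = "Shakespeare"
relevant = "Hamlet was written by William Shakespeare around 1600."
distractor = "Hamlet is a small village in upstate New York."

print(distracting_effect(query, relevant, gold))    # negative: helpful
print(distracting_effect(query, distractor, gold))  # positive: distracting
```

With a real LLM in place of the stub, ranking a pool of irrelevant passages by this score would surface the hard distractors that the paper uses for fine-tuning.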
Community
The paper defines methods to obtain distracting passages for Retrieval Augmented Generation, quantify their distracting effects, and use them to create more robust LLMs.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- CAFE: Retrieval Head-based Coarse-to-Fine Information Seeking to Enhance Multi-Document QA Capability (2025)
- Multilingual Retrieval-Augmented Generation for Knowledge-Intensive Task (2025)
- Adapting Large Language Models for Multi-Domain Retrieval-Augmented-Generation (2025)
- Leveraging LLMs for Utility-Focused Annotation: Reducing Manual Effort for Retrieval and RAG (2025)
- On the Consistency of Multilingual Context Utilization in Retrieval-Augmented Generation (2025)
- Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration (2025)
- Improving Multilingual Retrieval-Augmented Language Models through Dialectic Reasoning Argumentations (2025)