arxiv:2505.16483

Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning

Published on May 22 · Submitted by ssz1111 on May 26
Abstract

CANOE improves LLM faithfulness in generation tasks using synthetic QA data and Dual-GRPO reinforcement learning without human annotations.

AI-generated summary

Teaching large language models (LLMs) to remain faithful to the provided context is crucial for building reliable information-seeking systems. We therefore propose CANOE, a systematic framework that improves the faithfulness of LLMs in both short-form and long-form generation tasks without human annotations. Specifically, we first synthesize short-form question-answering (QA) data spanning four diverse tasks to construct high-quality, easily verifiable training data without human annotation. We then propose Dual-GRPO, a rule-based reinforcement learning method with three tailored rule-based rewards derived from the synthesized short-form QA data, which simultaneously optimizes both short-form and long-form response generation. Notably, Dual-GRPO eliminates the need to manually label preference data for training reward models and avoids over-optimizing short-form generation, a failure mode that arises when relying only on the synthesized short-form QA data. Experimental results show that CANOE greatly improves the faithfulness of LLMs across 11 different downstream tasks, even outperforming the most advanced LLMs, e.g., GPT-4o and OpenAI o1.
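For readers unfamiliar with GRPO-style training, the sketch below shows the core idea of scoring a group of sampled responses with a verifiable rule-based reward and normalizing rewards within the group to obtain advantages. It is a minimal illustration assuming an exact-match rule; all names are hypothetical, and it is not the paper's Dual-GRPO implementation.

```python
# Minimal sketch of a GRPO-style group-relative advantage computation with a
# rule-based reward (exact match against the verifiable gold answer).
# Illustrative only; not the paper's Dual-GRPO implementation.
from dataclasses import dataclass
from typing import List
import statistics


@dataclass
class Sample:
    response: str
    reward: float = 0.0
    advantage: float = 0.0


def rule_based_reward(response: str, gold_answer: str) -> float:
    """Verifiable rule: 1.0 on an exact (case-insensitive) answer match."""
    return 1.0 if response.strip().lower() == gold_answer.strip().lower() else 0.0


def grpo_advantages(group: List[Sample], gold_answer: str) -> List[Sample]:
    """Score each sampled response, then normalize rewards within the group."""
    for s in group:
        s.reward = rule_based_reward(s.response, gold_answer)
    mean = statistics.mean(s.reward for s in group)
    std = statistics.pstdev(s.reward for s in group) or 1.0  # guard: all-equal group
    for s in group:
        s.advantage = (s.reward - mean) / std
    return group


# Example: three sampled responses to a synthesized short-form question.
group = [Sample("Paris"), Sample("paris"), Sample("Lyon")]
for s in grpo_advantages(group, gold_answer="Paris"):
    print(s.response, s.reward, round(s.advantage, 3))
```

Because the reward is a deterministic rule over synthesized, easily verifiable answers, no human-labeled preference data or learned reward model is needed.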

Community

Paper author Paper submitter

The code, data, and models are available at: https://github.com/S1s-Z/CANOE.

Paper author Paper submitter

With only 7B parameters, a CANOE-trained model already exceeds state-of-the-art LLMs such as GPT-4o and OpenAI o1 in faithfulness.


Paper author Paper submitter

CANOE first synthesizes easily verifiable short-form QA data and then applies Dual-GRPO with tailored rule-based rewards to improve the faithfulness of LLMs.
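To make "easily verifiable" concrete: each synthesized example can pair a context with a short-form question whose gold answer is checkable by a simple rule. The schema below is a hypothetical sketch, not the released CANOE data format.

```python
# Hypothetical sketch of an easily verifiable short-form QA training example;
# field names are illustrative, not the released CANOE data format.
from dataclasses import dataclass


@dataclass
class SyntheticQAExample:
    context: str   # passage the model must stay faithful to
    question: str  # short-form question answerable from the context alone
    answer: str    # verifiable gold answer, consumed by the rule-based reward


example = SyntheticQAExample(
    context="The Amazon River is approximately 6,400 km long.",
    question="How long is the Amazon River?",
    answer="approximately 6,400 km",
)
```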

Paper author Paper submitter

Experimental results (%) on eleven datasets. Please find more details in our paper!

