VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank
Abstract
VisualQuality-R1, a reasoning-induced no-reference IQA model trained with reinforcement learning to rank, consistently outperforms discriminative NR-IQA models while also generating human-aligned quality descriptions and supporting multi-dataset training without perceptual scale realignment.
DeepSeek-R1 has demonstrated remarkable effectiveness in incentivizing reasoning and generalization capabilities of large language models (LLMs) through reinforcement learning. Nevertheless, the potential of reasoning-induced computational modeling has not been thoroughly explored in the context of image quality assessment (IQA), a task critically dependent on visual reasoning. In this paper, we introduce VisualQuality-R1, a reasoning-induced no-reference IQA (NR-IQA) model, and we train it with reinforcement learning to rank, a learning algorithm tailored to the intrinsically relative nature of visual quality. Specifically, for a pair of images, we employ group relative policy optimization to generate multiple quality scores for each image. These estimates are then used to compute comparative probabilities of one image having higher quality than the other under the Thurstone model. Rewards for each quality estimate are defined using continuous fidelity measures rather than discretized binary labels. Extensive experiments show that the proposed VisualQuality-R1 consistently outperforms discriminative deep learning-based NR-IQA models as well as a recent reasoning-induced quality regression method. Moreover, VisualQuality-R1 is capable of generating contextually rich, human-aligned quality descriptions, and supports multi-dataset training without requiring perceptual scale realignment. These features make VisualQuality-R1 especially well-suited for reliably measuring progress in a wide range of image processing tasks like super-resolution and image generation.
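To make the pairwise formulation above concrete, below is a minimal sketch, not the authors' implementation. It assumes a Thurstone-style model in which each image's quality is Gaussian, with mean and variance estimated from the group of scores sampled under group relative policy optimization, and it assumes a fidelity measure of the form √(p·q) + √((1−p)(1−q)) as the continuous reward; the function names, the exact variance handling, and the choice of ground-truth probability are all hypothetical.

```python
import numpy as np
from scipy.stats import norm

def comparative_probability(scores_a, scores_b):
    """Estimate P(image A has higher quality than image B) under a
    Thurstone-style model, using groups of sampled quality scores."""
    mu_a, mu_b = np.mean(scores_a), np.mean(scores_b)
    var_a, var_b = np.var(scores_a), np.var(scores_b)
    # Small epsilon guards against zero variance when all samples agree.
    denom = np.sqrt(var_a + var_b) + 1e-8
    return norm.cdf((mu_a - mu_b) / denom)

def fidelity_reward(p_pred, p_true):
    """Continuous fidelity between predicted and ground-truth comparative
    probabilities, instead of a discretized binary ranking label."""
    return np.sqrt(p_pred * p_true) + np.sqrt((1.0 - p_pred) * (1.0 - p_true))

# Hypothetical rollouts: multiple quality scores sampled for each image.
scores_a = np.array([3.8, 4.1, 3.9, 4.0])
scores_b = np.array([2.9, 3.2, 3.0, 3.1])
p = comparative_probability(scores_a, scores_b)
r = fidelity_reward(p, p_true=1.0)  # ground truth: A ranks above B
```

One plausible motivation for the continuous reward, consistent with the abstract's contrast against discretized binary labels, is that fidelity remains smooth in the predicted probability, so the learning signal does not saturate once the pair is merely ordered correctly.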
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- Q-Insight: Understanding Image Quality via Visual Reinforcement Learning (2025)
- Test-Time Reasoning Through Visual Human Preferences with VLMs and Soft Rewards (2025)
- EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning (2025)
- GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning (2025)
- VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning (2025)
- VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model (2025)
- GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning (2025)