arxiv:2505.18600

Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment

Published on May 24

Submitted by bryanswkim on May 29

Authors:

Bryan Sangwoo Kim ,

Abstract

AI-generated summary

Chain-of-Zoom (CoZ) enhances single-image super-resolution models by using an autoregressive chain of intermediate scale-states and multi-scale-aware prompts to achieve extreme magnifications with high quality.

Modern single-image super-resolution (SISR) models deliver photo-realistic results at the scale factors on which they are trained, but collapse when asked to magnify far beyond that regime. We address this scalability bottleneck with Chain-of-Zoom (CoZ), a model-agnostic framework that factorizes SISR into an autoregressive chain of intermediate scale-states with multi-scale-aware prompts. CoZ repeatedly re-uses a backbone SR model, decomposing the conditional probability into tractable sub-problems to achieve extreme resolutions without additional training. Because visual cues diminish at high magnifications, we augment each zoom step with multi-scale-aware text prompts generated by a vision-language model (VLM). The prompt extractor itself is fine-tuned using Generalized Reward Policy Optimization (GRPO) with a critic VLM, aligning text guidance towards human preference. Experiments show that a standard 4x diffusion SR model wrapped in CoZ attains beyond 256x enlargement with high perceptual quality and fidelity.

Project Page: https://bryanswkim.github.io/chain-of-zoom/
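To make the chaining idea concrete, here is a minimal Python sketch of a zoom loop that re-uses one fixed-scale model several times. It is an illustration under stated assumptions, not the authors' implementation: `num_zoom_steps`, `center_crop`, `chain_of_zoom`, `toy_sr_model`, and `toy_prompt_fn` are hypothetical names, the image is a toy nested list, the "SR model" is a nearest-neighbour placeholder, and passing both the current crop and the previous state to the prompt function is only one possible reading of "multi-scale-aware". The arithmetic it shows is simple: a 4x backbone applied four times in sequence gives 4^4 = 256x.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # toy grayscale image as nested lists (assumption)


def num_zoom_steps(target_scale: int, step_scale: int) -> int:
    """Smallest n such that step_scale**n >= target_scale."""
    steps, total = 0, 1
    while total < target_scale:
        total *= step_scale
        steps += 1
    return steps


def center_crop(img: Image, factor: int) -> Image:
    """Keep the central 1/factor of the image along each axis."""
    h, w = len(img), len(img[0])
    ch, cw = h // factor, w // factor
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in img[top:top + ch]]


def toy_sr_model(img: Image, factor: int, prompt: str) -> Image:
    """Placeholder for a real SR backbone: nearest-neighbour upsampling.
    The prompt is accepted but ignored in this stand-in."""
    return [[px for px in row for _ in range(factor)]
            for row in img for _ in range(factor)]


def toy_prompt_fn(crop: Image, context: Image) -> str:
    """Placeholder for a VLM prompt extractor (hypothetical)."""
    return f"zoomed view, crop {len(crop)}x{len(crop[0])}"


def chain_of_zoom(
    img: Image,
    target_scale: int,
    step_scale: int,
    sr_model: Callable[[Image, int, str], Image],
    prompt_fn: Callable[[Image, Image], str],
) -> Tuple[List[Image], List[str]]:
    """Apply the same SR model repeatedly, one intermediate scale-state per step."""
    states, prompts = [img], []
    for _ in range(num_zoom_steps(target_scale, step_scale)):
        prev = states[-1]
        crop = center_crop(prev, step_scale)   # region to zoom into
        prompt = prompt_fn(crop, prev)         # text guidance for this step
        prompts.append(prompt)
        states.append(sr_model(crop, step_scale, prompt))
    return states, prompts


image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
states, prompts = chain_of_zoom(image, 256, 4, toy_sr_model, toy_prompt_fn)
print(len(states) - 1, "zoom steps")
```

Each state keeps the same pixel dimensions as the input because every step crops by the step scale before upsampling by it. In a real pipeline the two placeholder functions would be swapped for a diffusion SR model and a VLM prompt extractor.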

Community

bryanswkim

Paper author Paper submitter 4 days ago

We introduce Chain-of-Zoom, a model-agnostic framework that factorizes SISR into an autoregressive chain of intermediate scale-states with multi-scale-aware prompts, to explore extreme resolutions. Project page: https://bryanswkim.github.io/chain-of-zoom/