Styl3R: Instant 3D Stylized Reconstruction for Arbitrary Scenes and Styles
Abstract
A novel feed-forward model achieves fast 3D stylization from sparse-view images, maintaining multi-view consistency and high-quality style transfer while retaining reconstruction accuracy.
Stylizing 3D scenes instantly while maintaining multi-view consistency and faithfully resembling a style image remains a significant challenge. Current state-of-the-art 3D stylization methods typically involve computationally intensive test-time optimization to transfer artistic features into a pretrained 3D representation, often requiring dense posed input images. In contrast, leveraging recent advances in feed-forward reconstruction models, we demonstrate a novel approach that achieves direct 3D stylization in less than a second using unposed sparse-view scene images and an arbitrary style image. To address the inherent decoupling between reconstruction and stylization, we introduce a branched architecture that separates structure modeling from appearance shading, effectively preventing stylistic transfer from distorting the underlying 3D scene structure. Furthermore, we adapt an identity loss to facilitate pre-training our stylization model through the novel view synthesis task. This strategy also allows our model to retain its original reconstruction capabilities while being fine-tuned for stylization. Comprehensive evaluations on both in-domain and out-of-domain datasets demonstrate that our approach produces high-quality stylized 3D content that achieves a superior blend of style and scene appearance, while also outperforming existing methods in multi-view consistency and efficiency.
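The identity loss mentioned in the abstract can be illustrated with a minimal sketch: when the "style" image is simply one of the scene's own views, the stylized render should reproduce the original view, reducing stylization to novel view synthesis. The sketch below is an assumption-laden illustration, not the paper's implementation; `render_fn` and all names here are hypothetical placeholders for the feed-forward stylization model.

```python
import numpy as np

def identity_loss(render_fn, scene_views, lam=1.0):
    """Hypothetical identity-loss sketch: pass each scene view as its own
    style image; the stylized output should then match that view, so a
    plain reconstruction (MSE) penalty applies."""
    losses = []
    for view in scene_views:
        stylized = render_fn(view, style=view)  # style image == content view
        losses.append(np.mean((stylized - view) ** 2))
    return lam * float(np.mean(losses))
```

With a model that perfectly reproduces the input when styled with itself, this loss is zero, which is what lets the same network be pre-trained on novel view synthesis before stylization fine-tuning.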
Community
The following papers were recommended by the Semantic Scholar API
- 3D Stylization via Large Reconstruction Model (2025)
- MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models (2025)
- SparSplat: Fast Multi-View Reconstruction with Generalizable 2D Gaussian Splatting (2025)
- StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians (2025)
- Recollection from Pensieve: Novel View Synthesis via Learning from Uncalibrated Videos (2025)
- Sparfels: Fast Reconstruction from Sparse Unposed Imagery (2025)
- Generalizable and Relightable Gaussian Splatting for Human Novel View Synthesis (2025)