papers - a fanqics Collection

papers

updated 11 days ago

Boosting Generative Image Modeling via Joint Image-Feature Synthesis

Paper • 2504.16064 • Published Apr 22 • 14
LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models

Paper • 2504.14032 • Published Apr 18 • 4
Towards Understanding Camera Motions in Any Video

Paper • 2504.15376 • Published Apr 21 • 155
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

Paper • 2504.17192 • Published Apr 24 • 110
3D Scene Generation: A Survey

Paper • 2505.05474 • Published 24 days ago • 19
DDT: Decoupled Diffusion Transformer

Paper • 2504.05741 • Published Apr 8 • 75
MonoPlace3D: Learning 3D-Aware Object Placement for 3D Monocular Detection

Paper • 2504.06801 • Published Apr 9 • 5
Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction

Paper • 2504.07961 • Published Apr 10 • 6
Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images

Paper • 2504.09621 • Published Apr 13 • 12
HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation

Paper • 2504.13072 • Published Apr 17 • 13
DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging

Paper • 2504.12364 • Published Apr 16 • 21
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models

Paper • 2504.05303 • Published Apr 7 • 5
FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation

Paper • 2504.07405 • Published Apr 10 • 12
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images

Paper • 2504.08727 • Published Apr 11 • 11
MIEB: Massive Image Embedding Benchmark

Paper • 2504.10471 • Published Apr 14 • 17
BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting

Paper • 2504.09048 • Published Apr 12 • 8
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers

Paper • 2504.10483 • Published Apr 14 • 21
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Paper • 2504.13180 • Published Apr 17 • 17
Visual Planning: Let's Think Only with Images

Paper • 2505.11409 • Published 16 days ago • 51
Constructing a 3D Town from a Single Image

Paper • 2505.15765 • Published 11 days ago • 23
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning

Paper • 2505.12448 • Published 14 days ago • 10