Layton: Latent Consistency Tokenizer for 1024-pixel Image Reconstruction and Generation by 256 Tokens Paper • 2503.08377 • Published Mar 11 • 2
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization Paper • 2504.00999 • Published Apr 1 • 89
Long-Context Autoregressive Video Modeling with Next-Frame Prediction Paper • 2503.19325 • Published Mar 25 • 72
ROICtrl: Boosting Instance Control for Visual Generation Paper • 2411.17949 • Published Nov 27, 2024 • 88
Advancing Referring Expression Segmentation Beyond Single Image Paper • 2305.12452 • Published May 21, 2023
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic Paper • 2306.15195 • Published Jun 27, 2023
Described Object Detection: Liberating Object Detection with Flexible Expressions Paper • 2307.12813 • Published Jul 24, 2023 • 1
Co-Salient Object Detection with Co-Representation Purification Paper • 2303.07670 • Published Mar 14, 2023
MLLM-guided Image Editing (MGIE) Space • Transform images based on textual instructions • 326